Closed KeelyDulmage closed 9 months ago
Hi @KeelyDulmage
Thanks for raising this issue. Yes, it looks like the datatypes need to be enforced when reading the CSV. Sorry for the inconvenience. I'll get a fix out for this as soon as possible.
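As a rough illustration of what "enforcing the datatypes" can mean, here is a minimal standard-library sketch. It is not the workflow's actual fix (the workflow reads the file with Polars). The function name `read_typed_tsv`, the `SCHEMA` mapping, and every column other than `Ref` are hypothetical; only the `Ref` column name comes from the error reports in this thread.

```python
import csv
import io

# Hypothetical schema: every column gets an explicit converter, so nothing
# is guessed from the data. 'Ref' is kept as text because reference names
# can look numeric ("1", "22") or not ("X", "HA_MYBPC3").
SCHEMA = {"Read": str, "Ref": str, "Pos": int}


def read_typed_tsv(handle, schema):
    """Read a tab-separated file, converting each column with its declared type."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for record in reader:
        rows.append({name: convert(record[name]) for name, convert in schema.items()})
    return rows


# Made-up example data, for illustration only.
example = io.StringIO("Read\tRef\tPos\nr1\t1\t100\nr2\tX\t250\nr3\tHA_MYBPC3\t7\n")
rows = read_typed_tsv(example, SCHEMA)
print([row["Ref"] for row in rows])
```

Because the type of `Ref` is declared up front, a non-numeric reference name appearing late in the file cannot trigger a parse error.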
Hello @nrhorner, could you please help us fix this issue quickly, and suggest an interim workaround for enforcing the datatypes when reading the CSV? I am getting a similar error:
`` ERROR ~ Error executing process > 'pipeline:aav_structures (1)'
Caused by:
Process pipeline:aav_structures (1)
terminated with an error exit status (1)
Command executed:
export POLARS_MAX_THREADS=4
workflow-glue aav_structures --bam_info bam_info.tsv --itr_locations 1375 1519 5937 6081 --output_plot_data 'aav_structure_counts.tsv' --output_per_read 'fastq_files_aav_per_read_info.tsv' --sample_id "fastq_files" --transgene_plasmid_name "HA_MYBPC3" --itr_fl_threshold 100 --itr_backbone_threshold 20 --symmetry_threshold 10
Command exit status: 1
Command output: (empty)
Command error:
[15:05:39 - matplotlib.font_manager] generated new fontManager
[15:05:39 - workflow_glue] Starting entrypoint.
Traceback (most recent call last):
File ".nextflow/assets/epi2me-labs/wf-aav-qc/bin/workflow-glue", line 7, in <module>
[...]
Could not parse `HA_MYBPC3` as dtype `i64` at column 'Ref' (column number 2).
The current offset in the file is 166197506 bytes.
You might want to try:
- increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),
- specifying correct dtype with the `dtypes` argument
- setting `ignore_errors` to `True`,
- adding `HA_MYBPC3` to the `null_values` list.
Work dir: running_nextflow/epi2me-labs-wf-aav-qc/work/e7/34d2d999245899d7179d699ad8b965
Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out
-- Check '.nextflow.log' file for details `` Thanks
Hi @KeelyDulmage and @dgoswamia
Could you try out the new version v1.0.3 (-r v1.0.1) please and let me know if that fixes your issue.
Thanks,
Neil
@nrhorner Still getting the same error, I'm afraid:
nextflow run epi2me-labs/wf-aav-qc -r v1.0.1 --fastq NANO.fastq \
--ref_host Reference_files/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
--ref_helper pHelper_kan.fa \
--ref_rep_cap RC9_Kan.fa \
--ref plasmid.fa \
--itr1_start 4537 \
--itr1_end 4663 \
--itr2_start 8898 \
--itr2_end 9024 \
-profile standard
This is the version that shows up: wf-aav-qc v1.0.1-g1731255, is that correct? It was v1.0.2 before...
ERROR ~ Error executing process > 'pipeline:aav_structures (1)'
Caused by:
Process pipeline:aav_structures (1)
terminated with an error exit status (1)
Command executed:
export POLARS_MAX_THREADS=4
workflow-glue aav_structures --bam_info bam_info.tsv --itr_locations 4537 4663 8898 9024 --output_plot_data 'aav_structure_counts.tsv' --output_per_read 'NANO_aav_per_read_info.tsv' --sample_id "NANO" --transgene_plasmid_name "plasmid" --itr_fl_threshold 100 --itr_backbone_threshold 20 --symmetry_threshold 10
Command exit status: 1
Command output: (empty)
Command error:
[16:15:58 - matplotlib.font_manager] generated new fontManager
[16:15:59 - workflow_glue] Starting entrypoint.
Traceback (most recent call last):
File "/home/user/.nextflow/assets/epi2me-labs/wf-aav-qc/bin/workflow-glue", line 7, in <module>
cli()
File "/home/user/.nextflow/assets/epi2me-labs/wf-aav-qc/bin/workflow_glue/__init__.py", line 72, in cli
args.func(args)
File "/home/user/.nextflow/assets/epi2me-labs/wf-aav-qc/bin/workflow_glue/aav_structures.py", line 337, in main
df_bam = pl.read_csv(
File "/home/epi2melabs/conda/lib/python3.8/site-packages/polars/io/csv/functions.py", line 364, in read_csv
df = pl.DataFrame._read_csv(
File "/home/epi2melabs/conda/lib/python3.8/site-packages/polars/dataframe/frame.py", line 763, in _read_csv
self._df = PyDataFrame.read_csv(
exceptions.ComputeError: Could not parse `X` as dtype `i64` at column 'Ref' (column number 2).
The current offset in the file is 645927 bytes.
Note, I did change the names of some of the input files above for proprietary reasons, my plasmid isn't actually named "plasmid.fa"!
Hello Neil, I tried the new version of wf-aav-qc (v1.0.3) as you suggested, and it resolved the previous "Datatype inference error during CSV loading" issue. However, I have now come across a new error, detailed below. Please advise on how to fix the pipeline:medaka_consensus step, which terminated with an error exit status (137).
Command used to run Nextflow:
nextflow run epi2me-labs/wf-aav-qc \
-r v1.0.3 \
--fastq input_files/fastq_files/ \
--ref_host input_files/ref_host_files/Homo_sapiens.GRCh38.dna.toplevel.fa \
--ref_helper input_files/ref_helper_files/Helper_fiber_removed.fasta \
--ref_rep_cap input_files/ref_rep_cap_files/pTrans_Rep2_AAV9-PHP.eB.fasta \
--ref_transgene_plasmid input_files/ref_transgene_plasmid_files/ HA_MYBPC3.fasta \
--itr1_start 1375 \
--itr1_end 1519 \
--itr2_start 5937 \
--itr2_end 6081 \
-profile standard
Error from the workflow:
ERROR ~ Error executing process > 'pipeline:medaka_consensus (1)'
Caused by:
Process pipeline:medaka_consensus (1)
terminated with an error exit status (137)
Command executed:
samtools view align.bam -bh " HA_MYBPC3" > transgene_reads.bam
samtools index transgene_reads.bam
echo r1041_e82_400bps_sup_variant_g615
echo r1041_e82_400bps_sup_variant_g615
medaka consensus transgene_reads.bam "consensus_probs.hdf" --threads 2 --model r1041_e82_400bps_sup_variant_g615
medaka stitch --threads 2 consensus_probs.hdf transgene_plasmid.fa "fastq_files.transgene_plasmsid_consensus.fasta"
bgzip "fastq_files.transgene_plasmsid_consensus.fasta"
medaka variant transgene_plasmid.fa consensus_probs.hdf "transgene_plasmid.vcf"
bcftools sort transgene_plasmid.vcf > "fastq_files.transgene_plasmsid_sorted.vcf.gz"
Command exit status: 137
Command output:
r1041_e82_400bps_sup_variant_g615
r1041_e82_400bps_sup_variant_g615
Command error:
[18:30:41 - Sampler] Initializing sampler for consensus of region KI270516.1:0-1300.
[18:30:41 - Sampler] Initializing sampler for consensus of region KI270389.1:0-1298.
[18:30:41 - Sampler] Took 0.05s to make features.
[18:30:41 - Sampler] Initializing sampler for consensus of region KI270466.1:0-1233.
[18:30:41 - Sampler] Took 0.05s to make features.
[18:30:41 - Sampler] Initializing sampler for consensus of region KI270388.1:0-1216.
[18:30:41 - Sampler] Took 0.10s to make features.
[18:30:41 - Sampler] Initializing sampler for consensus of region KI270544.1:0-1202.
[18:30:41 - Sampler] Took 0.04s to make features.
[18:30:41 - Sampler] Initializing sampler for consensus of region KI270310.1:0-1201.
[18:30:41 - Sampler] Took 0.05s to make features.
[18:30:41 - Sampler] Initializing sampler for consensus of region KI270412.1:0-1179.
[18:30:41 - Sampler] Took 0.07s to make features.
[18:30:41 - Sampler] Initializing sampler for consensus of region KI270395.1:0-1143.
[18:30:41 - Sampler] Took 0.06s to make features.
[18:30:41 - Sampler] Initializing sampler for consensus of region KI270376.1:0-1136.
[18:30:41 - Sampler] Took 0.04s to make features.
[18:30:41 - Sampler] Initializing sampler for consensus of region KI270337.1:0-1121.
[18:30:41 - Sampler] Took 0.08s to make features.
[18:30:41 - Sampler] Initializing sampler for consensus of region KI270335.1:0-1048.
[18:30:41 - Sampler] Took 0.06s to make features.
[18:30:41 - Sampler] Initializing sampler for consensus of region KI270378.1:0-1048.
[18:30:41 - Sampler] Took 0.15s to make features.
[18:30:41 - Sampler] Initializing sampler for consensus of region KI270379.1:0-1045.
[18:30:42 - Sampler] Took 1.26s to make features.
[18:30:42 - Sampler] Initializing sampler for consensus of region KI270329.1:0-1040.
[18:30:42 - Sampler] Took 1.28s to make features.
[18:30:43 - Sampler] Initializing sampler for consensus of region KI270419.1:0-1029.
[18:30:43 - Sampler] Took 0.07s to make features.
[18:30:43 - Sampler] Initializing sampler for consensus of region KI270336.1:0-1026.
[18:30:43 - Sampler] Took 0.06s to make features.
[18:30:43 - Sampler] Took 0.07s to make features.
[18:30:43 - Sampler] Initializing sampler for consensus of region KI270312.1:0-998.
[18:30:43 - Sampler] Initializing sampler for consensus of region KI270539.1:0-993.
[18:30:43 - Sampler] Took 0.02s to make features.
[18:30:43 - Sampler] Took 0.04s to make features.
[18:30:43 - Sampler] Initializing sampler for consensus of region KI270385.1:0-990.
[18:30:43 - Sampler] Initializing sampler for consensus of region KI270423.1:0-981.
[18:30:43 - Sampler] Took 0.04s to make features.
[18:30:43 - Sampler] Initializing sampler for consensus of region KI270392.1:0-971.
[18:30:43 - Sampler] Took 0.03s to make features.
[18:30:43 - Sampler] Initializing sampler for consensus of region KI270394.1:0-970.
[18:30:43 - Sampler] Took 0.07s to make features.
[18:30:43 - Sampler] Initializing sampler for consensus of region HA_MYBPC3:0-9850.
[18:30:43 - Sampler] Took 0.03s to make features.
[18:30:43 - Sampler] Initializing sampler for consensus of region pTrans_Rep2_AAV9-PHP.eB:0-7467.
[18:30:43 - Sampler] Took 0.04s to make features.
[18:32:22 - Feature] Processed HA_MYBPC3:0.0-9849.0 (median depth 8084.0)
[18:32:22 - Sampler] Took 99.15s to make features.
.command.sh: line 11: 6619 Killed medaka consensus transgene_reads.bam "consensus_probs.hdf" --threads 2 --model r1041_e82_400bps_sup_variant_g615
Work dir: /mnt/efs/home/dgoswami/running_nextflow/epi2me-labs-wf-aav-qc/work/8c/9805556e17a61d3e50f60af373320b
Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out
-- Check '.nextflow.log' file for details
Thanks,
With Regards, Dharmendra
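For readers decoding the failure above: exit status 137 follows the common shell convention of 128 plus a signal number, so it corresponds to signal 9 (SIGKILL), which matches the "Killed" line in the log. A frequent cause is a process exceeding its available memory, but that is an assumption here, since the thread does not confirm it. The helper name `describe_exit_status` below is hypothetical.

```python
import signal


def describe_exit_status(status):
    """Describe a shell exit status, decoding the 128 + N 'killed by signal' convention."""
    if status > 128:
        signum = status - 128
        try:
            return f"terminated by {signal.Signals(signum).name}"
        except ValueError:
            # Signal number not known on this platform.
            return f"terminated by unknown signal {signum}"
    return f"exited with code {status}"


print(describe_exit_status(137))  # the medaka_consensus failure in this comment
print(describe_exit_status(1))    # the earlier aav_structures failure
```

The sketch assumes a POSIX system such as the Ubuntu host reported in this issue; signal names are platform-dependent.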
From: Neil Horner. Date: Wednesday, February 7, 2024 at 10:16 AM. To: epi2me-labs/wf-aav-qc. Cc: Dharmendra Goswami. Subject: Re: [epi2me-labs/wf-aav-qc] Chromosome or genome Reference possible type error (Issue #1)
@nrhorner Ah, I take it back. I tried to run with -r 1.0.3 before and got an error along the lines of it not being available, which is why I had run with -r 1.0.1 like in your post. But after seeing dgoswamia's post I cleaned everything up and tried running it again in a new folder with -r 1.0.3. AND SUCCESS. Ran through the entire pipeline with no errors!
@KeelyDulmage @dgoswamia Thanks for your feedback.
@dgoswamia please open another ticket for your unrelated issue and include a log file.
Operating System
Ubuntu 22.04
Other Linux
No response
Workflow Version
v1.0.2-gbe7dd1f
Workflow Execution
Command line
EPI2ME Version
No response
CLI command run
nextflow run epi2me-labs/wf-aav-qc \
--fastq fastq/NANO.fastq \
--ref_host Reference_files/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
--ref_helper pHelper_kan.fa \
--ref_rep_cap RC9_Kan.fa \
--ref_transgene_plasmid plasmid.fa \
--itr1_start 4537 \
--itr1_end 4663 \
--itr2_start 8898 \
--itr2_end 9024 \
-profile standard
Workflow Execution - CLI Execution Profile
standard (default)
What happened?
All processes successful except the pipeline:makeReport step. Had no issues previously running the demo sample.
Looks to me like the Ref column might be set to an integer type and needs to be converted to a character type to accommodate sex chromosomes, but I'm not a Python expert...
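The hypothesis above can be illustrated with a simplified model of schema inference, in which a column's type is guessed from only its leading values. This is a sketch of the general failure mode, not Polars's actual implementation; the function names and the example column are made up.

```python
def infer_type(values, infer_length):
    """Guess a column type from only the first `infer_length` values."""
    sample = values[:infer_length]
    return int if all(v.lstrip("-").isdigit() for v in sample) else str


def parse_column(values, infer_length):
    """Parse a whole column using the type guessed from its leading values."""
    column_type = infer_type(values, infer_length)
    return [column_type(v) for v in values]


# Made-up 'Ref' column: numbered chromosomes first, then a sex chromosome.
ref_column = ["1", "2", "3", "X"]

try:
    parse_column(ref_column, infer_length=3)  # inference only sees "1", "2", "3"
    failed = False
except ValueError:
    failed = True  # "X" cannot be parsed as an integer

parsed_as_text = parse_column(ref_column, infer_length=4)  # inference sees "X" too
print(failed, parsed_as_text)
```

In this model, looking at more rows before guessing avoids the error for this input, but declaring the column as text up front is the only approach that does not depend on where the first non-numeric name appears.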
Relevant log output
Application activity log entry
No response