epi2me-labs / wf-aav-qc


Chromosome or genome Reference possible type error #1

Closed KeelyDulmage closed 9 months ago

KeelyDulmage commented 9 months ago

Operating System

Ubuntu 22.04

Other Linux

No response

Workflow Version

v1.0.2-gbe7dd1f

Workflow Execution

Command line

EPI2ME Version

No response

CLI command run

nextflow run epi2me-labs/wf-aav-qc \
    --fastq fastq/NANO.fastq \
    --ref_host Reference_files/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
    --ref_helper pHelper_kan.fa \
    --ref_rep_cap RC9_Kan.fa \
    --ref_transgene_plasmid plasmid.fa \
    --itr1_start 4537 \
    --itr1_end 4663 \
    --itr2_start 8898 \
    --itr2_end 9024 \
    -profile standard

Workflow Execution - CLI Execution Profile

standard (default)

What happened?

All processes successful except the pipeline:makeReport step. Had no issues previously running the demo sample.

Looks to me like the Ref column might be set to an integer type and needs to be converted to a character type to accommodate sex chromosomes, but I'm not a Python expert...

Relevant log output

Command exit status:
  1

Command output:
  (empty)

Command error:
  [17:40:23 - matplotlib.font_manager] generated new fontManager
  [17:40:24 - workflow_glue] Starting entrypoint.
  Traceback (most recent call last):
    File "/home/keelydulmage/.nextflow/assets/epi2me-labs/wf-aav-qc/bin/workflow-glue", line 7, in <module>
      cli()
    File "/home/keelydulmage/.nextflow/assets/epi2me-labs/wf-aav-qc/bin/workflow_glue/__init__.py", line 72, in cli
      args.func(args)
    File "/home/keelydulmage/.nextflow/assets/epi2me-labs/wf-aav-qc/bin/workflow_glue/aav_structures.py", line 337, in main
      df_bam = pl.read_csv(
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/polars/io/csv/functions.py", line 364, in read_csv
      df = pl.DataFrame._read_csv(
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/polars/dataframe/frame.py", line 763, in _read_csv
      self._df = PyDataFrame.read_csv(
  exceptions.ComputeError: Could not parse `X` as dtype `i64` at column 'Ref' (column number 2).
  The current offset in the file is 645927 bytes.

  You might want to try:
  - increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),
  - specifying correct dtype with the `dtypes` argument
  - setting `ignore_errors` to `True`,
  - adding `X` to the `null_values` list.

Application activity log entry

No response

nrhorner commented 9 months ago

Hi @KeelyDulmage

Thanks for raising this issue. Yes, it looks like the datatypes need to be enforced when reading the CSV. Sorry for the inconvenience. I'll get a fix out for this as soon as possible.

dgoswamia commented 9 months ago

Hello @nrhorner, can you please help us fix this issue quickly and suggest an interim workaround for enforcing the datatypes when reading the CSV? I am getting a similar error:

ERROR ~ Error executing process > 'pipeline:aav_structures (1)'

Caused by: Process pipeline:aav_structures (1) terminated with an error exit status (1)

Command executed:

export POLARS_MAX_THREADS=4
workflow-glue aav_structures --bam_info bam_info.tsv --itr_locations 1375 1519 5937 6081 --output_plot_data 'aav_structure_counts.tsv' --output_per_read 'fastq_files_aav_per_read_info.tsv' --sample_id "fastq_files" --transgene_plasmid_name "HA_MYBPC3" --itr_fl_threshold 100 --itr_backbone_threshold 20 --symmetry_threshold 10

Command exit status: 1

Command output: (empty)

Command error:
  [15:05:39 - matplotlib.font_manager] generated new fontManager
  [15:05:39 - workflow_glue] Starting entrypoint.
  Traceback (most recent call last):
    File ".nextflow/assets/epi2me-labs/wf-aav-qc/bin/workflow-glue", line 7, in <module>
      cli()
    File ".nextflow/assets/epi2me-labs/wf-aav-qc/bin/workflow_glue/__init__.py", line 72, in cli
      args.func(args)
    File ".nextflow/assets/epi2me-labs/wf-aav-qc/bin/workflow_glue/aav_structures.py", line 337, in main
      df_bam = pl.read_csv(
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/polars/io/csv/functions.py", line 364, in read_csv
      df = pl.DataFrame._read_csv(
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/polars/dataframe/frame.py", line 763, in _read_csv
      self._df = PyDataFrame.read_csv(
  exceptions.ComputeError: Could not parse `HA_MYBPC3` as dtype `i64` at column 'Ref' (column number 2).
  The current offset in the file is 166197506 bytes.

  You might want to try:

Work dir: running_nextflow/epi2me-labs-wf-aav-qc/work/e7/34d2d999245899d7179d699ad8b965

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details

Thanks

nrhorner commented 9 months ago

Hi @KeelyDulmage and @dgoswamia

Could you try out the new version v1.0.3 (-r v1.0.1) please and let me know if that fixes your issue.

Thanks,

Neil

KeelyDulmage commented 9 months ago

@nrhorner Still getting the same error, I'm afraid:

nextflow run epi2me-labs/wf-aav-qc -r v1.0.1 --fastq NANO.fastq \
    --ref_host Reference_files/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
    --ref_helper pHelper_kan.fa \
    --ref_rep_cap RC9_Kan.fa \
    --ref plasmid.fa \
    --itr1_start 4537 \
    --itr1_end 4663 \
    --itr2_start 8898 \
    --itr2_end 9024 \
    -profile standard

This is the version that shows up: wf-aav-qc v1.0.1-g1731255, is that correct? It was v1.0.2 before...

ERROR ~ Error executing process > 'pipeline:aav_structures (1)'

Caused by: Process pipeline:aav_structures (1) terminated with an error exit status (1)

Command executed:

export POLARS_MAX_THREADS=4
workflow-glue aav_structures --bam_info bam_info.tsv --itr_locations 4537 4663 8898 9024 --output_plot_data 'aav_structure_counts.tsv' --output_per_read 'NANO_aav_per_read_info.tsv' --sample_id "NANO" --transgene_plasmid_name "plasmid" --itr_fl_threshold 100 --itr_backbone_threshold 20 --symmetry_threshold 10

Command exit status: 1

Command output: (empty)

Command error:
  [16:15:58 - matplotlib.font_manager] generated new fontManager
  [16:15:59 - workflow_glue] Starting entrypoint.
  Traceback (most recent call last):
    File "/home/user/.nextflow/assets/epi2me-labs/wf-aav-qc/bin/workflow-glue", line 7, in <module>
      cli()
    File "/home/user/.nextflow/assets/epi2me-labs/wf-aav-qc/bin/workflow_glue/__init__.py", line 72, in cli
      args.func(args)
    File "/home/user/.nextflow/assets/epi2me-labs/wf-aav-qc/bin/workflow_glue/aav_structures.py", line 337, in main
      df_bam = pl.read_csv(
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/polars/io/csv/functions.py", line 364, in read_csv
      df = pl.DataFrame._read_csv(
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/polars/dataframe/frame.py", line 763, in _read_csv
      self._df = PyDataFrame.read_csv(
  exceptions.ComputeError: Could not parse `X` as dtype `i64` at column 'Ref' (column number 2).
  The current offset in the file is 645927 bytes.

Note, I did change the names of some of the input files above for proprietary reasons, my plasmid isn't actually named "plasmid.fa"!

dgoswamia commented 9 months ago

Hello Neil,

I tried the new version of wf-aav-qc (v1.0.3) as you suggested, and it resolved the previous "Datatype inference error during CSV loading" issue. However, I have now come across a new error, detailed below. Please advise on how to fix the error exit status (137) at the pipeline:medaka_consensus step.

Command to run nextflow:

nextflow run epi2me-labs/wf-aav-qc \
-r v1.0.3 \
--fastq input_files/fastq_files/ \
--ref_host input_files/ref_host_files/Homo_sapiens.GRCh38.dna.toplevel.fa \
--ref_helper input_files/ref_helper_files/Helper_fiber_removed.fasta \
--ref_rep_cap input_files/ref_rep_cap_files/pTrans_Rep2_AAV9-PHP.eB.fasta \
--ref_transgene_plasmid input_files/ref_transgene_plasmid_files/ HA_MYBPC3.fasta \
--itr1_start 1375 \
--itr1_end 1519 \
--itr2_start 5937 \
--itr2_end 6081 \
-profile standard

Error on the workflow

ERROR ~ Error executing process > 'pipeline:medaka_consensus (1)'

Caused by: Process pipeline:medaka_consensus (1) terminated with an error exit status (137)

Command executed:

# Extract reads mapping to transgene plasmid
samtools view align.bam -bh " HA_MYBPC3" > transgene_reads.bam
samtools index transgene_reads.bam

echo r1041_e82_400bps_sup_variant_g615

echo r1041_e82_400bps_sup_variant_g615

medaka consensus transgene_reads.bam "consensus_probs.hdf" --threads 2 --model r1041_e82_400bps_sup_variant_g615

medaka stitch --threads 2 consensus_probs.hdf transgene_plasmid.fa "fastq_files.transgene_plasmsid_consensus.fasta"
bgzip "fastq_files.transgene_plasmsid_consensus.fasta"

medaka variant transgene_plasmid.fa consensus_probs.hdf "transgene_plasmid.vcf"

bcftools sort transgene_plasmid.vcf > "fastq_files.transgene_plasmsid_sorted.vcf.gz"

Command exit status: 137

Command output: r1041_e82_400bps_sup_variant_g615 r1041_e82_400bps_sup_variant_g615

Command error:
  [18:30:41 - Sampler] Initializing sampler for consensus of region KI270516.1:0-1300.
  [18:30:41 - Sampler] Initializing sampler for consensus of region KI270389.1:0-1298.
  [18:30:41 - Sampler] Took 0.05s to make features.
  [18:30:41 - Sampler] Initializing sampler for consensus of region KI270466.1:0-1233.
  [18:30:41 - Sampler] Took 0.05s to make features.
  [18:30:41 - Sampler] Initializing sampler for consensus of region KI270388.1:0-1216.
  [18:30:41 - Sampler] Took 0.10s to make features.
  [18:30:41 - Sampler] Initializing sampler for consensus of region KI270544.1:0-1202.
  [18:30:41 - Sampler] Took 0.04s to make features.
  [18:30:41 - Sampler] Initializing sampler for consensus of region KI270310.1:0-1201.
  [18:30:41 - Sampler] Took 0.05s to make features.
  [18:30:41 - Sampler] Initializing sampler for consensus of region KI270412.1:0-1179.
  [18:30:41 - Sampler] Took 0.07s to make features.
  [18:30:41 - Sampler] Initializing sampler for consensus of region KI270395.1:0-1143.
  [18:30:41 - Sampler] Took 0.06s to make features.
  [18:30:41 - Sampler] Initializing sampler for consensus of region KI270376.1:0-1136.
  [18:30:41 - Sampler] Took 0.04s to make features.
  [18:30:41 - Sampler] Initializing sampler for consensus of region KI270337.1:0-1121.
  [18:30:41 - Sampler] Took 0.08s to make features.
  [18:30:41 - Sampler] Initializing sampler for consensus of region KI270335.1:0-1048.
  [18:30:41 - Sampler] Took 0.06s to make features.
  [18:30:41 - Sampler] Initializing sampler for consensus of region KI270378.1:0-1048.
  [18:30:41 - Sampler] Took 0.15s to make features.
  [18:30:41 - Sampler] Initializing sampler for consensus of region KI270379.1:0-1045.
  [18:30:42 - Sampler] Took 1.26s to make features.
  [18:30:42 - Sampler] Initializing sampler for consensus of region KI270329.1:0-1040.
  [18:30:42 - Sampler] Took 1.28s to make features.
  [18:30:43 - Sampler] Initializing sampler for consensus of region KI270419.1:0-1029.
  [18:30:43 - Sampler] Took 0.07s to make features.
  [18:30:43 - Sampler] Initializing sampler for consensus of region KI270336.1:0-1026.
  [18:30:43 - Sampler] Took 0.06s to make features.
  [18:30:43 - Sampler] Took 0.07s to make features.
  [18:30:43 - Sampler] Initializing sampler for consensus of region KI270312.1:0-998.
  [18:30:43 - Sampler] Initializing sampler for consensus of region KI270539.1:0-993.
  [18:30:43 - Sampler] Took 0.02s to make features.
  [18:30:43 - Sampler] Took 0.04s to make features.
  [18:30:43 - Sampler] Initializing sampler for consensus of region KI270385.1:0-990.
  [18:30:43 - Sampler] Initializing sampler for consensus of region KI270423.1:0-981.
  [18:30:43 - Sampler] Took 0.04s to make features.
  [18:30:43 - Sampler] Initializing sampler for consensus of region KI270392.1:0-971.
  [18:30:43 - Sampler] Took 0.03s to make features.
  [18:30:43 - Sampler] Initializing sampler for consensus of region KI270394.1:0-970.
  [18:30:43 - Sampler] Took 0.07s to make features.
  [18:30:43 - Sampler] Initializing sampler for consensus of region HA_MYBPC3:0-9850.
  [18:30:43 - Sampler] Took 0.03s to make features.
  [18:30:43 - Sampler] Initializing sampler for consensus of region pTrans_Rep2_AAV9-PHP.eB:0-7467.
  [18:30:43 - Sampler] Took 0.04s to make features.
  [18:32:22 - Feature] Processed HA_MYBPC3:0.0-9849.0 (median depth 8084.0)
  [18:32:22 - Sampler] Took 99.15s to make features.
  .command.sh: line 11: 6619 Killed medaka consensus transgene_reads.bam "consensus_probs.hdf" --threads 2 --model r1041_e82_400bps_sup_variant_g615

Work dir: /mnt/efs/home/dgoswami/running_nextflow/epi2me-labs-wf-aav-qc/work/8c/9805556e17a61d3e50f60af373320b

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details

Thanks,

With Regards, Dharmendra
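
A note on reading the exit status above: by shell convention, a status greater than 128 means the process was terminated by signal number (status - 128). The short sketch below decodes 137 on that basis; the helper name `describe_exit_status` is made up for illustration, and it assumes a POSIX system. A SIGKILL on a step like this is often caused by the kernel or container running out of memory, but the thread does not confirm the cause here.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Decode a shell-style exit status (hypothetical helper, POSIX only)."""
    if status > 128:
        try:
            # 128 + N is how shells report termination by signal N.
            sig = signal.Signals(status - 128)
            return f"terminated by {sig.name}"
        except ValueError:
            # No signal with that number is defined on this platform.
            return f"exited with code {status}"
    return f"exited with code {status}"


print(describe_exit_status(137))
print(describe_exit_status(1))
```

This matches the `Killed` message printed by `.command.sh` at the end of the log.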


KeelyDulmage commented 9 months ago

@nrhorner Ah, I take it back. I tried to run with -r 1.0.3 before and got an error along the lines of it not being available, which is why I had run with -r 1.0.1 like in your post. But after seeing dgoswamia's post I cleaned everything up and tried running it again in a new folder with -r 1.0.3. AND SUCCESS. Ran through the entire pipeline with no errors!

nrhorner commented 9 months ago

@KeelyDulmage @dgoswamia Thanks for your feedback.

@dgoswamia, please open a separate ticket for your unrelated issue and include a log file.