[Direct RNA Seq] Error comes up when I set --use_pychopper to FALSE

adhamzul commented 2 years ago

Hello!

I did direct RNA sequencing last week using MinION and I basecalled the data using Guppy's RNA config. I have the sample's full genome (fasta) and its annotation (GFF3) as well as 1.4 million basecalled reads in a single fastq.gz.

I first ran the workflow with the following command as a test.

nextflow run wf-isoforms --fastq input-rna/RNA.fastq.gz --ref_genome input-genome/genome.fasta --ref_annotation input-gff/annotations.gff --out_dir outdir/ -profile conda

The workflow completed successfully. But, when I check the transcriptomes assembled, there were only eight sequences. I figured that pychopper scans for primer used when generating cDNA and that is also why almost all of my reads were categorized as 'Unusable' in cdna_classifier_report.tsv.

When I ran the following command, I get an error saying that the workflow expected a cdna_classifier_report.tsv.

Command:

nextflow run wf-isoforms --use_pychopper false --fastq input-rna/RNA.fastq.gz --ref_genome input-genome/genome.fasta --ref_annotation input-gff/annotations.gff --out_dir outdir_2/ -profile conda Output:

executor >  local (8)
[4b/5121aa] process > start_ping:pingMessage (1)    [100%] 1 of 1 ✔
[05/0d66be] process > handleSingleFile (1)          [100%] 1 of 1 ✔
[e6/3ce736] process > pipeline:summariseReads (1)   [100%] 1 of 1 ✔
[67/7b30bb] process > pipeline:getVersions          [100%] 1 of 1 ✔
[cc/050715] process > pipeline:getParams            [100%] 1 of 1 ✔
[b7/f7fa04] process > pipeline:preprocess_reads (1) [100%] 1 of 1, failed: 1 ✘
[2f/e060c7] process > pipeline:build_minimap_index  [100%] 1 of 1 ✔
[-        ] process > pipeline:map_reads            -
[-        ] process > pipeline:split_bam            -
[-        ] process > pipeline:stringtie            -
[-        ] process > pipeline:merge_gff_bundles    -
[-        ] process > pipeline:run_gff_compare      -
[-        ] process > pipeline:run_gffread          -
[-        ] process > pipeline:makeReport           -
[-        ] process > output                        -
[5c/fde5f4] process > end_ping:pingMessage          [100%] 1 of 1 ✔
Error executing process > 'pipeline:preprocess_reads (1)'

Caused by:
  Missing output file(s) `cdna_classifier_report.tsv` expected by process `pipeline:preprocess_reads (1)`

Command executed:

  fastcat -s RNA -r RNA.stats -x RNA > input_reads.fq

  if [[ false == true ]];
  then
      cdna_classifier.py -t 4  input_reads.fq full_length_reads.fq
      generate_pychopper_stats.py --data cdna_classifier_report.tsv --output .
  else
      ln -s `realpath input_reads.fq` full_length_reads.fq
  fi

Command exit status:
  0

Command output:
  (empty)

Command error:
  Processing RNA/RNA.fastq.gz

Work dir:
  /home/RNA_sequencing/isoforms/work/b7/f7fa0473ad2fc221a6328bb544c636

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

I don't see any options to skip the preprocess_reads step and I am stuck.

Any ideas to what I am doing wrong? Any insights would be really great!

Adham

phpeters commented 2 years ago

Hej Adham,

The file "cdna_classifier_report.tsv " is not linked in the process preprocess_reads, when use_pychopper is set to false. You can add a ln to it after line 102 in main.nf to fix this.

Best! Philipp

adhamzul commented 2 years ago

Hello Philipp,

Thank you for the reply.

I am sorry but I don't really get what you mean...

I checked main.nf and line 102 is ln -srealpath input_reads.fqfull_length_reads.fq.


93    script:
94        """
95        fastcat -s ${sample_id} -r ${sample_id}.stats -x ${directory} > input_reads.fq
96    
97        if [[ ${params.use_pychopper} == true ]];
98        then
99            cdna_classifier.py -t ${params.threads} ${params.pychopper_opts} input_reads.fq full_length_reads.fq
100            generate_pychopper_stats.py --data cdna_classifier_report.tsv --output .
101        else
102            ln -s `realpath input_reads.fq` full_length_reads.fq
103        fi
104        """

Where do I need to insert ln exactly?

Regards, Adham

phpeters commented 2 years ago

Hej,

A rather hackish way to solve this is to clone the repo and run it locally. Then you can edit the main.nf file:

After the "include"-statements insert: mytsv=file(params.myTSV)
in the process preprocess_reads insert after the "ln -s realpath input_reads.fq full_length_reads.fq" command: ln -s ${mytsv} cdna_classifier_report.tsv

This will make the tsv available to the pipeline. Hope this helps. Philipp

epi2me-labs / wf-isoforms

[Direct RNA Seq] Error comes up when I set --use_pychopper to FALSE #1