biocorecrg / master_of_pores

Nextflow pipeline for analysis of direct RNA Nanopore reads
https://biocorecrg.github.io/master_of_pores/
MIT License

Zero Division Error #95

Closed koushikmuralidharan closed 3 years ago

koushikmuralidharan commented 3 years ago

Hello, hope you're doing well. I was running the pipeline and ran into the following error. Other libraries seem to work fine. I looked at a previous report of this error and saw that the mopprepr version was at 0.6. Any help would be fantastic! -Koushik

╔╦╗┌─┐┌─┐┌┬┐┌─┐┬─┐ ┌─┐┌─┐ ╔═╗╔═╗╦═╗╔═╗╔═╗
║║║├─┤└─┐ │ ├┤ ├┬┘ │ │├┤ ╠═╝║ ║╠╦╝║╣ ╚═╗
╩ ╩┴ ┴└─┘ ┴ └─┘┴└─ └─┘└ ╩ ╚═╝╩╚═╚═╝╚═╝

====================================================
BIOCORE@CRG Preprocessing of Nanopore direct RNA - N F ~ version 0.1

kit : SQK-RNA001
flowcell : FLO-MIN106
fast5 : /efs/wt_20021/*.fast5
reference : /home/ubuntu/ONT_Balaj/master_of_pores/NanoPreprocess/../anno/reference.fasta.gz
annotation : /home/ubuntu/ONT_Balaj/master_of_pores/NanoPreprocess/../anno/annotation.gtf.gz

ref_type : transcriptome
seq_type : RNA

output : /efs/results/wt_20021
qualityqc : 5
granularity :

basecaller : guppy
basecaller_opt :
GPU : ON
demultiplexing : OFF
demultiplexing_opt : -m pAmps-final-actrun_newdata_nanopore_UResNet20v2_model.030.h5

filter :
filter_opt :
mapper : minimap2
mapper_opt : -uf -k14
map_type : spliced

counter : YES
counter_opt :

email :

executor > local (300)
[84/0dcac1] process > testInput (FAP93780_skip_32f57ee7_98.fast5) [100%] 1 of 1 ✔
[11/743a2d] process > baseCalling (guppy-wt_20021-290) [100%] 291 of 291 ✔
[17/d81233] process > concatenateFastQFiles (wt_20021) [100%] 1 of 1 ✔
[63/baa646] process > QC (wt_20021) [100%] 1 of 1 ✔
[c7/9bae09] process > fastQC (wt_20021) [100%] 1 of 1 ✔
[7f/dde5d1] process > mapping (minimap2-wt_20021) [100%] 1 of 1 ✔
[dc/010aae] process > counting (wt_20021) [ 0%] 0 of 1
[- ] process > joinCountQCs -

executor > local (300)
[84/0dcac1] process > testInput (FAP93780_skip_32f57ee7_98.fast5) [100%] 1 of 1 ✔
[11/743a2d] process > baseCalling (guppy-wt_20021-290) [100%] 291 of 291 ✔
[17/d81233] process > concatenateFastQFiles (wt_20021) [100%] 1 of 1 ✔
[63/baa646] process > QC (wt_20021) [100%] 1 of 1 ✔
[c7/9bae09] process > fastQC (wt_20021) [100%] 1 of 1 ✔
[7f/dde5d1] process > mapping (minimap2-wt_20021) [100%] 1 of 1 ✔
[dc/010aae] process > counting (wt_20021) [100%] 1 of 1, failed: 1 ✘
[- ] process > joinCountQCs -
[c0/b66136] process > alnQC (wt_20021) [100%] 1 of 1 ✔
[d3/6d0266] process > joinAlnQCs [100%] 1 of 1 ✔
[- ] process > alnQC2 (wt_20021) -
[- ] process > multiQC -
Error executing process > 'counting (wt_20021)'

Caused by: Process counting (wt_20021) terminated with an error exit status (1)

Command executed:

NanoCount -i wt_20021.minimap2.sorted.bam -o wt_20021.count ;
awk '{sum+=$3}END{print FILENAME" "sum}' wt_20021.count |sed s@.count@@g > wt_20021.stats
samtools view -F 256 wt_20021.minimap2.sorted.bam |cut -f 1,3 > wt_20021.assigned

Command exit status: 1

Command output: (empty)

Command error:

Checking options and input files

Initialise Nanocount

    Parse Bam file and filter low quality hits

Traceback (most recent call last):
  File "/usr/local/python/versions/3.6.3/bin/NanoCount", line 8, in <module>
    sys.exit(main())
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/NanoCount/main.py", line 41, in main
    nanocount = nc (**vars(args))
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/NanoCount/NanoCount.py", line 81, in __init__
    self.read_dict = self._parse_bam ()
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/NanoCount/NanoCount.py", line 175, in _parse_bam
    if self.scoring_value == "alignment_score" and hit.align_score/best_hit.align_score < self.equivalent_threshold:
ZeroDivisionError: division by zero

Work dir: /home/ubuntu/ONT_Balaj/master_of_pores/NanoPreprocess/work/dc/010aae18d6821ea4a82e3d01d26b3d

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
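The traceback fails on the division `hit.align_score/best_hit.align_score`, which raises this error when the best hit of a read has an alignment score of 0. Below is a minimal sketch for spotting candidate records. It assumes that NanoCount's `align_score` corresponds to the `AS:i:` tag written by minimap2, which is not confirmed in this thread. The helper name `zero_as_reads` is hypothetical. It reads SAM text on stdin and lists every read that has at least one zero-score alignment, so the result may be a superset of the reads that actually trigger the error.

```shell
# Hypothetical helper: print names of reads that have an alignment with AS:i:0.
# Assumption: the alignment score is stored in the AS:i: optional tag.
zero_as_reads() {
    awk -F '\t' '
        /^@/ { next }                      # skip SAM header lines
        {
            for (i = 12; i <= NF; i++)     # optional tags start at column 12
                if ($i == "AS:i:0") { print $1; break }
        }' | sort -u
}

# Usage (requires samtools; the BAM name is the one from the failed command):
# samtools view wt_20021.minimap2.sorted.bam | zero_as_reads
```

If this prints any read names, those records would be a reasonable place to start looking.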

lucacozzuto commented 3 years ago

Hi, can you check whether the BAM file wt_20021.minimap2.sorted.bam is OK? I think samtools stats will give us this info. Another thing: does reference.fasta.gz contain transcript sequences or genomic sequences?

Luca

koushikmuralidharan commented 3 years ago

reference.fasta.gz contains transcript sequences from Ensembl. Summary numbers from samtools stats are below:

SN raw total sequences: 939166
SN filtered sequences: 0
SN sequences: 939166
SN is sorted: 1
SN 1st fragments: 939166
SN last fragments: 0
SN reads mapped: 939166
SN reads mapped and paired: 0 # paired-end technology bit set + both mates mapped
SN reads unmapped: 0
SN reads properly paired: 0 # proper-pair bit set
SN reads paired: 0 # paired-end technology bit set
SN reads duplicated: 0 # PCR or optical duplicate bit set
SN reads MQ0: 420544 # mapped and MQ=0
SN reads QC failed: 0
SN non-primary alignments: 2348176
SN total length: 765107872 # ignores clipping
SN bases mapped: 765107872 # ignores clipping
SN bases mapped (cigar): 715834398 # more accurate
SN bases trimmed: 0
SN bases duplicated: 0
SN mismatches: 83188577 # from NM fields
SN error rate: 1.162120e-01 # mismatches / bases mapped (cigar)
SN average length: 814
SN maximum length: 44154
SN average quality: 17.3
SN insert size average: 0.0
SN insert size standard deviation: 0.0
SN inward oriented pairs: 0
SN outward oriented pairs: 0
SN pairs with other orientation: 0
SN pairs on different chromosomes: 0
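In these numbers every read is mapped, but 420544 of 939166 mapped reads (about 45%) have mapping quality 0. The sketch below pulls that ratio straight out of `samtools stats` output. The helper name `mq0_fraction` is hypothetical, and it assumes the tab-separated `SN` layout that samtools writes.

```shell
# Hypothetical helper: fraction of mapped reads with MQ=0, from samtools stats on stdin.
mq0_fraction() {
    awk -F '\t' '
        $1 == "SN" && $2 == "reads mapped:" { mapped = $3 }
        $1 == "SN" && $2 == "reads MQ0:"    { mq0 = $3 }
        END {
            if (mapped > 0) printf "%.3f\n", mq0 / mapped
            else            print "NA"      # avoid dividing by zero
        }'
}

# Usage (requires samtools):
# samtools stats wt_20021.minimap2.sorted.bam | mq0_fraction
```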

lucacozzuto commented 3 years ago

OK. So which mopprepr version are you using? In 0.6 you should have NanoCount v0.2.0, which fixed that error. Are you using Docker or Singularity?

koushikmuralidharan commented 3 years ago

0.6, and I am using docker.

from nextflow.global.config:

memory='12G'
cache='lenient'
container = 'biocorecrg/mopprepr:0.6'
containerOptions = { workflow.containerEngine == "docker" ? '-u $(id -u):$(id -g)': null}
withLabel: big_cpus {
cpus = 8
memory = '12G'

lucacozzuto commented 3 years ago

Can you check the version of the tool in the Docker image? If it is 0.2.0, we should open a ticket with NanoCount.

koushikmuralidharan commented 3 years ago

Within the docker image, it says 0.6.

ubuntu@ip-172-31-79-195:/efs/master_of_pores$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest d1165f221234 4 weeks ago 13.3kB
biocorecrg/mopprepr 0.6 193066e4aae3 11 months ago 6.01GB
biocorecrg/mopbasecall 0.1 bb72653c6b98 16 months ago 1.15GB

koushikmuralidharan commented 3 years ago

Hi, got counts by running NanoCount separately. Not sure exactly what the error was, but we only saw it in this one library. Thank you for your help though.

One unrelated question: can we still run NanoMod on the output folder as long as the folder contains the fast5 and BAM files? And can NanoMod be accelerated by GPU?

lucacozzuto commented 3 years ago

Hi, NanoMod also needs another file, the *.assigned file, which the pipeline generates in the same step as the NanoCount output. You can generate it with this code, replacing idfile with the ID you are using.

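# Sum the third column of the NanoCount output into a per-sample stats file
awk '{sum+=$3}END{print FILENAME"\t"sum}' ${idfile}.count | sed s@.count@@g > ${idfile}.stats
# Write read name and reference name for every alignment not flagged as secondary
samtools view -F 256 ${bamfile} | cut -f 1,3 > ${idfile}.assigned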
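To see what the stats command produces, here is a small sketch run on a made-up count file. The file contents, the `demo` ID, and the `make_stats` wrapper are hypothetical. Only the awk and sed invocation follows the commands above, written with a plain `$3` because the `\$3` form is the escaping needed inside a Nextflow script, not in an interactive shell.

```shell
# Hypothetical wrapper around the stats step; takes the sample ID as its argument.
make_stats() {
    idfile=$1
    # Sum column 3 of ${idfile}.count and strip ".count" from the printed file name.
    awk '{sum+=$3}END{print FILENAME"\t"sum}' "${idfile}.count" | sed s@.count@@g > "${idfile}.stats"
}

# Made-up three-column input, for illustration only.
tmpdir=$(mktemp -d)
printf 'tx1\t10\t2.5\ntx2\t20\t1.5\n' > "$tmpdir/demo.count"
(cd "$tmpdir" && make_stats demo)
cat "$tmpdir/demo.stats"
rm -r "$tmpdir"
```

On this input the stats file holds a single line with `demo` and `4` separated by a tab. The `.assigned` file still has to come from the `samtools view` command, which needs the real BAM file.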

We are currently developing a new, faster NanoMod, so we hope to have a major upgrade this year.

PS: you can check the version of NanoCount in the Docker image by running this:

docker pull biocorecrg/mopprepr:0.6
docker run --name mopprepr --rm -i -t biocorecrg/mopprepr:0.6 bash
NanoCount --version

If you get NanoCount version 0.2.0, that is fine. By the way, which version of NanoCount worked for you?
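A small sketch for comparing the reported version against the 0.2.0 release mentioned above. `version_ge` is a hypothetical helper that relies on `sort -V` (available in GNU coreutils), and the exact output format of `NanoCount --version` is assumed here, not taken from this thread.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage inside the container (assumes the version is the last word printed):
# installed=$(NanoCount --version | awk '{print $NF}')
# version_ge "$installed" 0.2.0 && echo "NanoCount is recent enough"
version_ge 0.2.4 0.2.0 && echo "0.2.4 >= 0.2.0"
```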

Luca

koushikmuralidharan commented 3 years ago

Got it, will run the first block and keep you posted. The Docker image reports 0.2.0, so we're still not sure what the error was. I think the separate run that worked used 0.2.4. -Koushik