biocorecrg / master_of_pores

Nextflow pipeline for analysis of direct RNA Nanopore reads
https://biocorecrg.github.io/master_of_pores/
MIT License

NanoCount error with assign fast5 #60

Closed callumparr closed 4 years ago

callumparr commented 4 years ago

The pipeline worked well for one library. When I repeated it on another library, an error occurred while creating the assigned fast5 file from the sorted alignment file. As far as I can see, the BAM file looks OK.

(base) callum@dgt-gpu1:~/master_of_pores/NanoPreprocess$ nextflow run nanopreprocess.nf -with-singularity -resume
N E X T F L O W ~ version 20.01.0
Launching `nanopreprocess.nf` [friendly_mirzakhani] - revision: 75e38aee97
╔╦╗┌─┐┌─┐┌┬┐┌─┐┬─┐ ┌─┐┌─┐ ╔═╗╔═╗╦═╗╔═╗╔═╗
║║║├─┤└─┐ │ ├┤ ├┬┘ │ │├┤ ╠═╝║ ║╠╦╝║╣ ╚═╗
╩ ╩┴ ┴└─┘ ┴ └─┘┴└─ └─┘└ ╩ ╚═╝╩╚═╚═╝╚═╝

====================================================
BIOCORE@CRG Preprocessing of Nanopore direct RNA - N F ~ version 0.1

kit : SQK-RNA001
flowcell : FLO-MIN106
fast5 : /analysisdata/rawseq/bcl/callum/Mouse_aging/tmp/Day2_09_DRS_gzip/20200222_0821_MN22588_FAL86574_47edbd96/fast5_pass/*.fast5
reference : /home/callum/transcriptome_tutorial/Analysis/ReferenceData/Mus_musculus.GRCm38.cdna.all.fa
annotation :

ref_type : transcriptome
seq_type : RNA

output : /analysisdata/rawseq/bcl/callum/Mouse_aging/tmp/Day2_09_DRS_gzip/NanoPreprocess_out
qualityqc : 7
granularity :

basecaller : guppy
basecaller_opt :
GPU : ON
demultiplexing :
demultiplexing_opt : -m pAmps-final-actrun_newdata_nanopore_UResNet20v2_model.030.h5

filter :
filter_opt :
mapper : minimap2
mapper_opt : -uf -k14
map_type : unspliced

counter : YES
counter_opt :

email : callum.parr@riken.jp

executor > local (1)
[06/602027] process > testInput [100%] 1 of 1, cached: 1 ✔
[f0/7968a9] process > baseCalling [100%] 401 of 401, cached: 401 ✔
[c5/f86746] process > concatenateFastQFiles [100%] 1 of 1, cached: 1 ✔
[9a/8da61f] process > QC [100%] 1 of 1, cached: 1 ✔
[b9/e3d27c] process > fastQC [100%] 1 of 1, cached: 1 ✔
[e4/c7f0db] process > mapping [100%] 1 of 1, cached: 1 ✔
[3f/71df99] process > counting [100%] 1 of 1, failed: 1 ✘
[- ] process > joinCountQCs -
[bd/af7981] process > alnQC [100%] 1 of 1, cached: 1 ✔
[be/54255f] process > joinAlnQCs [100%] 1 of 1, cached: 1 ✔
[76/b31c71] process > alnQC2 [100%] 1 of 1, cached: 1 ✔
[- ] process > multiQC [ 0%] 0 of 1
Pipeline BIOCORE@CRG Master of Pore completed!
Started at 2020-05-01T15:30:00.980+09:00
Error executing process > 'counting (fast5_pass)'

Caused by: Process counting (fast5_pass) terminated with an error exit status (1)

Command executed:

NanoCount -i fast5_pass.minimap2.sorted.bam -o fast5_pass.count ;
awk '{sum+=$3}END{print FILENAME"s/3.6"sum}' fast5_pass.count |sed s@.count@@g > fast5_pass.stats
samtools view -F 256 fast5_pass.minimap2.sorted.bam |cut -f 1,3 > fast5_pass.assigned

Command exit status: 1

Command output: (empty)

Command error:
Parse Bam file and filter low quality hits
Traceback (most recent call last):
  File "/usr/local/python/versions/3.6.3/bin/NanoCount", line 8, in <module>
    sys.exit(main())
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/NanoCount/__main__.py", line 48, in main
    verbose =args.verbose)
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/NanoCount/NanoCount.py", line 67, in __init__
    self.read_dict = self._parse_bam ()
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/NanoCount/NanoCount.py", line 163, in _parse_bam
    if self.scoring_value == "alignment_score" and hit.align_score/best_hit.align_score < self.equivalent_threshold:
ZeroDivisionError: division by zero

Work dir: /home/callum/master_of_pores/NanoPreprocess/work/3f/71df9957d8b2151cbe1af158b1896f

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line
Failed to invoke workflow.onComplete event handler

-- Check script 'nanopreprocess.nf' at line: 673 or see '.nextflow.log' file for more details

Running bash .command.run from the container gives the same error:

Parse Bam file and filter low quality hits
Traceback (most recent call last):
  File "/usr/local/python/versions/3.6.3/bin/NanoCount", line 8, in <module>
    sys.exit(main())
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/NanoCount/__main__.py", line 48, in main
    verbose =args.verbose)
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/NanoCount/NanoCount.py", line 67, in __init__
    self.read_dict = self._parse_bam ()
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/NanoCount/NanoCount.py", line 163, in _parse_bam
    if self.scoring_value == "alignment_score" and hit.align_score/best_hit.align_score < self.equivalent_threshold:
ZeroDivisionError: division by zero
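
For anyone reading the traceback: the expression `hit.align_score/best_hit.align_score` can only raise `ZeroDivisionError` when the best hit's alignment score is 0. Below is a minimal Python sketch of how a ratio filter of this kind could guard against that case. The `Hit` type, the `keep_equivalent_hits` function, and the 0.9 threshold are hypothetical names and values chosen for illustration; this is not NanoCount's code and not necessarily how its authors fixed the issue.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical stand-in for one alignment of a read."""
    ref: str
    align_score: int


def keep_equivalent_hits(hits: List[Hit], threshold: float = 0.9) -> List[Hit]:
    """Keep hits whose score is at least `threshold` times the best score.

    If the best score is zero or negative the ratio is undefined, so the
    read is skipped instead of dividing by zero.
    """
    if not hits:
        return []
    best = max(hits, key=lambda h: h.align_score)
    if best.align_score <= 0:
        return []
    return [h for h in hits if h.align_score / best.align_score >= threshold]
```

With this guard, a read whose alignments all have a score of 0 is dropped rather than crashing the whole run. Whether dropping such reads is the right policy is a separate decision for the tool's authors.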

lucacozzuto commented 4 years ago

Hi, it looks like something went wrong with NanoCount. Can you go to the folder /home/callum/master_of_pores/NanoPreprocess/work/3f/71df9957d8b2151cbe1af158b1896f and check the BAM file and the other files there? This may be something to pass on to the NanoCount developers.

L

callumparr commented 4 years ago
 ls -lhat master_of_pores/NanoPreprocess/work/3f/71df9957d8b2151cbe1af158b1896f/
total 320K
-rw-r--r-- 1 callum genome    1 May  1 17:15 .exitcode
-rw-r--r-- 1 callum genome  739 May  1 17:15 .command.err
-rw-r--r-- 1 callum genome    0 May  1 17:14 .command.out
drwxr-xr-x 2 callum genome 4.0K May  1 17:14 .
lrwxrwxrwx 1 callum genome  113 May  1 17:14 fast5_pass.minimap2.sorted.bam -> /home/callum/master_of_pores/NanoPreprocess/work/e4/c7f0dba2aae1507ddf9ff7e5bfd818/fast5_pass.minimap2.sorted.bam
-rw-r--r-- 1 callum genome    0 May  1 17:14 .command.begin
-rw-r--r-- 1 callum genome  739 May  1 15:31 .command.log
-rw-r--r-- 1 callum genome  262 May  1 15:30 .command.sh
-rw-r--r-- 1 callum genome 3.2K May  1 15:30 .command.run
drwxr-xr-x 4 callum genome 4.0K May  1 15:30 ..

An example of the BAM:

example.txt

lucacozzuto commented 4 years ago

You can run samtools flagstat on the BAM. If that looks OK, then run:

singularity exec -e path_of_the_image NanoCount -i fast5_pass.minimap2.sorted.bam -o fast5_pass.count

If it fails at that point, we should send an issue to the NanoCount developers.

callumparr commented 4 years ago

Output from samtools flagstat in.bam > out.flagstat:

3148715 + 0 in total (QC-passed reads + QC-failed reads)
1676236 + 0 secondary
82251 + 0 supplementary
0 + 0 duplicates
3148715 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Where can I find the images for Singularity? I am not very familiar with it. Is there an equivalent of the docker system df command?
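
As a reading aid for the flagstat output above: the "in total" line counts every alignment record, including secondary and supplementary ones, so the number of primary alignments is the total minus those two categories. The Python sketch below does that arithmetic on the figures posted in this thread; `parse_flagstat` is a hypothetical helper written for this illustration, not part of samtools.

```python
import re


def parse_flagstat(text: str) -> dict:
    """Map each flagstat category name to its QC-passed count."""
    counts = {}
    for line in text.splitlines():
        # Lines look like "<passed> + <failed> <category> (<optional details>)".
        m = re.match(r"\s*(\d+) \+ (\d+) (.+?)(?: \(.*)?$", line)
        if m:
            counts[m.group(3)] = int(m.group(1))
    return counts


# Figures copied from the flagstat output posted above.
flagstat = """\
3148715 + 0 in total (QC-passed reads + QC-failed reads)
1676236 + 0 secondary
82251 + 0 supplementary
3148715 + 0 mapped (100.00% : N/A)
"""

counts = parse_flagstat(flagstat)
primary = counts["in total"] - counts["secondary"] - counts["supplementary"]
print(primary)  # 1390228
```

So more than half of the records in this BAM are secondary alignments, which is common when mapping to a transcriptome with many similar isoforms, and 1390228 records are primary alignments.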

lucacozzuto commented 4 years ago

In the root of master_of_pores you should have a folder with all the images. The one you want should be biocorecrg-mopprepr*.img

callumparr commented 4 years ago

(base) callum@dgt-gpu1:~/master_of_pores/NanoPreprocess$ singularity exec -e ../singularity/biocorecrg-mopprepr-0.2.img NanoCount work/3f/71df9957d8b2151cbe1af158b1896f/fast5_pass.minimap2.sorted.bam -o work/3f/71df9957d8b2151cbe1af158b1896f/fast5_pass.count

Traceback (most recent call last):
  File "/usr/local/python/versions/3.6.3/bin/NanoCount", line 5, in <module>
    from NanoCount.__main__ import main
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/NanoCount/__main__.py", line 14, in <module>
    from NanoCount.NanoCount import NanoCount as nc
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/NanoCount/NanoCount.py", line 10, in <module>
    import pandas as pd
  File "/home/callum/miniconda3/lib/python3.6/site-packages/pandas/__init__.py", line 42, in <module>
    from pandas.core.api import *
  File "/home/callum/miniconda3/lib/python3.6/site-packages/pandas/core/api.py", line 26, in <module>
    from pandas.core.groupby import Grouper
  File "/home/callum/miniconda3/lib/python3.6/site-packages/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.groupby import GroupBy  # noqa: F401
  File "/home/callum/miniconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 37, in <module>
    from pandas.core.frame import DataFrame
  File "/home/callum/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 100, in <module>
    from pandas.core.series import Series
  File "/home/callum/miniconda3/lib/python3.6/site-packages/pandas/core/series.py", line 4386, in <module>
    Series._add_series_or_dataframe_operations()
  File "/home/callum/miniconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 10138, in _add_series_or_dataframe_operations
    from pandas.core import window as rwindow
  File "/home/callum/miniconda3/lib/python3.6/site-packages/pandas/core/window.py", line 14, in <module>
    import pandas._libs.window as libwindow
ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /home/callum/miniconda3/lib/python3.6/site-packages/pandas/_libs/window.cpython-36m-x86_64-linux-gnu.so)

I also tried this on the BAM file for the library that ran through NanoPreprocess to completion, and it gave the same error.
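
One detail in the traceback above is worth noting: NanoCount itself is loaded from /usr/local/python/versions/3.6.3, while pandas is imported from /home/callum/miniconda3. That suggests the failing pandas build comes from a Python install in the home directory rather than from the same tree as NanoCount, although the thread does not confirm the cause. The Python sketch below shows one way to spot such mixed origins in a traceback. `files_outside_prefix` is a hypothetical helper, and treating /usr/local/python/ as the image's own Python tree is an assumption.

```python
import re
from typing import List


def files_outside_prefix(traceback_text: str, prefix: str) -> List[str]:
    """Return traceback file paths that do not start with `prefix`.

    Paths are returned in order of first appearance, without duplicates.
    """
    paths = re.findall(r'File "([^"]+)"', traceback_text)
    outside: List[str] = []
    for path in paths:
        if not path.startswith(prefix) and path not in outside:
            outside.append(path)
    return outside


# Three frames copied from the traceback above.
tb = '''
  File "/usr/local/python/versions/3.6.3/bin/NanoCount", line 5, in <module>
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/NanoCount/NanoCount.py", line 10, in <module>
  File "/home/callum/miniconda3/lib/python3.6/site-packages/pandas/__init__.py", line 42, in <module>
'''

outside = files_outside_prefix(tb, "/usr/local/python/")
print(outside)
```

Any path this reports is a module that was picked up from somewhere other than the assumed image tree, which is a useful first thing to check when a library fails only in one environment.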

lucacozzuto commented 4 years ago

Can you change the version of the container in the file master_of_pores/nextflow.global.config? Change container = 'biocorecrg/mopprepr:0.2'

to container = 'biocorecrg/mopprepr:0.4'. I have upgraded this container twice, so maybe it is time to upgrade it in master too.
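
The change described above is a one-line edit, so doing it by hand is fine. For anyone who wants to script it, here is a minimal Python sketch that rewrites the tag on a `container = ...` line. `set_container_tag` and the sample config text are hypothetical; the real layout of nextflow.global.config may differ, so check the result before running the pipeline.

```python
import re


def set_container_tag(config_text: str, image: str, tag: str) -> str:
    """Rewrite `container = image:oldtag` lines so they use `tag`."""
    pattern = re.compile(
        r"^(\s*container\s*=\s*)['\"]?" + re.escape(image) + r":[^'\"\s]+['\"]?",
        flags=re.MULTILINE,
    )
    # A function is used as the replacement so the image name is inserted literally.
    return pattern.sub(lambda m: f"{m.group(1)}'{image}:{tag}'", config_text)


# Hypothetical config fragment, for illustration only.
before = "process {\n    container = 'biocorecrg/mopprepr:0.2'\n}\n"
after = set_container_tag(before, "biocorecrg/mopprepr", "0.4")
print(after)
```

To apply it to a real file, read the file into a string, pass it through the function, and write the result back.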

callumparr commented 4 years ago

Do I need to manually pull the image into master_of_pores/singularity?

lucacozzuto commented 4 years ago

No. If you change that line, it will be done automatically for you.

L

On Sat, 2 May 2020 at 16:52, callumparr notifications@github.com wrote:

do I need to manually pull the image to the master_of_pores/singularity ?


callumparr commented 4 years ago

I made the update and reran nextflow run nanopreprocess.nf with the -resume flag, but it does not seem to have a cache, so it has to basecall again. I will let you know if it progresses past NanoCount.

When I tried singularity exec again, it gave the same error.

callumparr commented 4 years ago

Thanks to the authors of NanoCount, it can now work through this BAM file. Is there an option to skip steps when running NanoPreprocess? I no longer have the cache for this library, but I still have the output events, fast5s, fastq and so on.

lucacozzuto commented 4 years ago

Ouch, no... sorry. So what was the problem? Can you link the issue here?

callumparr commented 4 years ago

https://github.com/a-slide/NanoCount/issues/5#issue-611231896

I think it is something weird about my BAM rather than a bug. I ran another library and it completed fine.

In this case I mapped direct RNA reads to the Ensembl cDNA transcriptome FASTA (release 99). I guess there is nothing wrong with that.

lucacozzuto commented 4 years ago

ok thanks

callumparr commented 4 years ago

Sorry for another question.

I can manually generate the count file, but how can I get the Singularity image to use the latest version of NanoCount, for example when running singularity exec -e image NanoCount ...?

If I run nextflow with -resume it still gives the same error as before, so I guess it is using an older version of NanoCount.

UPDATE:

I added the path to the NanoCount I installed in my home directory, and bash .command.run now uses v0.2.1. Running nextflow with -resume then completes all the way to the end.

Can I add this other install directory path to nanopreprocess.nf so that it is always included in the command script when the work directories are created?
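
On the path question: when two copies of a program with the same name are installed, the one that runs is the first match along the search path, which is why adding another directory can change which NanoCount version is picked up. The Python sketch below demonstrates that lookup order using `shutil.which` and two temporary directories holding empty placeholder scripts. The directory roles ("image" and "home") are hypothetical, and the thread does not show how the path was actually added.

```python
import os
import shutil
import stat
import tempfile


def first_on_path(program: str, search_path: str):
    """Return the first executable named `program` found along `search_path`."""
    return shutil.which(program, path=search_path)


def _make_fake_tool(directory: str, name: str) -> str:
    """Create an empty executable script so the demo has something to find."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as image_bin, tempfile.TemporaryDirectory() as home_bin:
    old_tool = _make_fake_tool(image_bin, "NanoCount")  # stands in for the bundled copy
    new_tool = _make_fake_tool(home_bin, "NanoCount")   # stands in for the newer install
    # Whichever directory comes first in the search path wins.
    home_first = first_on_path("NanoCount", os.pathsep.join([home_bin, image_bin]))
    image_first = first_on_path("NanoCount", os.pathsep.join([image_bin, home_bin]))
```

Relying on path order works, but it ties the run to one machine's home directory. Using a container that already ships the newer version keeps the pipeline reproducible.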

lucacozzuto commented 4 years ago

Hi, I added a new container with an updated NanoCount. You can do a git pull to get the updated nextflow.global.config.

callumparr commented 4 years ago

Hi, I added a new container with updated nanocount. You can do a git pull for changing the nextflow.global.config.

Thank you very much!