LANL-Bioinformatics / GOTTCHA

Accurate read-based metagenome characterization using a hierarchical suite of unique signatures. Please visit our homepage:
http://lanl-bioinformatics.github.io/GOTTCHA
GNU General Public License v3.0

Splitrim returns 0 reads #18

Open TakacsBertalan opened 1 year ago

TakacsBertalan commented 1 year ago

Hi! I am trying to run GOTTCHA on a CAMI dataset (toy human microbiome) and I keep receiving the following error.

$gottcha_new/bin/gottcha.pl --threads 11 --outdir gottcha_new/sajat_teszt --input /media/deltagene/Microbiome/CAMI_data/gastrooral_dir/sunbeam_output/qc/00_samples/sample2_anonymous_reads.fq --database gottcha/database/GOTTCHA_BACTERIA_c4937_k24_u30_xHUMAN3x.species

[00:00:00] Starting GOTTCHA v1.0c
[00:00:00] Auto set database level to SPECIES.
[00:00:00] Number of threads: 11
[00:00:00] Checking running environment...
[00:00:00] Done. All required scripts and tools found.
[00:00:00] Split-trimming with parameters fixL=30, minQ=20, ascii=33.
[00:00:00] Split-trimming: /media/deltagene/Microbiome/CAMI_data/gastrooral_dir/sunbeam_output/qc/00_samples/sample2_anonymous_reads.fq...
[00:03:26] Done splitrimming /media/deltagene/Microbiome/CAMI_data/gastrooral_dir/sunbeam_output/qc/00_samples/sample2_anonymous_reads.fq.
[00:03:26] Done merging splitrim stats.

                                RAW         SPLIT-TRIMMED  
                                ===         =============  
  # of Reads:            33,332,582                     0  (0.00 %)
  # of Bases:         4,999,887,300                     0  (0.00 %)
  Mean Read Length:             150                     0  (0.00 %)

[00:03:26] Mapping split-trimmed reads to GOTTCHA database and profiling...

                                   RAW         SPLIT-TRIMMED
                         =============         =============
  # of Processed Reads:     33,332,582                     0
  # of Mapped Reads:                 0                     0  (genome)
  # of Mapped Reads:                 0                     0  (plasmid only)
  # of Unmapped Reads:      33,332,582                     0

[00:05:22] Done profiling mapping results.

0 taxanomy(ies) found.

[00:05:22] No read mapped to species-level signatures. Please try again with upper-level databases.

Running the same command on other (non-CAMI) samples always yields at least some split-trimmed reads. What could be the problem here?

Thanks, Bertalan Takács

TakacsBertalan commented 1 year ago

UPDATE:

This seems to be a problem with several parts. First, the CAMI samples were interleaved (read1 and read2 in the same file), and GOTTCHA does not seem to handle that. After deinterleaving the samples, I could run GOTTCHA on 15 of my 20 samples. For the remaining 5 samples I got the same result: splitrim returned 0 reads. Each time I received the same error message as in this issue: https://github.com/LANL-Bioinformatics/GOTTCHA/issues/5
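For anyone hitting the same interleaving problem, here is a minimal Python sketch of the deinterleaving step. It assumes strict 4-line FASTQ records with read1 and read2 strictly alternating; the function names are made up for illustration and are not part of GOTTCHA or the CAMI tooling.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield tuple(record)


def deinterleave(in_handle, out_r1, out_r2):
    """Write alternating records to out_r1 and out_r2; return the number of pairs."""
    records = read_fastq(in_handle)
    pairs = 0
    for r1 in records:
        # The record right after each read1 is assumed to be its mate.
        r2 = next(records, None)
        if r2 is None:
            raise ValueError("odd number of records; input is not cleanly interleaved")
        out_r1.write("\n".join(r1) + "\n")
        out_r2.write("\n".join(r2) + "\n")
        pairs += 1
    return pairs
```

It can be called with three open file handles, for example `deinterleave(open("interleaved.fq"), open("r1.fq", "w"), open("r2.fq", "w"))`, where the file names are placeholders.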

I tried to run splitrim separately, and each run resulted in the same message, except when I modified the --minQ parameter. That resulted in the following error:

----> ENTRY HEADER:@DUMMY:1:DUMMY_FC:1:1:1:1 1:Y:0:A
Threads: 1 (effective) 1 (requested)
IDX = 0: reading from 0 to 5760994150
core.exception.ArrayIndexError@splitrim.d(758): index [150] is out of bounds for array of length 150

??:? onArrayIndexError [0x55f076006b1e]
??:? _d_arraybounds_indexp [0x55f075fe4dab]
??:? void splitrim.trimEntry(splitrim.inputOptions, in immutable(char)[], in immutable(char)[], in immutable(char)[], in immutable(char)[], ref ulong, ref ulong, ref ulong[ushort], ref ulong, ref immutable(char)[]) [0x55f075f669a7]
??:? void splitrim.parseFASTQ(in ulong, in ulong, ref splitrim.inputOptions, in ulong, std.stdio.File) [0x55f075f662a3]
??:? _Dmain [0x55f075f68919]
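An out-of-bounds index while trimming a single entry could come from the tool itself, but malformed input is worth ruling out first. The sketch below is a hypothetical stand-alone checker, not anything shipped with GOTTCHA; it streams a FASTQ file and reports records whose sequence and quality lines differ in length or whose header or separator lines look wrong.

```python
from itertools import islice


def check_record(header, seq, plus, qual):
    """Return a list of problems found in one FASTQ record (empty if none)."""
    problems = []
    if not header.startswith("@"):
        problems.append("header does not start with '@'")
    if not plus.startswith("+"):
        problems.append("separator line does not start with '+'")
    if len(seq) != len(qual):
        problems.append(f"sequence length {len(seq)} != quality length {len(qual)}")
    return problems


def find_bad_records(lines):
    """Scan an iterable of FASTQ lines; yield (record_number, problems) for bad records."""
    it = iter(lines)
    number = 0
    while True:
        # Strip both \n and \r so Windows line endings do not skew lengths.
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not chunk:
            return
        number += 1
        if len(chunk) < 4:
            yield number, ["truncated record"]
            return
        problems = check_record(*chunk)
        if problems:
            yield number, problems
```

Passing an open file handle, as in `for n, problems in find_bad_records(open("sample.fq")): print(n, problems)`, prints only the offending records; `sample.fq` is a placeholder name.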

Just as a sanity check I checked the first few reads of my samples and they seem to be in order:

@DUMMY:1:DUMMY_FC:1:1:1:1 2:Y:0:A
CGGCCTGATCGGTGATGGTGTGCCAGATGAACACCGGCGGCATGGTCGCGTCGACATGCTTCTCGATCGACAGCAGCTCGCGCAGCGACGCAACGGCTTTGCCGTCACCCAGCAGGTTGTCGAAACTGCCGCTGTACGCGCACCGCCCTG
+
DDDGEGGGEIIGIKH9JJJBHKKKIKHIKKKJKKEKJKHKKF@HKGHEKKE>GEJK6DHJCAE)DEFCIE?@EE;:E$$F=ECFDCFE)E$CD)EEE3ECA$'E4=FE=FDBDE?E9=E?$EDDEACDC=$E=$DA?D$CDCE$=$$A
@DUMMY:1:DUMMY_FC:1:1:1:2 2:Y:0:A
ATTTCAAATATATATTCTGAACTTGCCAGTTCCACTAATAAAGATGCTCAGATAATAGTAATTACAGGTAAAAACAGAAAATTATATGCAAAACTTATGTCTCTCAGTGAATTTTCCAATCTAGATACCAAAAGCCATGTTTTTATTAAA
+
DADE@GEEECIIEJKKGHH:KKJJJKHECK?K$KB=FGHHHEHJJAEKEH$ABFHIEKJGGKEFHEG$I:ECDII<$A,EE)EIEAE5=EEED?DAEABECEFEEEEE,F?E$EEE:DEA$D:DEEECF$,AE$$D$$B$$E$$D$EEC@
@DUMMY:1:DUMMY_FC:1:1:1:3 2:Y:0:A
CTTTAATAAACACAAATGTATTTACTCTTTTAATGTTATCATGTTGTGCAAGTGCTGCTAAATCTCCAGTCATAACTCCATCATCAATTGTCTTTAATGAAGATTTTTCTAACTTATTTGCAAACTCAACCAATCCTTTATTATTATCTA
+
CCDGGGGGI3I$IJKK@K,DIKHHHKJKK4EEJKK>IHJ$H@$BJK$JK$J?J<E$$KGGKJJEGIE?@?AECBFI<FEGFE$EEEEECACB1?EE$B$?DAEEDEEEDCDBFEEE,DADEEEEEDEEEE$@$$$DEE:):EC4BBED$$
@DUMMY:1:DUMMY_FC:1:1:1:4 2:Y:0:A
TTCTCCATCCTTAAAAGTAATTGAAAATTTTTCTATTAGATGTCCTCTGTAATTTAGTGGTTTGGAACTTACTACAATACCATTTACAGCTGTCATTTTACGTAATGTAAATACCTCATCTGTAGGTATTTTTGCAATCAACTCTACTTA
+
@DD2EEG9IIIDIJKJBK@HKKK=JJ<KJI2KH$KJEDJ?K$EEG8JHKJJJGGK4KKKEDKCFK)?;BGBKFECBEKEEEEBC=;D9)BED?F':$AAB$DAE$E3EEE$BDAEED$EDAD$DC$EDA$EBC,?DEB$@@EECAD$B?
@DUMMY:1:DUMMY_FC:1:1:1:5 2:Y:0:A
TGTAAATTTCATTGGTTATATTTGTGGGAGTTAACAGAGTTTTTTGACGGCTGTAATATTCAAAACTGTAATCCCTTATTCCAACGGATACTACGCTGATTTTCACATCATCTGTTCGCAGTCTTGCGGTTGCGTTTTCTGCCAAAGACA
+
CCDEGGECAI2IIKIEKJKIKKIDCAJJFJHK$HKIKEJ4KJIJGKKEEIICHEI?KADEE9EJ$KICEJ:?EFEDEIECE$FEDEECEBEDCCCDC?EE;E$EEE$CDEE?EE$=?EE19CDDEE$6$E$?D$G?E$E;EACD$CE??
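The GOTTCHA log reports minQ=20 and ascii=33, so it may help to look at the decoded quality values rather than the raw characters. Under a Phred+33 offset, `$` decodes to Q3, well below 20. The sketch below decodes a quality string and measures the longest stretch of bases at or above a threshold; comparing that stretch with fixL=30 is only a rough heuristic, since this does not reproduce how splitrim actually uses those parameters. Both function names are hypothetical.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def longest_run_at_or_above(scores, min_q):
    """Length of the longest consecutive stretch with score >= min_q."""
    best = run = 0
    for q in scores:
        run = run + 1 if q >= min_q else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    # Short made-up quality string, not taken from the CAMI data.
    scores = phred_scores("IIII$III")
    print(scores, longest_run_at_or_above(scores, 20))
```

Running this over the quality line of each read in a failing sample would show whether any read has a high-quality stretch at all, which narrows down whether the zero count comes from the data or from the tool.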

I suspect the developers are no longer actively maintaining this project, but I still hope somebody will see this and can offer some help!

Thanks