TakacsBertalan opened 1 year ago
UPDATE:
This issue seems to have multiple causes. First, the CAMI samples were interleaved (read1 and read2 in the same file), and GOTTCHA doesn't seem to handle that. After deinterleaving the samples, I could run GOTTCHA on 15 of my 20 samples. For the remaining 5 samples, however, I still get the same result: splitrim returns 0 reads. Each time I receive the same error message as in this issue: https://github.com/LANL-Bioinformatics/GOTTCHA/issues/5
??:? onArrayIndexError [0x55f076006b1e]
??:? _d_arraybounds_indexp [0x55f075fe4dab]
??:? void splitrim.trimEntry(splitrim.inputOptions, in immutable(char)[], in immutable(char)[], in immutable(char)[], in immutable(char)[], ref ulong, ref ulong, ref ulong[ushort], ref ulong, ref immutable(char)[]) [0x55f075f669a7]
??:? void splitrim.parseFASTQ(in ulong, in ulong, ref splitrim.inputOptions, in ulong, std.stdio.File) [0x55f075f662a3]
??:? _Dmain [0x55f075f68919]
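For anyone hitting the interleaving problem: a minimal Python sketch of deinterleaving, assuming the interleaved file strictly alternates mate 1 and mate 2 as unwrapped 4-line FASTQ records. The function names `read_records` and `deinterleave` are hypothetical helpers, not part of GOTTCHA.

```python
import itertools
from typing import Iterator, TextIO, Tuple

# One FASTQ record: (header, sequence, separator, quality).
Record = Tuple[str, str, str, str]


def read_records(handle: TextIO) -> Iterator[Record]:
    """Yield unwrapped 4-line FASTQ records from an open text handle."""
    while True:
        lines = [line.rstrip("\n") for line in itertools.islice(handle, 4)]
        if not lines:
            return  # clean end of file
        if len(lines) < 4:
            raise ValueError("truncated FASTQ record at end of file")
        yield (lines[0], lines[1], lines[2], lines[3])


def deinterleave(src: TextIO, out_r1: TextIO, out_r2: TextIO) -> int:
    """Send odd records to out_r1 and even records to out_r2.

    Assumes strict mate1/mate2 alternation; returns the number of pairs.
    """
    records = read_records(src)
    pairs = 0
    for r1 in records:
        r2 = next(records, None)  # the mate must be the very next record
        if r2 is None:
            raise ValueError("odd number of records: last mate is missing")
        out_r1.write("\n".join(r1) + "\n")
        out_r2.write("\n".join(r2) + "\n")
        pairs += 1
    return pairs
```

Usage would be along the lines of opening the interleaved file for reading and two output files for writing, then calling `deinterleave(src, r1, r2)`. It raises on an odd record count rather than silently dropping the last read.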
As a sanity check, I looked at the first few reads of my samples, and they seem to be in order:
@DUMMY:1:DUMMY_FC:1:1:1:1 2:Y:0:A
CGGCCTGATCGGTGATGGTGTGCCAGATGAACACCGGCGGCATGGTCGCGTCGACATGCTTCTCGATCGACAGCAGCTCGCGCAGCGACGCAACGGCTTTGCCGTCACCCAGCAGGTTGTCGAAACTGCCGCTGTACGCGCACCGCCCTG
+
DDDGEGGGEIIGIKH9JJJBHKKKIKHIKKKJKKEKJKHKKF@HKGHEKKE>GEJK6DHJCAE)DEFCIE?@EE;:E$$F=ECFDCFE)E$CD)EEE3ECA$'E4=FE=FDBDE?E9=E?$EDDEACDC=$E=$DA?D$CDCE$=$$A
@DUMMY:1:DUMMY_FC:1:1:1:2 2:Y:0:A
ATTTCAAATATATATTCTGAACTTGCCAGTTCCACTAATAAAGATGCTCAGATAATAGTAATTACAGGTAAAAACAGAAAATTATATGCAAAACTTATGTCTCTCAGTGAATTTTCCAATCTAGATACCAAAAGCCATGTTTTTATTAAA
+
DADE@GEEECIIEJKKGHH:KKJJJKHECK?K$KB=FGHHHEHJJAEKEH$ABFHIEKJGGKEFHEG$I:ECDII<$A,EE)EIEAE5=EEED?DAEABECEFEEEEE,F?E$EEE:DEA$D:DEEECF$,AE$$D$$B$$E$$D$EEC@
@DUMMY:1:DUMMY_FC:1:1:1:3 2:Y:0:A
CTTTAATAAACACAAATGTATTTACTCTTTTAATGTTATCATGTTGTGCAAGTGCTGCTAAATCTCCAGTCATAACTCCATCATCAATTGTCTTTAATGAAGATTTTTCTAACTTATTTGCAAACTCAACCAATCCTTTATTATTATCTA
+
CCDGGGGGI3I$IJKK@K,DIKHHHKJKK4EEJKK>IHJ$H@$BJK$JK$J?J<E$$KGGKJJEGIE?@?AECBFI<FEGFE$EEEEECACB1?EE$B$?DAEEDEEEDCDBFEEE,DADEEEEEDEEEE$@$$$DEE:):EC4BBED$$
@DUMMY:1:DUMMY_FC:1:1:1:4 2:Y:0:A
TTCTCCATCCTTAAAAGTAATTGAAAATTTTTCTATTAGATGTCCTCTGTAATTTAGTGGTTTGGAACTTACTACAATACCATTTACAGCTGTCATTTTACGTAATGTAAATACCTCATCTGTAGGTATTTTTGCAATCAACTCTACTTA
+
@DD2EEG9IIIDIJKJBK@HKKK=JJ<KJI2KH$KJEDJ?K$EEG8JHKJJJGGK4KKKEDKCFK)?;BGBKFECBEKEEEEBC=;D9)BED?F':$AAB$DAE$E3EEE$BDAEED$EDAD$DC$EDA$EBC,?DEB$@@EECAD$B?
@DUMMY:1:DUMMY_FC:1:1:1:5 2:Y:0:A
TGTAAATTTCATTGGTTATATTTGTGGGAGTTAACAGAGTTTTTTGACGGCTGTAATATTCAAAACTGTAATCCCTTATTCCAACGGATACTACGCTGATTTTCACATCATCTGTTCGCAGTCTTGCGGTTGCGTTTTCTGCCAAAGACA
+
CCDEGGECAI2IIKIEKJKIKKIDCAJJFJHK$HKIKEJ4KJIJGKKEEIICHEI?KADEE9EJ$KICEJ:?EFEDEIECE$FEDEECEBEDCCCDC?EE;E$EEE$CDEE?EE$=?EE19CDDEE$6$E$?D$G?E$E;EACD$CE??
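Checking the head of a file only covers the first few records. Since the stack trace points to an array-bounds error inside `splitrim.trimEntry`, one unconfirmed hypothesis worth ruling out is a malformed record further into the file, for example one whose quality string is shorter than its sequence. A minimal Python sketch that scans a whole file, assuming unwrapped 4-line records and Phred+33 qualities; `check_fastq` is a hypothetical name, not a GOTTCHA tool.

```python
import itertools
from typing import List, TextIO, Tuple


def check_fastq(handle: TextIO, ascii_offset: int = 33) -> List[Tuple[int, str]]:
    """Return (record_number, problem) pairs for malformed 4-line records.

    Assumes records are not line-wrapped; record numbers start at 1.
    """
    problems: List[Tuple[int, str]] = []
    for n in itertools.count(1):
        lines = [line.rstrip("\n") for line in itertools.islice(handle, 4)]
        if not lines:
            break  # reached end of file on a record boundary
        if len(lines) < 4:
            problems.append((n, "truncated record"))
            break
        header, seq, sep, qual = lines
        if not header.startswith("@"):
            problems.append((n, "header does not start with '@'"))
        if not sep.startswith("+"):
            problems.append((n, "separator line does not start with '+'"))
        # A length mismatch is one way a trimmer could index past the end.
        if len(seq) != len(qual):
            problems.append((n, f"sequence length {len(seq)} != quality length {len(qual)}"))
        if any(ord(c) < ascii_offset for c in qual):
            problems.append((n, "quality character below the ASCII offset"))
    return problems
```

An empty result only means the file passes these structural checks; it does not prove splitrim will accept it.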
I suspect that the developers don't really care about this project anymore; still, I hope somebody will see this and can offer some help!
Thanks
Hi! I am trying to run GOTTCHA on a CAMI dataset (toy human microbiome), and I keep receiving the following error:
$gottcha_new/bin/gottcha.pl --threads 11 --outdir gottcha_new/sajat_teszt --input /media/deltagene/Microbiome/CAMI_data/gastrooral_dir/sunbeam_output/qc/00_samples/sample2_anonymous_reads.fq --database gottcha/database/GOTTCHA_BACTERIA_c4937_k24_u30_xHUMAN3x.species
[00:00:00] Starting GOTTCHA v1.0c
[00:00:00] Auto set database level to SPECIES.
[00:00:00] Number of threads: 11
[00:00:00] Checking running environment...
[00:00:00] Done. All required scripts and tools found.
[00:00:00] Split-trimming with parameters fixL=30, minQ=20, ascii=33.
[00:00:00] Split-trimming: /media/deltagene/Microbiome/CAMI_data/gastrooral_dir/sunbeam_output/qc/00_samples/sample2_anonymous_reads.fq...
[00:03:26] Done splitrimming /media/deltagene/Microbiome/CAMI_data/gastrooral_dir/sunbeam_output/qc/00_samples/sample2_anonymous_reads.fq.
[00:03:26] Done merging splitrim stats.
Mean Read Length: 150 0 (0.00 %)
[00:03:26] Mapping split-trimmed reads to GOTTCHA database and profiling...
# of Processed Reads: 33,332,582 0
# of Mapped Reads: 0 0 (genome)
# of Mapped Reads: 0 0 (plasmid only)
# of Unmapped Reads: 33,332,582 0
[00:05:22] Done profiling mapping results.
0 taxanomy(ies) found.
[00:05:22] No read mapped to species-level signatures. Please try again with upper-level databases.
Running the same command on other samples (not from CAMI) always produces at least some split-trimmed reads. What could be the problem here?
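To reason about how a file could come out of split-trimming with 0 reads, here is a rough Python sketch of what the logged parameters `fixL=30, minQ=20, ascii=33` plausibly mean. This is an assumption, not splitrim's actual code: it treats `N` bases and bases whose Phred quality (quality character minus the `ascii` offset) is below `minQ` as cut points, and keeps only non-overlapping fragments of exactly `fixL` bases from each remaining run. `split_trim` is a hypothetical name.

```python
from typing import List


def split_trim(seq: str, qual: str, fix_l: int = 30, min_q: int = 20,
               ascii_offset: int = 33) -> List[str]:
    """Hypothetical split-trim: treat N bases and bases below min_q as cut
    points, then chop every remaining run into non-overlapping fragments of
    exactly fix_l bases; leftovers shorter than fix_l are dropped."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    fragments: List[str] = []
    run_start = 0
    # Loop one position past the end so the final run is flushed as well.
    for i in range(len(seq) + 1):
        at_end = i == len(seq)
        if at_end or seq[i] in "Nn" or ord(qual[i]) - ascii_offset < min_q:
            run = seq[run_start:i]
            for j in range(0, len(run) - fix_l + 1, fix_l):
                fragments.append(run[j:j + fix_l])
            run_start = i + 1
    return fragments
```

With an offset of 33, the character `5` corresponds to Q20, so under this reading any base whose quality character sorts below `5` (for example `$`, which is Q3) splits the read, and a read yields nothing if none of its clean runs reaches 30 bases.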
Thanks, Bertalan Takács