Closed by rishibhandari63 2 years ago
Hello,
Could you share the first few lines of the kaiju file? I'm thinking there might be issues with how the fasta headers in that file are represented.
Thanks, Seth
My kaiju file looks like this:
C A00201R:540:HTWLMDSX2:4:1101:25391:1501:N:0:CTAGATTGCG+CGCCATATCT#0 98 UniRef100_V5FSW9,
C A00201R:540:HTWLMDSX2:4:1101:5954:1595:N:0:CTAGATTGCG+CGCCATATCT#0 155 UniRef100_T1MW98,
C A00201R:540:HTWLMDSX2:4:1101:8088:1376:N:0:CTAGATTGCG+CGCCATATCT#0 66 UniRef100_L8H989,
C A00201R:540:HTWLMDSX2:4:1101:23086:1297:N:0:CTAGATTGCG+CGCCATATCT#0 130 UniRef100_A0A444BV45,
C A00201R:540:HTWLMDSX2:4:1101:11315:1329:N:0:CTAGATTGCG+CGCCATATCT#0 66 UniRef100_A0A2N1L258,UniRef100_A0A6V8QKA9,UniRef100_A0A395NZ34,
C A00201R:540:HTWLMDSX2:4:1101:13711:2440:N:0:CTAGATTGCG+CGCCATATCT#0 95 UniRef100_A0A182WSH2,UniRef100_A0A182QM46,UniRef100_A0A182VZB5,UniRef100_A0A182NS49,UniRef100_A0A182FBP3,
C A00201R:540:HTWLMDSX2:4:1101:29116:3443:N:0:CTAGATTGCG+CGCCATATCT#0 72 UniRef100_A0A409XNK3,UniRef100_A0A409VTK7,
C A00201R:540:HTWLMDSX2:4:1101:8006:3458:N:0:CTAGATTGCG+CGCCATATCT#0 107 UniRef100_F0XQ24,
C A00201R:540:HTWLMDSX2:4:1101:4562:3724:N:0:CTAGATTGCG+CGCCATATCT#0 91 UniRef100_A0A6U4CSK6,UniRef100_A0A6U4PJC4,
C A00201R:540:HTWLMDSX2:4:1101:10818:3881:N:0:CTAGATTGCG+CGCCATATCT#0 92 UniRef100_A0A6D2I8D3,
C A00201R:540:HTWLMDSX2:4:1101:19397:3865:N:0:CTAGATTGCG+CGCCATATCT#0 119 UniRef100_A0A1A9X6J1,UniRef100_A0A1B0C157,
C A00201R:540:HTWLMDSX2:4:1101:11017:3787:N:0:CTAGATTGCG+CGCCATATCG#0 77 UniRef100_A0A7J6YRQ6,
C A00201R:540:HTWLMDSX2:4:1101:7184:4351:N:0:CTAGATTGCG+CGCCATATCT#0 99 UniRef100_A0A1W0A4D9,
C A00201R:540:HTWLMDSX2:4:1101:10954:4460:N:0:CTAGATTGCG+CGCCATATCT#0 75 UniRef100_A0A425CKJ8,UniRef100_A0A3M6VAV2,
C A00201R:540:HTWLMDSX2:4:1101:17219:3035:N:0:CTAGATTGCG+CGCCATATCT#0 75 UniRef100_A0A0R3RTC3,
C A00201R:540:HTWLMDSX2:4:1101:9706:6214:N:0:CTAGATTGCG+CGCCATATCT#0 67 UniRef100_A0A2N1JHI0,
C A00201R:540:HTWLMDSX2:4:1101:28275:6245:N:0:CTAGATTGCG+CGCCATATCT#0 73 UniRef100_A0A420I9F8,
C A00201R:540:HTWLMDSX2:4:1101:5674:6308:N:0:CTAGATTGCG+CGCCATATCT#0 80 UniRef100_A0A553NNA0,
C A00201R:540:HTWLMDSX2:4:1101:16396:7153:N:0:CTAGATTGCG+CGCCATATCT#0 77 UniRef100_A0A7E4V1Z9,
C A00201R:540:HTWLMDSX2:4:1101:15790:5822:N:0:CTAGATTGCG+CGCCATATCT#0 70 UniRef100_A0A4Y9XY08,
C A00201R:540:HTWLMDSX2:4:1101:6379:4711:N:0:CTAGATTGCG+CGCCATATCT#0 67 UniRef100_A0A7H9B3Z8,
C A00201R:540:HTWLMDSX2:4:1101:18421:7247:N:0:CTAGATTGCG+CGCCATATCT#0 70 UniRef100_D2UZR0,
C A00201R:540:HTWLMDSX2:4:1101:25491:7185:N:0:CTAGATTGCG+CGCCATATCT#0 76 UniRef100_A0A7J7MRD4,
C A00201R:540:HTWLMDSX2:4:1101:3766:7952:N:0:CTAGATTGCG+CGCCATATCT#0 130 UniRef100_A0A2P6MPD3,
C A00201R:540:HTWLMDSX2:4:1101:6677:7513:N:0:CTAGATTGCG+CGCCATATCT#0 66 UniRef100_UPI00046B86F1,UniRef100_UPI00101A6817,UniRef100_A0A7J7RN07,UniRef100_UPI00174EE613,UniRef100_UPI001879286F,UniRef100_UPI00187BE54D,UniRef100_A0A6J2LGH6,UniRef100_A0A091DAR8,UniRef100_G3QQ06,UniRef100_H2QIP3,UniRef100_A0A2R8ZMY9,UniRef100_A0A2I2Z2S3,UniRef100_A0A2I3TTX4,UniRef100_A0A2R8ZL60,UniRef100_I3L6Y7,UniRef100_UPI00174F6B19,UniRef100_A0A4X1VM36,UniRef100_A0A1S3G8E0,UniRef100_A0A250YM71,UniRef100_UPI00098197E9,UniRef100_A0A480ZGB8,
C A00201R:540:HTWLMDSX2:4:1101:2799:7623:N:0:CTAGATTGCG+CGCCATATCT#0 75 UniRef100_A0A1L9S8C0,UniRef100_A0A5N6GE77,UniRef100_A0A5N6ILS2,UniRef100_A0A2J5HY18,UniRef100_A0A2I2FFA2,UniRef100_A0A2I1DCZ6,UniRef100_A0A0L1J1B3,UniRef100_A0A5N7DCG3,UniRef100_A0A5N6W4C4,UniRef100_A0A5N6YEP6,UniRef100_A0A5N6EHA5,UniRef100_A0A5N6WKJ1,UniRef100_A0A5N6DQK4,UniRef100_A0A5N6T1F6,UniRef100_A0A5N6ZMC6,UniRef100_A0A5N7E2C6,UniRef100_A0A5N7AT84,UniRef100_A0A5N6V689,
C A00201R:540:HTWLMDSX2:4:1101:19397:7842:N:0:CTAGATTGCG+CGCCATATCT#0 73 UniRef100_A0A3M7L515,UniRef100_A0A087SHP5,UniRef100_A0A1D1ZN97,
C A00201R:540:HTWLMDSX2:4:1101:24912:8656:N:0:CTAGATTGCG+CGCCATATCT#0 176 UniRef100_UPI0005CE24EE,
Thanks! I see that the fastq file headers end with #0/1, but in the kaiju file they end with #0. Kaiju appears to be stripping the trailing /1 when it parses the read names, which is not what we want.
Do you have any suggestions to solve this issue?
Yes, I've introduced a stopgap measure to deal with it. Clone the repository again and let me know if you run into any errors. In the meantime I've reached out to the Kaiju folks and am waiting for their reply. Because Kaiju cuts off the tail end of the fastq header, its output no longer records whether the forward or the reverse read mapped.
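To make the mismatch concrete, here is a minimal Python sketch of how read names could be compared once the mate suffix is split off. It is not the stopgap code from the repository; the function name `split_mate_suffix` is hypothetical, and it assumes the only difference between the fastq header and the kaiju read name is a trailing /1 or /2.

```python
import re

# Matches a trailing "/1" or "/2" mate marker at the end of a read name.
_MATE_SUFFIX = re.compile(r"/([12])$")


def split_mate_suffix(read_id):
    """Return (base_id, mate), where mate is '1', '2', or None.

    Hypothetical helper: assumes the read name is the first
    whitespace-separated token of a non-empty header and that
    a leading '@' (fastq) should be ignored.
    """
    read_id = read_id.lstrip("@").split()[0]
    match = _MATE_SUFFIX.search(read_id)
    if match is None:
        return read_id, None
    return read_id[:match.start()], match.group(1)


if __name__ == "__main__":
    header = "@A00201R:540:HTWLMDSX2:4:1101:8540:1501:N:0:CTAGATTGCG+CGCCATATCT#0/1"
    print(split_mate_suffix(header))
```

Keeping the mate number as a separate value, rather than discarding it, preserves the forward/reverse information that is lost in the kaiju output.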
Hi, I have a similar issue ("No reads mapped to the marker genes with Kaiju. Analysis ended!"), and the stopgap measure does not seem to solve it. Here are the first lines of my fastq files and my kaiju output:
head Sample1_1.fastq
@V350086151L2C001R0010000001/1
CACACACCAAAAGAGCTGTGAGTCCGTGAGTCCCCAATCGCGAAGCACAATCGTTTTGCCGTTCGCGACATCAACAATCGCCCTCGGTTGCTTCCGATACACGTACCTGATACTTAAAGGAGGTCATTGTAATAGTTAAGTGCGGTAAAA
+
eedeeebbfffedEcdTcPbF_]W^e`de_Ge[d^dddecScd[b`ec`SddcfLeJeeefddfFfeeffeeedce`eeXe_cee`aWf\ZRd[cYEbeeee^dMRefbYdD^LQfeeJbcWde\cedeebDe`UdF[eeceTd]deeec
@V350086151L2C001R0010000002/1
GGTCAGCTTGAGTTCGACCTTGCCGCCCAGCGCCTCGAGGTTCTTCCCCTGCTCCTCGCGCGCGGCGCGCAGCGCGGCCGCCTGCTCCTCGCGGGCGATGCGCAGCGCCTCATCAATCCGCTGGTGCAGCGAGGCCATCTGCTCGGCGAG
+
ecfdefefeeefefee[edeeeefeeeffffeedbeebfecfeedeefeefeecefdddfeeeeedefeeeeceffeeeedfeedebffe__aedefddfedfcefeeefeeeedfeefeeffedfeeeeeIffefdfeeefeff`feVe
@V350086151L2C001R0010000077/1
GATCGCCCACAACGTCGTCATTGGCGACCATTGCCTCGTCGTCGCCCAGGTGGGCATCGCAGGCAGCACCCGGCTGGGGAATTACGTGGCCCTGGGTGGACAGGTCGGCCTGGCCGGCCACCTGAAGATCGGCAACCAGGTCACCGTCGC
head Sample1_2.fastq
@V350086151L2C001R0010000001/2
TCCACCTCACTCCCACCGAAGCACCGAAGAACGTGAACCCGCCTGCATCCATCAAAACTTGTGTCGTTGCCGTCATTTTTTAAAGCTTGTTGCCCTTGCTTTTACAGCACTTAACTAATACAATGACCTCCTTTAAGGATCAGGTACCAG
+
eeeeVIeeeEEPZWHbee_eOce[LNKecWeGcZeCc\DeZVedZcdee_VededddLPfYdedde`edefdedeWeVd`dMeebeeeSdHYIecaee^eedeeaLGeccRbddeeebFede]decCefZe_edPcHFbdW\bVIeLFQd
@V350086151L2C001R0010000002/2
CCCGCGTCGAGGCCGGGATCGCCGAGGGGCGGGAGGCGACGCTGCGCAAGCAGAGCGAGGCGCTCGCCGAGCTGATGGCCTCGCTGTACCAGCGGTTTGATGAGGCGCTGCGCATCGCCCGCGTGGCGCAGGCGGACGCGCAGCGCGCCG
+
eedeec^dcMfZddXSee_efeecUebadYbdee`b^e_eeb_efe\Lbf[ScFdef_eZbVOe_\d[cRfdJeeef_]QeddEdeQeeeefOe\Lec[aed]ee[]R`cC`cS^LaedPSRcEcUHec\e`ec`TeV_e]KceeMEfdH
@V350086151L2C001R0010000077/2
CTGCTGGAGGGCCAGGATCTGACGTTTCATTTTCCGGTCGGGCCGCGCGCGTGAGCCCAGCCACTTTTCGCCATCGGGGATGTCACGCATGTCCCCCGTCTGGGCGGCGGCGGTGACCTGGATGCCGATCTTCAGGTGGCCGGCGAGGCC
head kaiju
C V350086151L2C001R0010080933 87 UniRef100_A0A4S4EAY3,UniRef100_UPI000CE19BB6,
C V350086151L2C001R0010342833 118 UniRef100_A0A421FSM4,UniRef100_A0A3R7JQ58,UniRef100_A0A3F2RCQ2,
C V350086151L2C001R0010427682 144 UniRef100_A0A6T9N8P3,
C V350086151L2C001R0010483629 249 UniRef100_A0A6T7JN01,
C V350086151L2C001R0010521392 116 UniRef100_B7FSJ6,UniRef100_A0A1E7FB78,
C V350086151L2C001R0010686242 126 UniRef100_A0A1V2LT19,UniRef100_A0A099P2Z5,UniRef100_A0A507ELQ1,
C V350086151L2C001R0010929114 155 UniRef100_A0A075B065,
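One way to check whether the read names in a kaiju file like the one above can be found among the fastq IDs is sketched below. The column layout is an assumption read off the excerpts in this thread (a status letter, the read name, a number, then a comma-separated list of UniRef IDs with a trailing comma); `parse_kaiju_line` and `matching_reads` are hypothetical names, not part of Kaiju or of the pipeline.

```python
def parse_kaiju_line(line):
    """Split one kaiju output line into (status, read_name, value, ids).

    Assumed layout, based only on the excerpts in this thread:
    status, read name, a numeric column, comma-separated IDs.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("unexpected kaiju line: %r" % line)
    status, read_name, value = fields[0], fields[1], fields[2]
    # Re-join in case stray whitespace split the ID list, then drop
    # the empty entry produced by the trailing comma.
    ids = [x for x in "".join(fields[3:]).split(",") if x]
    return status, read_name, value, ids


def matching_reads(kaiju_lines, fastq_ids):
    """Return the kaiju read names that occur verbatim in fastq_ids."""
    names = {parse_kaiju_line(l)[1] for l in kaiju_lines if l.strip()}
    return names & set(fastq_ids)


if __name__ == "__main__":
    lines = ["C V350086151L2C001R0010080933 87 UniRef100_A0A4S4EAY3,UniRef100_UPI000CE19BB6,"]
    # With the /1 suffix kept on the fastq side, nothing matches.
    print(matching_reads(lines, ["V350086151L2C001R0010080933/1"]))
    # With the suffix removed, the read is found.
    print(matching_reads(lines, ["V350086151L2C001R0010080933"]))
```

If an exact-match lookup of this kind returns nothing, that would be consistent with the "No reads mapped" message even though the kaiju file itself is not empty.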
I get an error when running it on my shotgun metagenome reads. All my input reads and the database are in my home folder, and I have set an output folder in my scratch directory.
CPU threads: 8
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: /scratch/class01/taxaTarget/10EE_S60.results
Target sequences to report alignments for: 25
Opening the database... [5.366s]
Database: /home/class01/taxaTarget/data//marker_geneDB.fasta.dmnd (type: Diamond database, sequences: 877724, letters: 500196405)
Block size = 2000000000
Opening the input file... [0.297s]
Error: Error detecting input file format. First line seems to be blank.
Beginning analysis.
Mapping reads with Kaiju.
Extracting reads mapped by Kaiju.
Aligning reads with Diamond.
Traceback (most recent call last):
  File "run_protist_pipeline_fda.py", line 101, in
    if os.path.getsize(out+'/kaiju.fasta.diamond') == 0: sys.exit("No reads mapped to the marker genes with Diamond. Analysis ended!")
  File "/opt/asn/apps/anaconda_3-2020.11/lib/python3.8/genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/class01/taxaTarget/10EE_S60.results/kaiju.fasta.diamond'
Here is the beginning of my input file:
@A00201R:540:HTWLMDSX2:4:1101:8540:1501:N:0:CTAGATTGCG+CGCCATATCT#0/1
GAGTTCTGCCATGACCACCGCCAGCGGCCTGCCGCCGTCGATCACCGTTTCCTCCCTCGACCTCGCGCGCCTGGAGGCGTTGCTGGATACCCCCGCC
+
FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF
@A00201R:540:HTWLMDSX2:4:1101:9986:1501:N:0:CTAGATTGCG+CGCCATATCT#0/1
CGGGCGCCACTTCGCGCGCGTCCAGCATCACGTCCTTGCCGGTCCTGGCGTCGTGCAGCAGGAAGAAGCCGCCACCGCCCAGG
+
FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF:FF
@A00201R:540:HTWLMDSX2:4:1101:10095:1501:N:0:CTAGATTGCG+CGCCATATCT#0/1
GCATCATGCTGCCCAACCATTCGCCGCTCGTCATTGCCGAACAGTTCGGCACGCTGGCAGCCCTTTTGCCAGGCCGTGTTGACCTTGGCCTTGG
In my results folder I have three output files: Kaiju, an empty kaiju.fasta, and read_file_info.txt.
Thank you.
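The traceback above comes from calling os.path.getsize on a file that Diamond never wrote, most likely because its input (kaiju.fasta) was empty. A guard along the following lines would turn the crash into a readable message. This is only a sketch: `output_is_usable` is a hypothetical helper and not part of run_protist_pipeline_fda.py.

```python
import os
import sys


def output_is_usable(path):
    """Return True only if path is an existing, non-empty regular file.

    Hypothetical helper: os.path.isfile is checked first so that
    os.path.getsize is never called on a missing file.
    """
    return os.path.isfile(path) and os.path.getsize(path) > 0


if __name__ == "__main__":
    # Example path taken from the error message in this thread.
    diamond_out = "/scratch/class01/taxaTarget/10EE_S60.results/kaiju.fasta.diamond"
    if not output_is_usable(diamond_out):
        sys.exit("No reads mapped to the marker genes with Diamond. Analysis ended!")
```

The same check applied to kaiju.fasta before the Diamond step would stop the run at the point where the problem first appears.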