DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Kraken2 does not add to the custom library *every* downloaded sequence #844

Closed LeandroD94 closed 6 days ago

LeandroD94 commented 6 days ago

Hi!

I'm trying to build a custom database that also includes the 18S and 28S sequences (downloaded from SILVA) to better classify the protozoa, which are under-represented in RefSeq. I was able to build and use the custom database without errors. However, on closer inspection, it looks like certain sequences, such as the one below, are NOT being added to the library.

What is causing this "silent inconvenience"?

Thank you, Leandro

seq|kraken:taxid|77619 Vorticella_microstoma ACAUGGAUAACCGUGACAAAUUACAGCUAAUACAUGCAGUCAGACCUGGUCCAAGGGUCGUAAUUAUUAGUAUUAAACCAUUUCCGAAAGGAGUGUGAUGAAUCAUAAUAAUCGAACGAAUCGCCUGGUGUGCGAUAAAUCAUUCAAGUUUCUGCCCUAUCAGCUUUGGAUGGUAGUGUAUUGGACUACCAUGGCAGUCACGGGUAACGGAGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCUACGGAAGGCAGCAGGAGCGAAAAUUGCCCAAUCCCGACACGGGGAGGCAGUGACGAGAAAUAACAACUCUUGGUUAUUCUAAGAAGUGUAAUGAGGAUAAUUUAAAACCCUUACCGAAAGCAAUUGGAGGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGAAAUUCUGGCUGAUCGAUCCUCGAGCUCUGUAGCCGAGGACUCGUCAGUCAUCCGCUUGCAAAUAUAUGUUCGCCUUUAACCGGGUGGCUAUAUGAGUAAGCAAUUUACCUUGAGAAAAACAGAGUGUUCCAGGCAGGUUUGUCCGGAAUGCAUUAGCAUGGAAUAAUAGAAUAUGACUGAAGUCGAUUUAUUGGUUUGAGGCUUUAGUAAUGAUUAAUAGGAACAGUCGGGGGCAUUGGUACUUGUCAGUCAGAGGUGAAAUUCUAGGAUUUGACAAAGACUAACAAAUGCGAAAGCAUUUGCCAAGGAUGUUUUCAUUAAUCAAGAACGAAAGUUAGGGGAUCAAAGACGAUCAGAUACCGUCCUAGUCUUAACUAUAAACUAUACCGACUCGGAUUCAGAUGAAUCAUAAAGUUCAUUUGGGACCGUAGGAGAAAUCAAAGUUUUUGGGUUCUGGGGGAAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGUUUUGCACCACCAUGGAGUGGAGUCUGCGGCUUAAUUUGACUCAACACUGGGAAACUCAUCAGGGCAAGAAGAUUGUAGGAUUGACAGAUUGAGAGUUCUUUCUUGAUUGGUCUAGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGGUUAAUUCCGUUAACGAACGAGACCUUAACCUGCUAACUAGUACACCGAUGACAAAUCGGCGUUACUUCUUAGAGGGACUAUGUGAUGUAAUCACAUGGAAGUUUGAGGCAAUAACAGGUCUGUGAUGCCCUUAGAUGUCCUGAGCUGCACGCGUACUACAAUGGUGCUUUCAACGAGCUUUUCCUGAUCCGAAAGGAUUUGGGUAAUCUUUUUAGUGAGCACCGUGCUUGGGAUUGAUCUUUGUAAUUAUGGAUCAUGAACUAGGAAUUCCUAGUAAGCACGGGUCAUCAGCCCGUGCUGAUUACGUCCCUGCAAAAUGUACACACCGCCCGUCGCUAUUACCGAUUGAGUGUAAAGGUGAACCUUCUCGAUAGUGUCACCGCUAGAAAUUAAGUAAACCUUGCACU

jenniferlu717 commented 6 days ago

Where are you getting these sequences, and what command lines are you using to add them and build the database? I suspect it has to do with the fact that these are RNA sequences, not DNA.
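
One way to check this suspicion before building is to scan the FASTA file for records that contain characters outside the DNA alphabet. The following is a minimal sketch, not part of Kraken 2: the helper names (`read_fasta`, `records_with_non_dna`) and the two example records are hypothetical, and it assumes the headers start with `>` in the actual file.

```python
import io

# Characters treated as plain DNA here; anything else (U, IUPAC codes, gaps) is flagged.
DNA_ALPHABET = set("ACGTN")

def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def records_with_non_dna(handle):
    """Return (header, unexpected characters) for records with non-ACGTN symbols."""
    flagged = []
    for header, seq in read_fasta(handle):
        unexpected = set(seq.upper()) - DNA_ALPHABET
        if unexpected:
            flagged.append((header, "".join(sorted(unexpected))))
    return flagged

# Hypothetical two-record example: one RNA record, one DNA record.
example = io.StringIO(
    ">seq_rna|kraken:taxid|77619 Vorticella_microstoma\n"
    "ACAUGGAUAACCGUGACAAAUUACAGCUAAUAC\n"
    ">seq_dna|kraken:taxid|77619 Vorticella_microstoma\n"
    "ACATGGATAACCGTGACAAATTACAGCTAATAC\n"
)
print(records_with_non_dna(example))
```

On a real library file, the same function can be called with `open("my_sequences.fasta")` (a placeholder path) in place of the `io.StringIO` example.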

LeandroD94 commented 6 days ago

Hi jennifer!

I got the sequences from the SILVA database (so they are rRNA) and formatted the headers (with the NCBI ID) as required by Kraken2. And you are right: it looks like simply changing U to T before adding the sequences is enough to get these missing sequences into the database!

Thank you, Leandro
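
The U-to-T substitution described above can be sketched as follows. This is an illustrative helper, not a Kraken 2 feature; the function names and the example record are hypothetical. Header lines are left untouched so that a `U` in a taxon name or in the `kraken:taxid` header is not altered.

```python
def rna_to_dna(seq):
    """Replace U with T (both cases), giving the DNA form of the same strand."""
    return seq.translate(str.maketrans("Uu", "Tt"))

def convert_fasta_text(text):
    """Apply rna_to_dna to sequence lines only; header lines (">") are kept as-is."""
    out = []
    for line in text.splitlines():
        out.append(line if line.startswith(">") else rna_to_dna(line))
    return "\n".join(out) + "\n"

# Hypothetical record; the header contains a "U" that must survive unchanged.
fasta = ">seq|kraken:taxid|77619 Uncultured_example\nACAUGGAUAACCGUGAC\n"
print(convert_fasta_text(fasta), end="")
```

For large SILVA exports, the same per-line logic can be applied while streaming the input file instead of loading it as a single string.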

jenniferlu717 commented 6 days ago

Oh, I would be careful about translating this. You need to convert RNA to DNA, so not just U --> T but also C --> G, etc.

LeandroD94 commented 6 days ago

If I convert C to G (i.e., take the complementary strand), shouldn't I also convert U to A? In that case, do you suggest using the complement of the rRNA sequences downloaded from SILVA?

Leandro

jenniferlu717 commented 6 days ago

I would use the complement, yes. So yes: U to A, C to G, G to C, and A to T.
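
The base mapping listed above (U to A, C to G, G to C, A to T) can be written as a translation table. The sketch below uses hypothetical function names and a short example sequence. Note that this mapping is applied position by position without reversing the sequence; writing the opposite strand in the conventional 5' to 3' direction also requires reversing it (the reverse complement), which is shown alongside for comparison. A plain U-to-T substitution, by contrast, gives the DNA form of the same strand.

```python
# RNA base -> complementary DNA base, as listed in the comment above.
RNA_TO_COMPLEMENT_DNA = str.maketrans("ACGUacgu", "TGCAtgca")

def complement_rna_as_dna(seq):
    """Complement each RNA base into DNA, keeping the original order."""
    return seq.translate(RNA_TO_COMPLEMENT_DNA)

def reverse_complement_rna_as_dna(seq):
    """Complement and then reverse, i.e. the opposite strand read 5' to 3'."""
    return complement_rna_as_dna(seq)[::-1]

rna = "ACAUGGAUAACC"  # first bases of the example sequence in this issue
print(complement_rna_as_dna(rna))          # TGTACCTATTGG
print(reverse_complement_rna_as_dna(rna))  # GGTTATCCATGT
```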

LeandroD94 commented 6 days ago

Then I will convert them. Thank you for your time!

Leandro