Closed LeandroD94 closed 5 months ago
Where are you getting these sequences, and what command lines are you using to add them and build the database? I suspect it has to do with the fact that these are RNA sequences and not DNA.
Hi Jennifer!
I got the sequences from the SILVA database (so they are rRNA) and formatted the headers (with the NCBI ID) as required by kraken2... And you are right: it looks like just changing each U to T before adding the sequences is enough to get these missing sequences included in the database!
Thank you, Leandro
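The U-to-T substitution described in the comment above could look roughly like the following minimal sketch. It assumes a plain FASTA input in which header lines start with `>`; the function name `rna_to_dna_fasta`, the sample sequences, and the taxid `12345` are made up for illustration and are not taken from this issue.

```python
def rna_to_dna_fasta(lines):
    """Yield FASTA lines with U/u replaced by T/t in sequence lines only."""
    table = str.maketrans("Uu", "Tt")
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header lines are passed through unchanged so that any
            # kraken:taxid tag (hypothetical value here) stays intact.
            yield line
        else:
            yield line.translate(table)


# Illustrative input only; not a real SILVA record.
example = [">seqU1|kraken:taxid|12345 example rRNA", "ACGUUGCA", "uuacg"]
converted = list(rna_to_dna_fasta(example))
```

Only sequence lines are modified, so a `U` that happens to appear in a header is left alone.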
Oh, I would be careful about translating this. You need to convert RNA to DNA, so not just U --> T but also C --> G, etc.
If I convert C to G (i.e., take the complementary strand), shouldn't I then convert U to A? In that case, do you suggest using the complements of the rRNA sequences downloaded from SILVA?
Leandro
I would use the complement, yes. So yes: U to A, C to G, G to C, and A to T.
Johns Hopkins Hospital, Department of Pathology, Microbiology. Staff Scientist/Bioinformatician, Simner Lab.
From: Leandro Di Gloria. Sent: Tuesday, June 25, 2024 1:58:10 PM. To: DerrickWood/kraken2. Cc: Jen Lu. Subject: Re: [DerrickWood/kraken2] Kraken2 does not add to the custom library every downloaded sequence (Issue #844)
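The base mapping suggested above (U to A, C to G, G to C, A to T) can be sketched as a simple translation table. This is only an illustration of that mapping: the names `COMPLEMENT` and `complement_rna_as_dna` are made up here, and the function performs a per-base complement without reversing the sequence, so it is not a reverse complement.

```python
# Per-base mapping from an RNA sequence to the complementary DNA bases,
# exactly as listed in the comment: U->A, C->G, G->C, A->T (case preserved).
COMPLEMENT = str.maketrans("UCGAucga", "AGCTagct")


def complement_rna_as_dna(seq):
    """Return the per-base DNA complement of an RNA sequence (not reversed)."""
    return seq.translate(COMPLEMENT)


# Illustrative input only.
complemented = complement_rna_as_dna("ACGU")
```

Characters outside the table, such as `N`, pass through unchanged.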
Then I will convert them, thank you for your time!
Leandro
Hi!
I'm trying to build a custom database that also includes 18S and 28S sequences (downloaded from SILVA) to better classify protozoa, which are under-represented in RefSeq. I was able to build and use the custom database without errors. However, on closer inspection it looks like ONLY CERTAIN sequences are NOT being added, such as the following one (written below).
What is causing this "silent inconvenience"?
Thank you, Leandro