Closed: matnguyen closed this issue 5 years ago
You are right. I edited the line to check for 'Mycobacterium tuberculosis complex'.
Since Kraken can sometimes be too general in its classification (Mycobacterium instead of Mycobacterium tuberculosis), would it work better to change that line to accept "Mycobacterium", so that viable samples are not discarded by UVP?
I still want to discriminate against Mycobacterium that is not in the Mycobacterium tuberculosis complex.
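To illustrate the concern, here is a minimal sketch of how a substring check behaves with the two patterns discussed. The taxon names are illustrative examples rather than real Kraken output, and the `matches` helper is hypothetical; it is not UVP's actual code, which is not shown in this thread.

```python
# Illustrative taxon names as they might appear in a Kraken report.
# These are examples for the sketch, not output from a real run.
names = [
    "Mycobacterium",
    "Mycobacterium tuberculosis complex",
    "Mycobacterium tuberculosis",
    "Mycobacterium avium",  # a Mycobacterium species outside the MTBC
]


def matches(name, pattern):
    """Return True if pattern occurs anywhere in name (hypothetical helper)."""
    return name.find(pattern) != -1


# Accepting the bare genus name lets every Mycobacterium line through,
# including species that are not part of the MTBC.
genus_hits = [n for n in names if matches(n, "Mycobacterium")]

# The narrower pattern keeps only the M. tuberculosis (complex) lines.
mtb_hits = [n for n in names if matches(n, "Mycobacterium tuberculosis")]

print(genus_hits)
print(mtb_hits)
```

With the genus-level pattern, a sample dominated by a non-MTBC species such as Mycobacterium avium would pass the check, which is the case the narrower pattern is meant to exclude.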
Actually, wouldn't the find("Mycobacterium tuberculosis") match the "Mycobacterium tuberculosis complex" in addition to all subsequent "Mycobacterium tuberculosis" containing lines? The issue (I think) may lie in the Kraken database used. I've compared Kraken results from the Galaxy server (using the bacteria database) and from a local machine (using the standard database) that show the same kind of result that matnguyen got. The Galaxy results showed MTBC cov values at > 90, while the local versions topped out at roughly 25. Should I be looking at using the Kraken bacteria database instead?
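A rough sketch of the point about `find`: scanning a report top to bottom, the first line containing "Mycobacterium tuberculosis" would be the "Mycobacterium tuberculosis complex" line. The sketch assumes the usual six-column, tab-separated Kraken report layout (percentage, clade reads, taxon reads, rank code, taxonomy ID, indented name). The function name `first_match_percentage` and all report values are made up for illustration and are not taken from UVP or from a real run.

```python
def first_match_percentage(report_lines, pattern="Mycobacterium tuberculosis"):
    """Return (name, percentage) for the first report line whose taxon name
    contains pattern, or None if no line matches.

    Assumes six tab-separated columns with the percentage first and the
    (indented) taxon name last.
    """
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        name = fields[5].strip()
        if name.find(pattern) != -1:
            return name, float(fields[0])
    return None


# Hypothetical report excerpt; the numbers are invented for the example.
report = [
    " 95.10\t9510\t12\tG\t1763\t          Mycobacterium",
    " 93.40\t9340\t8800\t-\t77643\t            Mycobacterium tuberculosis complex",
    "  5.40\t540\t540\tS\t1773\t              Mycobacterium tuberculosis",
]

print(first_match_percentage(report))
```

In this sketch the genus line is skipped and the complex line is the first match, so a check on "Mycobacterium tuberculosis" already covers "Mycobacterium tuberculosis complex". If that holds for the real check too, the low values would come from the report itself, which fits the database explanation.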
Yes, it has to be run with the Kraken bacteria database. I will update the documentation in the next major revision of the software.
On Wed, Nov 21, 2018 at 3:04 PM pvishwa2 notifications@github.com wrote:
Actually, wouldn't the find("Mycobacterium tuberculosis") match the "Mycobacterium tuberculosis complex" in addition to all subsequent "Mycobacterium tuberculosis" containing lines? The issue (I think) may lie in the Kraken database used. I've compared Kraken results from the Galaxy server (using the bacteria database) and from a local machine (using the standard database) that show the same kind of result that matnguyen got. Maybe you could add it somewhere in the dependencies that UVP needs to be run using the "bacteria" kraken database specifically?
When checking species specificity, samples can be discarded because Kraken may classify reads as Mycobacterium, rather than Mycobacterium tuberculosis. However, the code (below) only checks for Mycobacterium tuberculosis.
The final_report.txt for Kraken contains:
For reference, here is an accession that I have tested: