AstrobioMike / bit

Bioinformatics Tools
GNU General Public License v3.0

Example bracken combining outputs and adding lineage info #10

Closed laibinhuang closed 2 years ago

laibinhuang commented 2 years ago

I have some questions about your bracken results (Thank you very much)

(1) Bracken gives only one taxonomy level per run. How can you get all levels in one bracken file,

like example-bracken-output-1.tsv?

(2) If I have more than 10 samples, each run at several levels (P = phylum, C = class, O = order, ...), all in one folder, how can I make bracken-sample-name-map.tsv?

The files look like:

P21_b-output_P.tsv P21_b-output_C.tsv P21_b-output_O.tsv P21_b-output_G.tsv ...
P22_b-output_P.tsv P22_b-output_C.tsv P22_b-output_O.tsv P22_b-output_G.tsv ...
P32_b-output_P.tsv P32_b-output_C.tsv P32_b-output_O.tsv P32_b-output_G.tsv ...

AstrobioMike commented 2 years ago

Hi there :)

(1) bracken just gives you one taxonomy level, how can you get all levels in one bracken file

While we do tell bracken to do its best to estimate at a single, specific rank, that rank is still tied to the ranks above it. So it gives us a file that looks like this, with one rank:

name                         taxonomy_id  taxonomy_lvl  kraken_assigned_reads  added_reads  new_est_reads  fraction_total_reads
Listeria monocytogenes       1639         S             384150                 14881        399031         0.10777
Listeria grayi               1641         S             204                    0            204            0.00006
Listeria ivanovii            1638         S             116                    439          555            0.00015
Listeria welshimeri          1643         S             92                     0            92             0.00003
Listeria innocua             1642         S             51                     2            53             0.00001
Listeria seeligeri           1640         S             30                     0            30             0.00001
Listeria sp. PSOL-1          1844999      S             28                     0            28             0.00001
Listeria weihenstephanensis  1006155      S             15                     0            15             0.00000
Brochothrix thermosphacta    2756         S             4                      0            4              0.00000

Even though bracken tried to push all the reads down to the species level (in this case), each species has a taxon ID, and that taxon ID is linked to the full lineage above it (genus up to phylum). Hope that makes sense.
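To make the taxon-ID idea concrete, here is a minimal Python sketch. It assumes bracken output is tab-separated, and it uses a small hand-written `LINEAGES` lookup purely for illustration; in practice the lineages would come from a taxonomy database rather than a dictionary like this. The names `LINEAGES`, `RANKS`, and `add_lineage` are hypothetical and are not part of bit or bracken.

```python
import csv
import io

# Two rows copied from the bracken table above, written tab-separated here
# (assumption: real bracken output files are tab-delimited).
BRACKEN_TEXT = (
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
    "added_reads\tnew_est_reads\tfraction_total_reads\n"
    "Listeria monocytogenes\t1639\tS\t384150\t14881\t399031\t0.10777\n"
    "Listeria grayi\t1641\tS\t204\t0\t204\t0.00006\n"
)

# Hypothetical, hand-written lookup for illustration only. A real workflow
# would pull these from a taxonomy database, and names can differ by release.
LISTERIA = {
    "phylum": "Firmicutes",
    "class": "Bacilli",
    "order": "Bacillales",
    "family": "Listeriaceae",
    "genus": "Listeria",
}
LINEAGES = {"1639": LISTERIA, "1641": LISTERIA}

RANKS = ["phylum", "class", "order", "family", "genus"]


def add_lineage(bracken_text, lineages):
    """Attach the ranks above each taxon to its bracken row, keyed by taxon ID."""
    reader = csv.DictReader(io.StringIO(bracken_text), delimiter="\t")
    rows = []
    for row in reader:
        lineage = lineages.get(row["taxonomy_id"], {})
        for rank in RANKS:
            # Taxon IDs missing from the lookup get "NA" for every rank.
            row[rank] = lineage.get(rank, "NA")
        rows.append(row)
    return rows


for row in add_lineage(BRACKEN_TEXT, LINEAGES):
    print(row["name"], row["taxonomy_id"], row["genus"], row["phylum"], sep="\t")
```

The point is only that one species-level row carries enough information (its taxon ID) to recover every rank above it, so a single species-level bracken run can be summarized at higher ranks afterwards.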

(2) if I have more than 10 samples with different levels (P-phylum; C-class, O-order;...) in a folder, how can I make bracken-sample-name-map.tsv

I don't think I would combine different ranks this way. Because bracken uses certain heuristics to try to assign reads to a specific rank, I would run bracken the same way on all the samples, and then combine only the outputs that targeted the same rank. If you want to look at what comes out when using different ranks in bracken, I would combine them separately: all the phylum ones together, all the class ones together, and so on. I would not combine multiple bracken target ranks for the same sample, nor different target ranks from different samples. Sorry if that's confusing; it's confusing me re-reading it, but I can't figure out how to word it better at the moment, ha.
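As a rough illustration of keeping ranks separate, the sketch below groups file names by the rank letter in their suffix. It assumes the naming pattern from the question (`<sample>_b-output_<rank>.tsv`); the function name `group_by_rank` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming pattern, taken from the question: <sample>_b-output_<rank>.tsv
RANK_SUFFIX = re.compile(r"_b-output_([A-Z])\.tsv$")


def group_by_rank(filenames):
    """Return {rank_letter: [files]} so each rank can be combined on its own."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        match = RANK_SUFFIX.search(name)
        if match:  # files that do not follow the pattern are skipped
            groups[match.group(1)].append(name)
    return dict(groups)


files = [
    "P21_b-output_P.tsv", "P21_b-output_C.tsv",
    "P22_b-output_P.tsv", "P22_b-output_C.tsv",
]
for rank, members in group_by_rank(files).items():
    print(rank, members)
```

Each list could then be combined in its own run, one run per rank, rather than mixing ranks in a single table.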

With regard to making the mapping table, it could be made anywhere, so long as it ends up as a plain-text, tab-separated file (tsv). The easiest way would be at the command line if you're familiar with it (if not, this crash course gives a good foundation: https://astrobiomike.github.io/unix/unix-intro). It could be done, e.g., with some ls, sed, and paste. Unfortunately, I can't really help with the exact code without knowing the layout of the working directory holding the files.
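For anyone who would rather script it than chain ls, sed, and paste, here is a minimal Python sketch of the same idea. It rests on two assumptions that should be checked before use: that the files follow the naming pattern from the question (`<sample>_b-output_<rank>.tsv`), and that the mapping table is two tab-separated columns, file name then sample name (check the program's help output for the layout it actually expects). The function name `sample_map_lines` is hypothetical.

```python
import re


def sample_map_lines(filenames, suffix_pattern=r"_b-output_[A-Z]\.tsv$"):
    """Build 'file<TAB>sample' lines, taking the sample name from the file name."""
    lines = []
    for name in sorted(filenames):
        # Strip the assumed suffix; whatever is left is used as the sample name.
        sample = re.sub(suffix_pattern, "", name)
        lines.append(f"{name}\t{sample}")
    return lines


# Only files that targeted the same rank (phylum here) go into one table.
phylum_files = ["P21_b-output_P.tsv", "P22_b-output_P.tsv", "P32_b-output_P.tsv"]
print("\n".join(sample_map_lines(phylum_files)))
```

Writing the joined lines to a file such as bracken-sample-name-map.tsv gives a plain-text tsv, one table per target rank.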

laibinhuang commented 2 years ago

Thank you very much,

got you

