PICRUSt with de novo variants related databases

AbhishakeL commented 6 years ago

Thank you for the "PICRUSt Tutorial with de novo Variants" tutorial. Can you please explain a little about how the files in the "img_gg_starting_files" folder were built?

gavinmdouglas commented 6 years ago

I don't have information on how all of these files were built (all of them except the constraint alignment were used in the original PICRUSt paper), but I can explain what they contain:

gg_13_5_img_16S_counts.txt - counts of 16S copies per reference genome (only the reference genomes that overlap with Greengenes)
gg_13_5_img_fixed.txt - mapping file of Greengenes IDs to IMG genome IDs.
gg_13_5_img_subset.fasta - 99% OTUs from Greengenes - only those overlapping with IMG.
img_400_ko.tab - KEGG ortholog abundances in IMG genomes (note that not all of these overlap with Greengenes)
99_otus_IMG_pruned_no_names_constraint.txt - constraint alignment of Greengenes OTUs overlapping with IMG genomes, which can be used with FastTree to keep a certain core topology. This was made by following the instructions here: http://meta.microbesonline.org/fasttree/constrained.html
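
As a rough sketch of how the constraint alignment could be passed to FastTree (this is not the command that was actually used to build these files), something like the Python wrapper below might work. `-nt` and `-constraints` are FastTree options; the executable name `FastTree`, the input alignment `aligned_seqs.fasta`, and the helper function names are assumptions for illustration only.

```python
import shutil
import subprocess


def fasttree_constrained_cmd(alignment, constraints, executable="FastTree"):
    """Build the argument list for a constrained nucleotide FastTree run.

    The executable name is an assumption; some installs call it "fasttree".
    """
    return [executable, "-nt", "-constraints", constraints, alignment]


def run_fasttree(alignment, constraints, out_tree, executable="FastTree"):
    """Run FastTree and save the Newick tree it prints on stdout to out_tree."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    cmd = fasttree_constrained_cmd(alignment, constraints, executable)
    with open(out_tree, "w") as handle:
        subprocess.run(cmd, stdout=handle, check=True)


if __name__ == "__main__":
    # Only print the command here; call run_fasttree() to actually build a tree.
    print(" ".join(fasttree_constrained_cmd(
        "aligned_seqs.fasta",  # hypothetical aligned 16S sequences
        "99_otus_IMG_pruned_no_names_constraint.txt",
    )))
```

The alignment passed in would need to contain the same sequence IDs as the constraint file for the constraints to have any effect.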

Hopefully that helps!

AbhishakeL commented 6 years ago

Thanks a lot, Gavin. Actually, I have access to the latest KEGG database and so thought of updating the database before running PICRUSt. I got some hints from this thread https://groups.google.com/forum/?hl=en#!starred/picrust-users/0y7RSOMsm1o but I am still trying to figure out the other files. Is there any SOP you know about?

gavinmdouglas commented 6 years ago

That link isn't working for me, unfortunately. I don't know of any SOP to make those files, sorry! Just so you know, I am working on a beta version of PICRUSt2, which has updated genomes and functions from IMG. PICRUSt2 is still being actively developed and is available here: https://github.com/picrust/picrust2

LangilleLab / microbiome_helper