Metagenomic clustering, functional potential, and enrichment

I have a bunch of 16S rRNA sequencing data from microbial communities across different root zones in rice and across different rice fields. The variation provided by this experiment should allow me to form clusters of microbes that are potentially interacting. Understanding why these microbes cluster together is difficult, though.

Thankfully, the PICRUSt project has already built a database that assigns KEGG categories to microbes based on their taxonomy and on how strongly those traits are phylogenetically linked to each taxon: http://picrust.github.io/picrust/

Basically, I want to make a pipeline to 1) cluster the 16S count data into groups of co-varying taxa, 2) count up the KEGG categories for the taxa in each cluster, and 3) look for enrichment of KEGG categories in each cluster.

I want to do all this in Python.
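
To make the three steps concrete, here is a rough sketch using only the standard library. Everything in it is an assumption for illustration: the OTU names and counts are made-up toy data, the KEGG assignments are invented rather than taken from PICRUSt, clustering is done by linking taxa whose abundance profiles have a Pearson correlation above an arbitrary threshold, and enrichment is scored with a one-sided hypergeometric test. The real pipeline could well use different methods for each step.

```python
from collections import Counter, defaultdict
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(vx * vy)


def cluster_taxa(counts, threshold=0.8):
    """Step 1: link taxa with correlation >= threshold, return connected groups."""
    taxa = list(counts)
    parent = {t: t for t in taxa}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for i, a in enumerate(taxa):
        for b in taxa[i + 1:]:
            if pearson(counts[a], counts[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for t in taxa:
        groups[find(t)].append(t)
    return list(groups.values())


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n taxa from N, of which K carry the category."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def enrichment(clusters, kegg):
    """Steps 2 and 3: count categories per cluster and test each for enrichment."""
    all_taxa = [t for members in clusters for t in members]
    N = len(all_taxa)
    background = Counter(cat for t in all_taxa for cat in kegg.get(t, ()))
    results = []
    for idx, members in enumerate(clusters):
        in_cluster = Counter(cat for t in members for cat in kegg.get(t, ()))
        for cat, k in in_cluster.items():
            p = hypergeom_sf(k, N, background[cat], len(members))
            results.append((idx, cat, k, p))
    return results


# Toy data (hypothetical): taxon -> counts across four samples.
counts = {
    "otu1": [10, 20, 30, 40],
    "otu2": [12, 22, 33, 41],
    "otu3": [40, 30, 20, 10],
    "otu4": [39, 31, 22, 9],
}
# Toy annotation (hypothetical): taxon -> set of KEGG categories.
kegg = {
    "otu1": {"Nitrogen metabolism"},
    "otu2": {"Nitrogen metabolism"},
    "otu3": {"Methane metabolism"},
    "otu4": {"Methane metabolism"},
}

clusters = cluster_taxa(counts)
for idx, cat, k, p in enrichment(clusters, kegg):
    print(f"cluster {idx}: {cat} in {k} taxa, p = {p:.3f}")
```

With real data there would be many clusters and categories, so the p-values would need a multiple-testing correction, and raw 16S counts would probably need normalizing before computing correlations.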
