biocore / emp

Code repository of the Earth Microbiome Project.
http://www.earthmicrobiome.org
BSD 3-Clause "New" or "Revised" License

taxons with empo habitats #110

Open marctormo opened 5 years ago

marctormo commented 5 years ago

Hello!

Do you know if there is a list anywhere that maps taxa to habitats? I can't find this information, and I would like to assign a habitat to individual taxa (genus or species), like this:

taxon1 habitat1
taxon2 habitat2
...

If not, is there a method to extract this information?

Thank you!

cuttlefishh commented 5 years ago

Hi! The closest thing to what you're requesting is an "OTU summary", which lists, for each unique tag sequence (variously called "ASVs" or "sOTUs"), the samples in which it is found, along with some summary statistics. By combining it with the mapping file, which lists the habitat of each sample, you can generate the file you're interested in. Keep in mind that many sequences are found in more than one habitat, and that there are different definitions of habitat, e.g., ENVO and EMPO.

The OTU summary file is here (I suggest the version with chloroplast sequences filtered out): ftp://ftp.microbio.me/emp/release1/otu_distributions/otu_summary_no_chl.emp_deblur_90bp.subset_2k.rare_5000.tsv

The associated mapping file is here: ftp://ftp.microbio.me/emp/release1/mapping_files/emp_qiime_mapping_subset_2k.tsv
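
A minimal sketch of how the two files could be combined, using only the Python standard library. The column names (`taxonomy`, `list_samples`, `#SampleID`, `empo_3`) and the comma separator inside the sample list are assumptions, not confirmed in this thread, so check them against the headers of the downloaded files. The small tables in the example are made up and only stand in for the real files.

```python
import csv
import io
from collections import Counter, defaultdict


def read_tsv(handle):
    """Return the rows of a tab-separated file as a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def habitat_counts(summary_rows, mapping_rows, *,
                   taxon_col="taxonomy", samples_col="list_samples",
                   sample_id_col="#SampleID", habitat_col="empo_3",
                   sample_sep=","):
    """Count, for each taxon, how many of its samples fall in each habitat.

    All column names and the separator are assumptions; pass the names
    actually found in the file headers.
    """
    # Sample ID -> habitat label, taken from the mapping file.
    habitat_of = {row[sample_id_col]: row[habitat_col] for row in mapping_rows}

    counts = defaultdict(Counter)
    for row in summary_rows:
        for sample_id in row[samples_col].split(sample_sep):
            sample_id = sample_id.strip()
            # Skip samples that are absent from the mapping file.
            if sample_id in habitat_of:
                counts[row[taxon_col]][habitat_of[sample_id]] += 1
    return dict(counts)


# Tiny made-up tables standing in for the two downloaded files.
demo_summary = io.StringIO(
    "sequence\ttaxonomy\tlist_samples\n"
    "ACGT\tg__Bacillus\ts1,s2,s3\n"
    "TTGA\tg__Vibrio\ts3\n"
)
demo_mapping = io.StringIO(
    "#SampleID\tempo_3\n"
    "s1\tSoil (non-saline)\n"
    "s2\tSoil (non-saline)\n"
    "s3\tWater (saline)\n"
)

result = habitat_counts(read_tsv(demo_summary), read_tsv(demo_mapping))
for taxon, habitats in result.items():
    for habitat, n in habitats.most_common():
        print(taxon, habitat, n, sep="\t")
```

For the real files, open each downloaded `.tsv` with `open(path, newline="")` and pass the handle to `read_tsv`. Because a taxon can occur in several habitats, the result keeps a count per habitat rather than a single label; taking the most common habitat per taxon is one way to reduce it to the "taxon habitat" list asked for above.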

marctormo commented 5 years ago

Thank you! I think this is a great solution for me.