dib-lab / 2022-sra-gather

Classify all the metagenomes. ALL THE METAGENOMES. (Eventually.)

analysis idea: re-cluster the SRA by taxonomy profile or Jaccard similarity to produce de novo "biomes" that inherit their labels from the cluster majority #2

Open taylorreiter opened 2 years ago

taylorreiter commented 2 years ago

Along with mislabelled data, there seem to be a lot of NAs and duplicate labels, such as "seawater metagenome" vs. "marine metagenome." Are these basically the same thing? Can we infer more granular structure than the ScientificNames that are given?
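To make the idea in the title concrete, here is a rough sketch of clustering by Jaccard similarity and letting each cluster inherit its majority label. It is not code from this repository: the run names, taxon sets, and the 0.5 threshold are all made up for illustration, and single-linkage clustering via union-find is just a simple stand-in for whatever clustering method would actually be used.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; taken to be 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_and_label(profiles, labels, threshold=0.5):
    """Single-linkage clustering at a Jaccard threshold, then majority-vote labels.

    profiles: dict of run name -> set of taxa (hypothetical input layout)
    labels:   dict of run name -> existing label, or None for NA
    """
    parent = {name: name for name in profiles}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge any two runs whose profiles are similar enough.
    for a, b in combinations(profiles, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)

    # Every member inherits the most common non-NA label in its cluster.
    inferred = {}
    for members in clusters.values():
        votes = Counter(labels.get(m) for m in members if labels.get(m))
        majority = votes.most_common(1)[0][0] if votes else None
        for m in members:
            inferred[m] = majority
    return inferred


# Toy data: illustrative run names and taxa, not real SRA records.
profiles = {
    "run_a": {"Pelagibacter", "Prochlorococcus", "Synechococcus"},
    "run_b": {"Pelagibacter", "Prochlorococcus", "Synechococcus", "Alteromonas"},
    "run_c": {"Pelagibacter", "Prochlorococcus"},
    "run_d": {"Bacteroides", "Faecalibacterium"},
    "run_e": {"Bacteroides", "Faecalibacterium", "Roseburia"},
}
labels = {
    "run_a": "marine metagenome",
    "run_b": "marine metagenome",
    "run_c": "seawater metagenome",
    "run_d": "gut metagenome",
    "run_e": None,  # NA label
}

inferred = cluster_and_label(profiles, labels)
print(inferred)
```

In this toy case the "seawater metagenome" run ends up in the same cluster as the two "marine metagenome" runs and inherits the majority label, and the NA run picks up "gut metagenome" from its neighbour. Single linkage can chain unrelated samples together at low thresholds, so the threshold and linkage choice would need tuning on real data.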

ctb commented 2 years ago

yes! please just do a PR to update https://github.com/dib-lab/2022-sra-gather/blob/main/categories/mapping.csv!