dib-lab / 2022-sra-gather

Classify all the metagenomes. ALL THE METAGENOMES. (Eventually.)

potential metadata resources to improve data labelling #9

Open taylorreiter opened 2 years ago

taylorreiter commented 2 years ago

Curated

Learned

Metadata Retrieval

taylorreiter commented 2 years ago

Looks like consortium-specific metadata might still be more information-rich.

E.g., I just went to HumanMetagenomeDB and downloaded the data. I filtered to samples labelled as having Crohn's disease. The information was incomplete: samples aren't labelled as having come from the same individual, and antibiotic information is not natively included [not that it even is in the iHMP metadata sheet... I had to look at other samples, like serology, that were taken at the same time as the metagenome samples to figure out which antibiotic a patient was on when they had an mgx sample].
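A minimal sketch of the filtering step described above, assuming the HumanMetagenomeDB download is a CSV with one row per run. The column names `run_accession` and `disease` are hypothetical placeholders; check them against the header of the actual export. Note that a filter like this can't recover which samples came from the same individual unless the table has a subject column, which (per the comment above) it doesn't.

```python
import csv
import io

# Hypothetical column names; adjust to match the real HumanMetagenomeDB export.
DISEASE_COLUMN = "disease"
ACCESSION_COLUMN = "run_accession"


def samples_with_disease(csv_text: str, disease: str) -> list[str]:
    """Return accessions of rows whose disease field matches `disease` (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = disease.strip().lower()
    return [
        row[ACCESSION_COLUMN]
        for row in reader
        # Missing or empty disease fields never match.
        if (row.get(DISEASE_COLUMN) or "").strip().lower() == wanted
    ]


if __name__ == "__main__":
    # Toy data for illustration only; these are not real HumanMetagenomeDB records.
    example = (
        "run_accession,disease,sample_type\n"
        "SRR0000001,Crohn's disease,stool\n"
        "SRR0000002,healthy,stool\n"
        "SRR0000003,crohn's disease,biopsy\n"
    )
    print(samples_with_disease(example, "Crohn's disease"))
    # prints ['SRR0000001', 'SRR0000003']
```

Matching is case-insensitive because free-text disease labels in aggregated databases are often inconsistently capitalized.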

BUT, it looks like disease, study, sample type, etc. are reasonably well annotated. At the very least, I think any SRA ID that is in HumanMetagenomeDB is probably from a human, so that might be a good and easy cross-check.
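One way the cross-check could look, as a sketch only: assume our own labels are a mapping from SRA accession to a host label, and that the set of accessions present in HumanMetagenomeDB has already been extracted. The function name `flag_label_conflicts` and the `"human"` label string are hypothetical. It only flags accessions that are in HumanMetagenomeDB but not labelled human; absence from the database says nothing about host, so that direction isn't checked.

```python
def flag_label_conflicts(labels: dict[str, str], hmdb_accessions: set[str]) -> list[str]:
    """Return accessions present in HumanMetagenomeDB whose label is not "human".

    Membership in HumanMetagenomeDB is treated as strong (not certain)
    evidence that a sample is human-derived, so these are candidates for review.
    """
    return sorted(
        acc
        for acc, label in labels.items()
        if acc in hmdb_accessions and label.strip().lower() != "human"
    )


if __name__ == "__main__":
    # Toy accessions and labels for illustration only.
    labels = {"SRR1": "human", "SRR2": "soil", "SRR3": "Human", "SRR4": "mouse"}
    hmdb = {"SRR1", "SRR2", "SRR3"}
    print(flag_label_conflicts(labels, hmdb))
    # prints ['SRR2']
```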