INCATools / biosample-analysis

analysis of biosamples in INSDC

Subset columns to MIxS terms (version 5) #19

Closed wdduncan closed 3 years ago

wdduncan commented 3 years ago

Create a version of the biosample data whose columns are MIxS 5 terms. Not all of the harmonized names (e.g., 'fire') are MIxS terms.
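A minimal sketch of the column subsetting, assuming the rows are available as dictionaries (for example from `csv.DictReader`) and that the caller supplies the list of MIxS 5 term names. The function name `subset_columns` is hypothetical and is not taken from the repository.

```python
def subset_columns(rows, mixs_terms):
    """Yield each row restricted to the columns named in mixs_terms.

    rows: iterable of dicts, one per biosample record.
    mixs_terms: collection of MIxS term names (assumed to be supplied
    by the caller; this sketch does not load them from anywhere).
    """
    wanted = set(mixs_terms)
    for row in rows:
        # Drop every column that is not a MIxS term.
        yield {name: value for name, value in row.items() if name in wanted}
```

Matching here is by exact column name, so any harmonized name that differs from the MIxS spelling would be dropped.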

cc @cmungall @realmarcin

cmungall commented 3 years ago

fire is in mixs

cmungall commented 3 years ago

I suggest that before starting this, we catalog the list of fields not in MIxS; we may need to make an SSSOM mapping.
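One way to build that catalog is a simple set difference between the harmonized column names and the MIxS term names. The sketch below assumes both lists are already in hand; `catalog_unmapped` is a hypothetical helper, and the case-insensitive comparison is an assumption rather than an agreed rule.

```python
def catalog_unmapped(columns, mixs_terms):
    """Return the sorted column names with no exact match in mixs_terms.

    Comparison ignores case and surrounding whitespace (an assumption
    made for this sketch).
    """
    known = {term.strip().lower() for term in mixs_terms}
    return sorted(col for col in columns if col.strip().lower() not in known)
```

The names this returns would be the candidates to review for a mapping.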

wdduncan commented 3 years ago

I am going to subset using the non-human environmental package terms.

wdduncan commented 3 years ago

See notebook build-non-human-samples.ipynb: https://github.com/INCATools/biosample-analysis/blob/master/src/notebooks/build-non-human-samples.ipynb

This notebook subsets the data to env_packages containing the strings:

Output has been saved to target/non-human-samples.tsv.gz.
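For readers without the notebook open, the filtering step could look roughly like the sketch below. It is not the notebook's code: the function names are hypothetical, the column name `env_package` is an assumption, and the substrings to match are passed in by the caller because the list is not reproduced here.

```python
import csv
import gzip


def filter_by_env_package(rows, substrings, column="env_package"):
    """Yield rows whose `column` value contains any of `substrings`.

    Matching is case-insensitive (an assumption for this sketch).
    Rows with a missing or empty value are skipped.
    """
    needles = [s.lower() for s in substrings]
    for row in rows:
        value = (row.get(column) or "").lower()
        if any(needle in value for needle in needles):
            yield row


def write_tsv_gz(rows, path, fieldnames):
    """Write dict rows to a gzip-compressed, tab-separated file."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

Because `filter_by_env_package` is a generator, its output can be passed straight to `write_tsv_gz` without holding the full table in memory.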

Potential enhancements to target/non-human-samples.tsv.gz:

cc @realmarcin