EnvironmentOntology / envo

A community-driven ontology for the representation of environments
http://www.environmentontology.org
Creative Commons Zero v1.0 Universal

Request for the individual list of ENVO terms by each of the triad #960

Closed jagadishcs closed 4 years ago

jagadishcs commented 4 years ago

@pbuttigieg @cmungall

Request for the individual list of ENVO terms that belong to each one of the ENVO triad, Biome, Environmental feature and Environmental material.

pbuttigieg commented 4 years ago

@jagadishcs I'm not sure what you're asking for here - CSVs of all the terms under biome and material (that doesn't really work for feature which isn't a single hierarchy and is being obsoleted)?

Is browsing the hierarchy over at the OLS not a solution?

Also note that MIxS v5 does not use the biome/feature/material triad. See this page.

jagadishcs commented 4 years ago

@pbuttigieg @tbkreddy @cmungall Having the ENVO triad terms as a list would help when implementing them in GOLD, and would subsequently make it efficient to update the terms in GOLD based on user/PI input or during curation.

TBKReddy commented 4 years ago

Thanks Pier for looking into this. Thanks Jagadish for filing this ticket and looping me in. See my comments below on Pier's questions.


@jagadishcs I'm not sure what you're asking for here - CSVs of all the terms under biome and material (that doesn't really work for feature)?  

Pier is not sure what was actually requested in the ticket, so let me give it a try. We (GOLD) are working with Chris on curating a large set of samples with ENVO triad terms. To facilitate the curation, scale it up, and run some QC checks to see whether the assigned/available terms fall into one of the three triad categories, we are looking for a list of Biome, Feature, and Material terms/IDs. From the 9K ENVO terms we see, which are the Biome terms, which are the Feature terms, and which are the Material terms? That is our question and that is what we are looking for. It is not clear to me what you meant by "...material (that doesn't really work for feature)?". Does it mean there is no way of knowing which ENVO terms fall under Features?

Is browsing the hierarchy over at the OLS not a solution?

No, it doesn't serve our needs. Browsing on the web or using a Protege-like tool is good for exploring one term at a time and taking information from it to curate a handful of samples. But what we are doing is very large-scale, hands-on curation of thousands of samples using ENVO's triad concept, i.e. curating each sample with Biome, Feature and Material terms. After the initial curation we will be updating/revising our curation as needed over time. To facilitate this, it is helpful to know the different terms under each of the three ENVO triad categories. It is not easy to find or get a list of Biome terms etc., hence this request.
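A QC check of the kind described here could be sketched in Python roughly as follows, assuming each triad list is available as a set of ENVO IDs. The names `TRIAD_LISTS`, `classify_term` and `qc_sample` are hypothetical, and the tiny sets are stand-ins built from IDs that appear elsewhere in this thread; real lists would be loaded from the per-category files.

```python
from typing import Dict, List, Set

# Hypothetical, tiny stand-ins for the three triad lists.
# Real lists would be loaded from the per-category files.
TRIAD_LISTS: Dict[str, Set[str]] = {
    "biome": {"ENVO:00000446", "ENVO:00000447"},
    "feature": {"ENVO:01001607"},
    "material": {"ENVO:04000007"},
}


def classify_term(term_id: str,
                  triad_lists: Dict[str, Set[str]] = TRIAD_LISTS) -> List[str]:
    """Return the names of every triad list containing term_id (empty if none)."""
    return sorted(name for name, ids in triad_lists.items() if term_id in ids)


def qc_sample(annotation: Dict[str, str]) -> List[str]:
    """Report each slot whose term is not in the list of the same name."""
    problems = []
    for slot, term_id in annotation.items():
        if slot not in classify_term(term_id):
            problems.append(f"{term_id} is not in the {slot} list")
    return problems
```

Calling `qc_sample({"biome": "ENVO:00000446", "material": "ENVO:01001607"})` would flag the second entry, since that ID sits in the stand-in feature set rather than the material set.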

@cmungall, who was involved in our conversations prior to filing this ticket, may clarify if needed what we are looking for, why we need it, and how to find it. The following are Chris's comments on this topic from our email conversation, which should help put things in perspective.

I think this is a great request to put on the ENVO tracker - this should be a dynamic list that is updated with each release. Other members of the OBO community will find this useful. If someone makes the ticket I'll make sure it gets done.

@cmungall, feel free to chime in and help Pier on this.

Thanks,

Reddy.

cmungall commented 4 years ago

@jagadishcs here are 3 preview URLs with TSVs for each file:

Click "raw" if you want to download the raw TSV, e.g. for import into Excel.

Please don't bookmark these URLs yet; this is just for preview. We may change the names. After approval there will be PURLs for each file, with archived versions for each release, and a PURL for the latest files.

Example:

ID | label | subClassOf [ID] | subClassOf [LABEL] | SYNONYMS | inSubset | definition
ENVO:00000428 | biome | | | EcosytemType, major habitat type | | A biome is an ecosystem to which resident ecological communities have evolved adaptations.
ENVO:00000446 | terrestrial biome | ENVO:00000428 | biome | terrestrial realm | | A biome which is primarily or completely situated on a landmass.
ENVO:00000447 | marine biome | ENVO:00002030 | aquatic biome | marine realm | envoPolar | An aquatic biome which is determined by a marine water body.
ENVO:00000873 | freshwater biome | ENVO:00002030 | aquatic biome | freshwater realm | envoPolar | An aquatic biome which is determined by a body of freshwater.

The subClassOf columns give you hierarchy. I included the definitions as this is important for annotation
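As a rough illustration of how the subClassOf columns give the hierarchy, here is a minimal Python sketch that reads such a TSV and walks upward from a term. It assumes the files are tab-delimited with the header names shown in the example, and that multiple parents, if any, are separated by `|`; that separator is a guess, as are the helper names `load_parents` and `ancestors`.

```python
import csv
import io
from typing import Dict, List

# Sample rows copied from the example above; the tab layout is an assumption.
SAMPLE_TSV = (
    "ID\tlabel\tsubClassOf [ID]\tsubClassOf [LABEL]\n"
    "ENVO:00000428\tbiome\t\t\n"
    "ENVO:00000446\tterrestrial biome\tENVO:00000428\tbiome\n"
    "ENVO:00000447\tmarine biome\tENVO:00002030\taquatic biome\n"
)


def load_parents(tsv_text: str) -> Dict[str, List[str]]:
    """Map each term ID to its listed parent IDs ('|' separator is assumed)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    parents: Dict[str, List[str]] = {}
    for row in reader:
        raw = row["subClassOf [ID]"] or ""
        parents[row["ID"]] = [p for p in raw.split("|") if p]
    return parents


def ancestors(term_id: str, parents: Dict[str, List[str]]) -> List[str]:
    """Follow subClassOf links upward; parents absent from the file are
    reported but cannot be expanded further."""
    seen: List[str] = []
    stack = list(parents.get(term_id, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(parents.get(current, []))
    return seen
```

With the sample rows, `ancestors("ENVO:00000446", load_parents(SAMPLE_TSV))` returns only the biome root, while the marine biome's parent is reported even though its own row is not in the sample.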

TBKReddy commented 4 years ago

Thank you @cmungall. @jagadishcs download these 3 lists to confirm this is what we will be looking for.

Thank you all. Reddy.

pbuttigieg commented 4 years ago

@TBKReddy @jagadishcs @cmungall

Points of clarification:

@cmungall

feature (actually the "astronomical body part" hierarchy)

env_local_scale slots don't have to be filled with classes from the astronomical body part hierarchy; manufactured objects and other terms are valid.

@TBKReddy

It is not clear to me what you meant by "...material (that doesn't really work for feature)?". Does it mean there is no way of knowing which ENVO terms fall under Features?

As described above, the "feature" slot doesn't exist in MIxS v5 - that's a legacy field.

jagadishcs commented 4 years ago

@pbuttigieg @cmungall @TBKReddy

Hi Pier and Chris,

Thanks for the clarification and for sharing the triad data. Among the term labels present in the triad lists (Biome, Environmental feature and Environmental material), a total of 58 terms (9 Biome, 14 Environmental feature and 35 Environmental material) are not present in the ENVO terms downloaded from Ontobee. However, their parent terms, or their parents' parent terms, are present in the ENVO download. So I am assuming that these are new term labels added to the triad lists that will eventually make it into the ENVO download. I just wanted to confirm with you that my understanding is correct.

Further, out of the 9113 ENVO term labels, only 1358 fall under any one of the triad categories. Would it be possible to assign as many as possible of the remaining 7755 term labels to one of the triad categories? This would be particularly useful since MIxS environmental packages have the ENVO triad as mandatory descriptors.

Best, Jagadish

jagadishcs commented 4 years ago

@pbuttigieg @cmungall @TBKReddy @jagadishcs

List of ENVO terms from the individual triad lists that are not available in Ontobee:

ID | label
ENVO:01001830 | tropical biome
ENVO:01001831 | temperate biome
ENVO:01001832 | subtropical biome
ENVO:01001833 | mediterranean biome
ENVO:01001834 | subpolar biome
ENVO:01001835 | alpine biome
ENVO:01001836 | montane biome
ENVO:01001837 | subalpine biome
ENVO:01001838 | arid biome
ENVO:01001607 | water current
ENVO:01001645 | gaseous part of an atmosphere
ENVO:01001653 | well-mixed estuary
ENVO:01001665 | soil cryoturbate
ENVO:01001773 | thrust fault
ENVO:01001775 | blind thrust fault
ENVO:01001781 | part of a landmass
ENVO:01001784 | compound astronomical body part
ENVO:01001786 | inland sea
ENVO:01001851 | geothermally active field
ENVO:01001854 | hydrothermal field
ENVO:03000132 | firebreak
ENVO:03500001 | playground
ENVO:03500002 | public park
ENVO:01001608 | kerosene oil
ENVO:01001614 | ice-bearing permafrost
ENVO:01001616 | bare soil
ENVO:01001638 | frost-susceptible soil
ENVO:01001644 | material primarily composed of biogenic carbonates
ENVO:01001646 | amorphous solid
ENVO:01001647 | colloid suspended in a hydrosphere
ENVO:01001648 | suspended colloidal sediment
ENVO:01001649 | heavy fraction material
ENVO:01001650 | light fraction material
ENVO:01001651 | particulate matter in a hydrosphere
ENVO:01001652 | atmospheric aerosol
ENVO:01001820 | cultured organic material
ENVO:01001821 | hydrothermally-influenced sediment
ENVO:01001845 | hoar
ENVO:01001846 | depth hoar
ENVO:01001847 | hoarfrost
ENVO:01001848 | rime
ENVO:01001849 | glaze
ENVO:01001850 | frost
ENVO:02000090 | ash
ENVO:02000126 | wood ash
ENVO:02000127 | coal ash
ENVO:02000128 | fly ash
ENVO:02000129 | bottom ash
ENVO:02000130 | boiler slag
ENVO:02000131 | flue gas desulfurization material
ENVO:02000140 | fluid environmental material
ENVO:03500005 | anthropogenic litter
ENVO:04000007 | lake water
ENVO:04000008 | soil organic matter
ENVO:04000012 | particulate organic matter
ENVO:04000013 | particulate organic carbon
ENVO:04000014 | particulate organic nitrogen
ENVO:01001841 | volcanic soil
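A comparison like the one behind this list can be sketched with plain set operations in Python. This is only an illustration: the function names are hypothetical, and the three-element sets in the usage note are placeholders rather than the actual triad lists or Ontobee download.

```python
from typing import Dict, List, Set


def missing_from_download(triad_ids: Set[str], download_ids: Set[str]) -> List[str]:
    """IDs present in a triad list but absent from the downloaded term set."""
    return sorted(triad_ids - download_ids)


def coverage_counts(triad_ids: Set[str], download_ids: Set[str]) -> Dict[str, int]:
    """Count terms unique to each side and terms shared by both."""
    return {
        "triad_only": len(triad_ids - download_ids),
        "in_both": len(triad_ids & download_ids),
        "download_only": len(download_ids - triad_ids),
    }
```

For example, with `triad = {"ENVO:00000446", "ENVO:01001830"}` and `download = {"ENVO:00000446", "ENVO:00000428"}`, `missing_from_download(triad, download)` yields the one ID found only in the triad set.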

cmungall commented 4 years ago

@jagadishcs - yes, the links I gave you were for a development branch in envo. Once the new release is out I'll give you PURLs.

@pbuttigieg - it would be good to have subsets more geared towards the MIxS v5 triads as applied to standard microbiome use cases. From what you say, the strict hierarchy subsets are too limiting, but all of ENVO is too inclusive. We should iterate on the middle ground here. It's possible we could even do this per package. I'd like to put the work in here, I just haven't had time.