cmungall opened 3 years ago
Can this be ready to implement in MIxS 6 by May?
I love the idea and agree it would probably make a huge difference to ease of use. However, I think it's a massive undertaking to generate all the required slims and have them vetted by the relevant user groups for every environmental package before May. We can make a start as soon as anyone has the bandwidth, but I think it's unrealistic to have it ready for public consumption by May; we should schedule its release for MIxS v7 instead. Can the suggested slims be treated in the same way as our controlled vocabulary fields? Or, to put it another way, can our other controlled vocabulary fields use the same technology as these slims? After all, a CV is just a slim of the English language!
This makes sense; it connects to some of our subsets in ENVO.
I agree with @only1chunts that this is more likely to be a MIxS 7 target. However, I think we should release a general suggestion for further revision, rather than wait for full consensus.
Sounds good.
This is something we are doing in NMDC that we want to push up to the standards level.
Submitters often find it difficult to select the correct ENVO terms. This is compounded by the lack of suitable ontology browsing tools and by the prevalence of spreadsheet-based data submission, as opposed to the dedicated tools with intelligent, context-aware support for term selection that we see in other areas of biocuration. It is also made harder by ENVO's move away from a system in which each term came from one of three hierarchies. Things are more open-ended now, which leads to more confusion for submitters and annotators. This is evident from the extremely poor quality of ENVO annotations in INSDC.
As a partial solution, we should have recommended slims for each package/field combination. Submitters and annotators can still select terms outside these slims, but the slims would serve as the starting point. Even if submitters restrict themselves to the terms in the slims, I hypothesize that the gain in accuracy would far outweigh the loss in precision.
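The point that slims are a recommended starting point rather than a hard restriction could be captured by a soft check that flags, rather than rejects, out-of-slim terms. A minimal sketch, assuming a hypothetical in-memory slim keyed by (package, field); the CURIEs are placeholders rather than real ENVO identifiers, and env_medium / env_local_scale stand in for the env_X fields:

```python
from enum import Enum
from typing import Dict, Set, Tuple

class Check(Enum):
    IN_SLIM = "in recommended slim"
    OUTSIDE_SLIM = "outside slim: allowed, but flagged for review"
    NO_SLIM = "no slim defined for this package/field"

# Hypothetical slims keyed by (package, field); CURIEs are placeholders,
# not real ENVO identifiers.
SLIMS: Dict[Tuple[str, str], Set[str]] = {
    ("soil", "env_medium"): {"ENVO:0000001", "ENVO:0000002"},
    ("soil", "env_local_scale"): {"ENVO:0000003"},
}

def check_term(package: str, field: str, curie: str) -> Check:
    """Soft validation: terms outside the slim are flagged, never rejected."""
    recommended = SLIMS.get((package, field))
    if recommended is None:
        return Check.NO_SLIM
    return Check.IN_SLIM if curie in recommended else Check.OUTSIDE_SLIM

for curie in ("ENVO:0000001", "ENVO:0000009"):
    print(curie, "->", check_term("soil", "env_medium", curie).value)
```

Returning a status instead of raising keeps out-of-slim annotations possible while still steering submitters toward the recommended terms.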
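I suggest a three-column format:
package | field (env_X) | valid ENVO term
An entry in this table means that the ENVO term is valid for the package/field combination.
We could also have a fourth column:
package | field (env_X) | valid ENVO term | ENVO local name
The local name would let us rename some of the more abstract ENVO labels in a local context.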
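One way such a table could be consumed is as a tab-separated file with an optional local-name column. The sketch below assumes hypothetical column names (`package`, `field`, `valid_envo_term`, `envo_local_name`) and placeholder CURIEs; it groups rows by package/field combination so that three-column and four-column files load the same way.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple

class SlimEntry(NamedTuple):
    term: str                  # ENVO CURIE that is valid for this package/field
    local_name: Optional[str]  # optional context-specific label, else None

# Hypothetical four-column table (tab-separated); the CURIEs are placeholders.
# A three-column file without envo_local_name loads the same way.
TABLE = (
    "package\tfield\tvalid_envo_term\tenvo_local_name\n"
    "soil\tenv_medium\tENVO:0000001\t\n"
    "soil\tenv_medium\tENVO:0000002\tfarm soil\n"
    "soil\tenv_broad_scale\tENVO:0000003\t\n"
)

def load_slims(text: str) -> Dict[Tuple[str, str], List[SlimEntry]]:
    """Group table rows by (package, field); each row marks one term as valid."""
    slims: Dict[Tuple[str, str], List[SlimEntry]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        # A missing column (three-column file) or an empty cell both mean "no local name".
        local = (row.get("envo_local_name") or "").strip() or None
        slims[(row["package"], row["field"])].append(
            SlimEntry(row["valid_envo_term"], local))
    return dict(slims)

for key, entries in load_slims(TABLE).items():
    print(key, entries)
```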
(This format also maps cleanly to LinkML YAML, which is how I envision us maintaining this going forward.)
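To illustrate the mapping to LinkML, each package/field combination could become one enum whose permissible values point at ENVO terms through the `meaning` slot. This is a sketch, not the NMDC schema: the enum naming scheme, placeholder CURIEs, and hard-coded ENVO labels are assumptions, and it emits only an `enums:` fragment. The tiny emitter avoids a PyYAML dependency but does no quoting, so it only suits simple labels.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rows: (package, field, ENVO CURIE, ENVO label, optional local name).
# CURIEs are placeholders; in practice the label would be looked up in ENVO.
ROWS: List[Tuple[str, str, str, str, Optional[str]]] = [
    ("soil", "env_medium", "ENVO:0000001", "soil", None),
    ("soil", "env_medium", "ENVO:0000002", "agricultural soil", "farm soil"),
]

def enum_name(package: str, field: str) -> str:
    # Hypothetical naming scheme: ("soil", "env_medium") -> "SoilEnvMediumEnum"
    parts = [package] + field.split("_")
    return "".join(p.capitalize() for p in parts) + "Enum"

def to_linkml_enums(rows) -> dict:
    """One LinkML enum per package/field; each value's `meaning` is the ENVO term."""
    enums: Dict[str, dict] = {}
    for package, field, curie, label, local in rows:
        pvs = enums.setdefault(enum_name(package, field),
                               {"permissible_values": {}})["permissible_values"]
        # Use the local name as the value text when given; the ENVO term stays as meaning.
        pvs[local or label] = {"meaning": curie}
    return {"enums": enums}

def dump_yaml(obj: dict, indent: int = 0) -> str:
    """Tiny block-style YAML emitter for nested dicts of strings (no quoting)."""
    lines = []
    for key, value in obj.items():
        pad = "  " * indent
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(dump_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)

print(dump_yaml(to_linkml_enums(ROWS)))
```

When a local name replaces the ENVO label, the original label could also be kept in the permissible value's `description` so it is not lost.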
This can also easily be implemented as dropdowns in spreadsheets.
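For spreadsheet dropdowns, one low-tech option is a separate options sheet with one column per package/field combination, which a list-type data-validation rule in the submission sheet could then reference. The sketch assumes the MIxS-style `label [CURIE]` value format for env_X fields; the slim contents are hypothetical and the CURIEs are placeholders.

```python
import csv
import io
from itertools import zip_longest
from typing import Dict, List, Tuple

# Hypothetical slims: (package, field) -> [(label, CURIE)]; CURIEs are placeholders.
SLIMS: Dict[Tuple[str, str], List[Tuple[str, str]]] = {
    ("soil", "env_medium"): [("soil", "ENVO:0000001"), ("farm soil", "ENVO:0000002")],
    ("soil", "env_local_scale"): [("agricultural field", "ENVO:0000003")],
}

def options_sheet(slims: Dict[Tuple[str, str], List[Tuple[str, str]]]) -> str:
    """Lay out one column of dropdown options per package/field as CSV."""
    keys = sorted(slims)
    header = [f"{package}:{field}" for package, field in keys]
    columns = [[f"{label} [{curie}]" for label, curie in slims[k]] for k in keys]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    # Columns differ in length; pad the shorter ones with empty cells.
    for row in zip_longest(*columns, fillvalue=""):
        writer.writerow(row)
    return buf.getvalue()

print(options_sheet(SLIMS))
```

Keeping the options in their own sheet means a slim can be revised without touching the submission template itself.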
We in NMDC can get things started with a selection for soil.
Note that as tooling becomes more sophisticated, we can have less primitive ways of guiding users to the right terms, but we have to start with something that works within the current tooling ecosystem.