gigascience / gigadb-website

Source code for running GigaDB
http://gigadb.org
GNU General Public License v3.0

unify RVT classifications with GigaDB Dataset Types E678 #288

Open only1chunts opened 5 years ago

only1chunts commented 5 years ago

probably an Epic

User story

As a handling curator
I want the manuscript classifications from RVT/GigaByte to be added to the GigaDB dataset
So that I don't have to manually change them after the dataset is uploaded from RVT

Acceptance criteria

Given a manuscript is present in RVT
When the handling editor passes it to curation for the "get data" step
Then the same terms are selected in the Dataset Types as those given in the RVT classifications


Additional Info

The new list of dataset types is available here on the tabs called "types-grid" and "types-list". Note that the types are in a two-level hierarchy, which will require an extra column in the type table to store that information; see ticket #411.
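A minimal sketch of how the two-level hierarchy could be stored with one extra column, assuming a self-referencing `parent_id` on the type table. The table layout, the column name, the helper functions and the example terms are all hypothetical (they are not taken from the GigaDB schema or the spreadsheet), and SQLite is used only so the example is self-contained.

```python
import sqlite3


def build_type_table(conn):
    """Create a hypothetical type table with a self-referencing parent column."""
    conn.execute(
        """
        CREATE TABLE type (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE,
            parent_id INTEGER REFERENCES type(id)  -- NULL for top-level types
        )
        """
    )


def add_type(conn, name, parent=None):
    """Insert a type; reject parents that are themselves children (two levels only)."""
    parent_id = None
    if parent is not None:
        row = conn.execute(
            "SELECT id, parent_id FROM type WHERE name = ?", (parent,)
        ).fetchone()
        if row is None:
            raise ValueError(f"unknown parent type: {parent}")
        if row[1] is not None:
            raise ValueError(f"{parent} is already a second-level type")
        parent_id = row[0]
    cur = conn.execute(
        "INSERT INTO type (name, parent_id) VALUES (?, ?)", (name, parent_id)
    )
    return cur.lastrowid


def children_of(conn, parent):
    """Return the names of the second-level types under a top-level type."""
    rows = conn.execute(
        """
        SELECT c.name FROM type c
        JOIN type p ON c.parent_id = p.id
        WHERE p.name = ?
        ORDER BY c.name
        """,
        (parent,),
    ).fetchall()
    return [r[0] for r in rows]


# Usage with made-up terms.
conn = sqlite3.connect(":memory:")
build_type_table(conn)
add_type(conn, "Life Sciences")
add_type(conn, "Genomics", parent="Life Sciences")
print(children_of(conn, "Life Sciences"))
```

Keeping the parent in the same table means existing queries on type names keep working, and top-level types are simply the rows where `parent_id` is NULL.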

The list of updates required to the current dataset types in the database is also available here on the tab called "map-current-dataset-types".

Product Backlog Item Ready Checklist

Product Backlog Item Done Checklist

This Story is part of Epic #678

only1chunts commented 5 years ago

The dataset type list is currently a short controlled vocabulary made up by us, based on general topics of the datasets we host. To enable a more expansive but still controlled set of terms, and to allow us to integrate with others, we should use the Topic section of the EDAM ontology. http://edamontology.org/topic_0003

or to view it in OLS: https://www.ebi.ac.uk/ols/ontologies/edam/terms?iri=http%3A%2F%2Fedamontology.org%2Ftopic_0003

An alternative to EDAM might be the Subject Resource Application Ontology (SRAO) https://github.com/FAIRsharing/subject-ontology , which combines EDAM with integration/mapping to other ontologies. There is also the Domain Resource Application Ontology (DRAO) https://github.com/FAIRsharing/domain-ontology , which is more granular than SRAO.

only1chunts commented 4 years ago

I have created a slim of the SRAO and grouped it into 13 primary classifications, see spreadsheet https://drive.google.com/file/d/1shH-QK-UT9HD88ysWFHVq5xUenxAmcbg/view?usp=sharing This is likely to be the starting set of terms for the primary and secondary classifications in the RV submission tool. We will need to decide which terms are included in GigaDB initially, do the mappings for any deprecated terms, and potentially update all datasets with appropriate new terms.
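A rough sketch of what the mapping step for deprecated terms could look like, assuming the mapping is exported as a simple old-term to new-term lookup. The function name, the data shapes and every term below are hypothetical placeholders, not the contents of the spreadsheet.

```python
def remap_dataset_types(dataset_types, term_map):
    """Apply an old-to-new term mapping to each dataset's list of types.

    dataset_types: dict of dataset id -> list of current type names.
    term_map: dict of old name -> new name, or None when the old term
              is dropped without a replacement.
    Returns (remapped, unmapped): the updated dict and a sorted list of
    terms that had no entry in term_map and need curator review.
    """
    remapped = {}
    unmapped = set()
    for dataset_id, terms in dataset_types.items():
        new_terms = []
        for term in terms:
            if term not in term_map:
                # No decision recorded yet: keep the term and flag it.
                unmapped.add(term)
                target = term
            else:
                target = term_map[term]
            # Skip dropped terms and avoid duplicates after merging.
            if target is not None and target not in new_terms:
                new_terms.append(target)
        remapped[dataset_id] = new_terms
    return remapped, sorted(unmapped)


# Usage with made-up dataset ids and terms.
current = {
    "dataset-a": ["Old Term A", "Old Term B"],
    "dataset-b": ["Old Term C", "Unreviewed Term"],
}
mapping = {
    "Old Term A": "New Term X",
    "Old Term B": "New Term X",  # two old terms merged into one
    "Old Term C": None,          # deprecated with no replacement
}
updated, needs_review = remap_dataset_types(current, mapping)
print(updated)
print(needs_review)
```

Returning the unmapped terms separately, rather than failing, would let curators see which existing types still need a decision before all datasets are updated.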

ScottBGI commented 2 years ago

Not sure if this is helpful for how we design these, and it could be a useful mapping exercise project for an intern, but I've just seen Dryad use domains drawn from the OECD Fields of Science and Technology classification: https://www.oecd.org/science/inno/38235147.pdf

only1chunts commented 2 years ago

I don't believe using the OECD classifications is the way for us to go, as it's a very high-level classification, but I have opened a ticket in the SRAO project to ask them to look into mapping their terms to it! FYI: the OECD have a newer version (than the one Scott pointed to) of the controlled vocabulary of terms relating to the fields of R&D (FORD) classifications in the main Frascati manual (see table 2.2 in this document).