EBISPOT / OntoString

Tool for curating mappings from free text to ontology terms
https://www.ebi.ac.uk/spot/ontostring

Placeholder for discussing support for HCA and FAANG context #95

Open henrietteharmse opened 3 years ago

henrietteharmse commented 3 years ago

This ticket is a place to discuss adding support for HCA and FAANG contexts. Below I make some concrete suggestions, mainly so that people have something to point at and say whether it makes sense or not.

Currently HCA and FAANG use graph restrictions on some of their fields to limit which ontology terms can be used for those fields.

Here is an example from FAANG for their experiments_chip-seq_dna-binding_proteins field:

              "graph_restriction": {
                "ontologies": ["obo:chebi"],
                "classes": ["CHEBI:15358"],
                "relations": ["rdfs:subClassOf"],
                "direct": false,
                "include_self": false
              }

Here is an example from HCA for their cell type field:

            "graph_restriction":  {
                "ontologies" : ["obo:hcao", "obo:cl"],
                "classes": ["CL:0000003"],
                "relations": ["rdfs:subClassOf"],
                "direct": false,
                "include_self": false
            },

Currently our project definition looks as follows:

{
  "name": "Project name",                     // MANDATORY
  "description": "Some description",
  "numberOfReviewsRequired": 3,
  "datasources": [
     "atlas",
     "uniprot",
     "gwas",
     ...
  ],
  "ontologies": [
     "efo",
     "mondo",
     "hp",
     "ordo"
  ],
  "preferredMappingOntologies": [ "efo" ]
}

To support HCA and FAANG, we need to add a `fields` list to our project definition, where each entry names a field and gives its graph restriction. Here is an example for FAANG:

    "fields": [
        {
            "fieldName": "experiments_chip-seq_dna-binding_proteins",
            "graphRestriction": {
                "ontologies": ["obo:chebi"],
                "classes": ["CHEBI:15358"],
                "relations": ["rdfs:subClassOf"],
                "direct": false,
                "includeSelf": false
            }
        },
        {
            "fieldName": "otherField",
            "graphRestriction": {
                ...
            }
        }
    ]

Here is an example for HCA:


    "fields": [
        {
            "fieldName": "cell type",
            "graphRestriction": {
                "ontologies": ["obo:hcao", "obo:cl"],
                "classes": ["CL:0000003"],
                "relations": ["rdfs:subClassOf"],
                "direct": false,
                "includeSelf": false
            }
        },
        {
            "fieldName": "otherField",
            "graphRestriction": {
                ...
            }
        }
    ]
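The HCA and FAANG schemas spell their keys in snake_case (`graph_restriction`, `include_self`), while the project definition proposed here uses camelCase (`graphRestriction`, `includeSelf`). A minimal sketch of how a schema restriction could be turned into a project `fields` entry follows. The function name `to_project_field` is hypothetical, and it assumes the camelCase spelling is the intended one for the project definition.

```python
def to_project_field(field_name, graph_restriction):
    """Build a project `fields` entry from a schema-style graph_restriction.

    Hypothetical helper: only renames keys from snake_case to camelCase,
    the values are copied through unchanged.
    """
    def camel(key):
        # "include_self" -> "includeSelf"; keys without "_" are unchanged.
        head, *rest = key.split("_")
        return head + "".join(part.capitalize() for part in rest)

    return {
        "fieldName": field_name,
        "graphRestriction": {camel(k): v for k, v in graph_restriction.items()},
    }


# The HCA cell type restriction quoted earlier in this ticket.
hca_cell_type = {
    "ontologies": ["obo:hcao", "obo:cl"],
    "classes": ["CL:0000003"],
    "relations": ["rdfs:subClassOf"],
    "direct": False,
    "include_self": False,
}

entry = to_project_field("cell type", hca_cell_type)
```

Keeping the conversion in one place would mean the HCA and FAANG schemas can be read as they are, without anyone retyping restrictions by hand.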

Currently our upload format looks as follows:

{
  "data": [
    {
      "upstreamId": "ID",     // Optional
      "priority": 3,          // Optional
      "text": "TEXT",     // Mandatory
      "context": "field"   // Optional (if not provided, data points will be auto-assigned to the `default` context)
    }
  ]
}

I do not think our upload file format will need to change, assuming each data point's context is the name of a field from that project's list of fields.
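To make that assumption concrete, here is a rough sketch of a check that could run on upload. The function name `unknown_contexts` is hypothetical, and what should happen to data points with an unrecognised context (reject, warn, or fall back to `default`) is not decided in this ticket, so the sketch only reports them.

```python
DEFAULT_CONTEXT = "default"


def unknown_contexts(project, upload):
    """Return the contexts in an upload that match no field of the project.

    Hypothetical helper: data points without a context are treated as
    belonging to the `default` context, as in the current upload format.
    """
    known = {field["fieldName"] for field in project.get("fields", [])}
    unknown = []
    for entry in upload["data"]:
        context = entry.get("context", DEFAULT_CONTEXT)
        if context != DEFAULT_CONTEXT and context not in known:
            unknown.append(context)
    return unknown


project = {"name": "Project name", "fields": [{"fieldName": "cell type"}]}
upload = {
    "data": [
        {"text": "TEXT", "context": "cell type"},  # matches a project field
        {"text": "TEXT"},                          # falls into `default`
        {"text": "TEXT", "context": "organ"},      # not a field of this project
    ]
}

missing = unknown_contexts(project, upload)
```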

henrietteharmse commented 3 years ago

@mshadbolt @Alexey-ebi @peterwharrison @dosumis @zoependlington @tskir @udp @tudorgroza

Please feel free to comment and raise your concerns and/or ideas.

henrietteharmse commented 3 years ago

For HCA, the only value used for the relations field is rdfs:subClassOf, the direct field is always false, and include_self can be true or false.

| Field | Values | Meaning |
| --- | --- | --- |
| relations | rdfs:subClassOf | The term must be a subclass of one of the terms in the classes field. |
| include_self | true/false | If true, the term may itself be one of the classes listed in the classes field. Combined with an rdfs:subClassOf restriction, the term can be one of the classes OR a subclass of one of them. |
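As a way to check that we read these semantics the same way, here is a small sketch that evaluates a restriction against a toy hierarchy. Everything in it is illustrative: `satisfies` and `ancestors` are hypothetical names, the `EX:` terms are made up, and the hierarchy is a plain dict rather than a real ontology lookup. It also assumes that `direct: true` would mean "immediate subclasses only", which is not spelled out in this ticket.

```python
def ancestors(term, parents):
    """All transitive superclasses of `term` in a toy child -> parents map."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def satisfies(term, restriction, parents):
    """Check whether `term` is allowed by a graph restriction."""
    roots = set(restriction["classes"])
    # include_self: the term may be one of the listed classes itself.
    if term in roots and restriction["include_self"]:
        return True
    if "rdfs:subClassOf" not in restriction["relations"]:
        return False
    if restriction["direct"]:
        # Assumption: direct means immediate superclasses only.
        supers = set(parents.get(term, ()))
    else:
        supers = ancestors(term, parents)
    return bool(supers & roots)


# Toy hierarchy: EX:grandchild -> EX:child -> CL:0000003 (made-up EX: terms).
parents = {"EX:child": ["CL:0000003"], "EX:grandchild": ["EX:child"]}
restriction = {
    "classes": ["CL:0000003"],
    "relations": ["rdfs:subClassOf"],
    "direct": False,
    "include_self": False,
}

print(satisfies("EX:grandchild", restriction, parents))  # indirect subclass
print(satisfies("CL:0000003", restriction, parents))     # the class itself
```

With `include_self` set to false the listed class itself is rejected and only its subclasses pass; flipping it to true accepts both.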