geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

Clarify GORULE:0000045 With/from: Verify that the combination of evidence (ECO) codes conform to the rules in eco-usage-constraints.yaml #1873

Status: Open. cmungall opened this issue 2 years ago.

cmungall commented 2 years ago

"With/from: Verify that the combination of evidence (ECO) codes conform to the rules in eco-usage-constraints.yaml"

This is not very useful to people: where is this file found? It looks like some people are searching at random and finding copies on S3.

This seems to be the source of truth:

https://github.com/geneontology/go-site/blob/master/metadata/eco-usage-constraints.yaml

This is a very odd file: it uses YAML features, such as references, that fall outside the JSON-compatible subset of YAML.

There is virtually no documentation for this file: https://github.com/geneontology/go-site/blob/master/metadata/eco-usage-constraints.schema.yaml

There is a JSON version of this file, generated on S3, in which the YAML references are expanded: https://s3.amazonaws.com/go-public/metadata/eco-usage-constraints.json

But that JSON file is not listed as an official product here: http://current.geneontology.org/metadata/index.html

I think people are confused by the use of the BET types in this file; for example:

  {
    "entity_type": {
      "id": "NCIT:C20130",
      "name": "protein family"
    }
  },

This is not actually saying that people should use NCIT IDs. Instead, it is saying to use some protein family ID. The NCIT ID is just the ID of the type in the BET upper-level "ontology". To see which kinds of IDs are valid, it is necessary to look up go-dbxrefs.yaml.

For example, here is the PANTHER entry:

- database: PANTHER
  name: Protein ANalysis THrough Evolutionary Relationships Classification System
  rdf_uri_prefix: https://identifiers.org/panther.family
  generic_urls:
    - http://www.pantherdb.org/
  entity_types:
    - type_name: protein family
      type_id: NCIT:C20130
      id_syntax: PTN[0-9]{9}|PTHR[0-9]{5}_[A-Z0-9]+
      url_syntax: http://www.pantherdb.org/panther/lookupId.jsp?id=[example_id]
      example_id: PANTHER:PTHR11455
      example_url: http://www.pantherdb.org/panther/lookupId.jsp?id=PTHR10000
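To make the two-step lookup concrete, here is a minimal Python sketch, assuming go-dbxrefs.yaml has already been loaded into plain dicts shaped like the PANTHER entry above. It takes the `entity_type` id from a constraint, finds which databases declare that type, and checks a With/from CURIE against that database's `id_syntax`. The function names `databases_for_type` and `with_from_matches_type`, the hand-copied `DBXREFS` subset, the full-match interpretation of `id_syntax`, and the identifier `PTN000000001` are all hypothetical and for illustration only; this is not the GO pipeline's actual validation code.

```python
import re

# Hypothetical hand-copied subset of go-dbxrefs.yaml, mirroring only the
# PANTHER entry quoted above. Real code would load the full YAML file.
DBXREFS = [
    {
        "database": "PANTHER",
        "entity_types": [
            {
                "type_name": "protein family",
                "type_id": "NCIT:C20130",
                "id_syntax": r"PTN[0-9]{9}|PTHR[0-9]{5}_[A-Z0-9]+",
            }
        ],
    }
]


def databases_for_type(type_id, dbxrefs=DBXREFS):
    """Return {database prefix: id_syntax} for every database declaring type_id."""
    result = {}
    for entry in dbxrefs:
        for etype in entry.get("entity_types", []):
            if etype.get("type_id") == type_id and "id_syntax" in etype:
                result[entry["database"]] = etype["id_syntax"]
    return result


def with_from_matches_type(curie, type_id, dbxrefs=DBXREFS):
    """True if curie (PREFIX:local_id) comes from a database that provides
    type_id and its local part fully matches that database's id_syntax
    (full-match semantics are an assumption of this sketch)."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        return False
    syntax = databases_for_type(type_id, dbxrefs).get(prefix)
    if syntax is None:
        return False
    return re.fullmatch(syntax, local_id) is not None


# The constraint only names a type; the NCIT ID is not itself a valid value.
constraint = {"entity_type": {"id": "NCIT:C20130", "name": "protein family"}}
type_id = constraint["entity_type"]["id"]
print(sorted(databases_for_type(type_id)))  # ['PANTHER']
print(with_from_matches_type("PANTHER:PTN000000001", type_id))  # True
print(with_from_matches_type("NCIT:C20130", type_id))  # False
```

The last call returns False because `NCIT` is not a database that provides protein families, which is exactly the point the constraint file obscures: the NCIT ID identifies the type, and the usable prefixes only appear after the go-dbxrefs.yaml lookup.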

This is, of course, incredibly complex; people cannot reasonably be expected to figure it out.

We had a plan to fix some of this in:

As Tony said, the use of the BET ontology was only ever intended as a temporary measure.

pgaudet commented 8 months ago

Partially overlapping with other rules: