INCATools / semantic-sql

SQL and SQLite builds of OWL ontologies
https://incatools.github.io/semantic-sql/
BSD 3-Clause "New" or "Revised" License

Include PyOBO products #45

Open cmungall opened 2 years ago

cmungall commented 2 years ago

See https://github.com/biopragmatics/obo-db-ingest

It would be quite easy to add these as builds, and distribute the sqlite on s3.

Advantages:

Note that there is an ongoing discussion about URIs for these, but semantic-sql doesn't care: we store things natively as CURIEs, and the prefix table can be swapped to anything.
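To illustrate the point about CURIEs and the prefix table, here is a minimal sketch using an in-memory SQLite database. The table layout is a simplified assumption (a `prefix` table with `prefix` and `base` columns, and a cut-down `statements` table), the `expand` helper is hypothetical, and the `example.org` bases are placeholders rather than real URIs for these resources.

```python
import sqlite3

# Simplified, assumed schema: not the full semantic-sql layout.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE prefix (prefix TEXT, base TEXT);
    CREATE TABLE statements (subject TEXT, predicate TEXT, value TEXT);
    INSERT INTO prefix VALUES ('hgnc', 'https://example.org/hgnc/');
    INSERT INTO statements VALUES ('hgnc:5', 'rdfs:label', 'A1BG');
    """
)


def expand(conn: sqlite3.Connection, curie: str) -> str:
    """Expand a CURIE to a URI via the prefix table (hypothetical helper)."""
    prefix, _, local_id = curie.partition(":")
    row = conn.execute(
        "SELECT base FROM prefix WHERE prefix = ?", (prefix,)
    ).fetchone()
    # Unknown prefixes are returned unchanged.
    return row[0] + local_id if row else curie


before = expand(conn, "hgnc:5")

# Swapping the URI scheme only touches the prefix table;
# the stored statements keep their CURIEs.
conn.execute(
    "UPDATE prefix SET base = ? WHERE prefix = ?",
    ("https://example.org/alt/hgnc/", "hgnc"),
)
after = expand(conn, "hgnc:5")
stored = conn.execute("SELECT subject FROM statements").fetchall()
```

Because the rows in `statements` never contain full URIs, whatever the URI discussion settles on would only require changing the `base` values.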

Ideally the products would be built and distributed (obo/owl/json) upstream, so that we avoid running the build step ourselves, which introduces an additional source of potential pipeline failure. We would also have to determine memory/disk requirements.

cc @cthoyt

cthoyt commented 2 years ago

Yes they’re all built and distributed on GitHub at the moment but some need to be gzipped. Is that alright?

cmungall commented 1 year ago

gzip is fine. It would be great if all had stable URLs, to avoid modifying the registry entry on new releases (it is worth continuing to explore housing some of these on OBO but that can be pursued separately). Standardizing on ISO-8601 for release dates would be great too.
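As a rough sketch of what standardizing on ISO-8601 could look like for date-like version strings, the hypothetical helper below rewrites forms such as `2022_05` to `2022-05` and returns `None` for anything that is not a year-month or full date. The accepted separators (`-`, `_`, or none) are an assumption, not a description of what the upstream builds emit.

```python
import re
from datetime import date
from typing import Optional

# Year, month, and optional day, separated by "-", "_", or nothing (assumed).
_DATE_LIKE = re.compile(r"(\d{4})[-_]?(\d{2})(?:[-_]?(\d{2}))?")


def to_iso_8601(version: str) -> Optional[str]:
    """Return YYYY-MM or YYYY-MM-DD for a date-like version, else None."""
    match = _DATE_LIKE.fullmatch(version)
    if match is None:
        return None
    year, month, day = match.groups()
    try:
        # Validate the calendar date; a missing day is checked as the 1st.
        date(int(year), int(month), int(day or 1))
    except ValueError:
        return None
    return f"{year}-{month}" + (f"-{day}" if day else "")
```

Non-date versions such as `92.0` or `83` fall through as `None`, so a caller can leave those untouched.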

I'm trying a few of these. I am manually adding to the registry for now but perhaps we could come up with some kind of standard registry yaml for this sort of thing.

cthoyt commented 1 year ago

FYI there are now PURLs for these files, with dates standardized to ISO 8601 where possible. Examples:

| Resource | Version Type | Example PURL |
|---|---|---|
| Reactome | Sequential | https://w3id.org/biopragmatics/resources/reactome/83/reactome.obo |
| Interpro | Major/Minor | https://w3id.org/biopragmatics/resources/interpro/92.0/interpro.obo |
| DrugBank Salt | Semantic | https://w3id.org/biopragmatics/resources/drugbank.salt/5.1.9/drugbank.salt.obo |
| MeSH | Year | https://w3id.org/biopragmatics/resources/mesh/2003/mesh.obo.gz |
| UniProt | Year/Month | https://w3id.org/biopragmatics/resources/uniprot/2022_05/uniprot.obo.gz |
| HGNC | Date | https://w3id.org/biopragmatics/resources/hgnc/2023-02-01/hgnc.obo |
| CGNC | unversioned | https://w3id.org/biopragmatics/resources/cgnc/cgnc.obo |
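
The example PURLs above appear to follow one pattern: base, resource, optional version, then a file named after the resource. A minimal sketch of that pattern, inferred only from these examples (the `build_purl` helper and its parameters are hypothetical, and other resources may not follow the same layout):

```python
from typing import Optional

# Base path taken from the example PURLs in this thread.
BASE = "https://w3id.org/biopragmatics/resources"


def build_purl(
    resource: str,
    version: Optional[str] = None,
    extension: str = "obo",
    gzipped: bool = False,
) -> str:
    """Assemble a PURL following the pattern seen in the examples (assumed)."""
    filename = f"{resource}.{extension}" + (".gz" if gzipped else "")
    parts = [BASE, resource]
    if version:
        # Unversioned resources simply omit the version segment.
        parts.append(version)
    parts.append(filename)
    return "/".join(parts)


reactome = build_purl("reactome", "83")
mesh = build_purl("mesh", "2003", gzipped=True)
cgnc = build_purl("cgnc")
```

Whether a given file is gzipped cannot be derived from the resource name, so the sketch leaves that as an explicit argument.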

to do:

  1. Make a "release" even for versioned ones so there's a stable URL pointing to the most recent version
  2. Standardize date formats further, e.g. for UniProt, Wikipathways, etc
  3. Create some kind of manifest file of the latest build

cmungall commented 1 year ago

Awesome!!!

(In reply to Charles Tapley Hoyt's comment of Thu, Mar 16, 2023 at 4:33 PM.)

cmungall commented 1 year ago

and remember, what you have is swissprot, NOT uniprot! :-)

cthoyt commented 1 year ago

@cmungall here's the manifest file, with PURLs for each of the most recent artifacts listed in it: https://github.com/biopragmatics/obo-db-ingest/blob/main/manifest.yml