biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Better documentation of the registry data files #232

Closed. dhimmel closed this issue 2 years ago

dhimmel commented 2 years ago

From the readme:

The underlying data of the Bioregistry can be downloaded directly from here. Several exports to YAML, TSV, and RDF can be downloaded via https://bioregistry.io/download.

Looking through the repo, I see the following two locations for data:

But I don't see any docs for these files, such as a README file.

In https://github.com/manubot/manubot/issues/305, @cthoyt pointed me to docs/_data/registry.tsv which contains what I want: the actual bioregistry metadata fields after merging all the input registries (i.e. the resolved/consensus view). But docs/_data/registry.yaml seems not to contain all the output fields (despite having the same file stem name).

This issue has two parts:

  1. better documentation of the data, especially the output files
  2. based on 1, a request to create a JSON/YAML format of the consensus view available in docs/_data/registry.tsv

bgyori commented 2 years ago

I agree, we should work on these, thanks!

cthoyt commented 2 years ago

Consensus JSON and consensus YAML files are available as of #235, along with some nice documentation updates. #236 will update the documentation of the internal data; then we can see if we've addressed all of @dhimmel's concerns.

cthoyt commented 2 years ago

Sorry for the double message but I wanted to get this all cleaned up and merged. If there's any feedback or follow-up that you think we should do after #235 and #236, please let us know :)

Updated docs (as README.md files) can be found in:

  1. Exports folder at https://github.com/biopragmatics/bioregistry/tree/main/exports
  2. Internal data folder at https://github.com/biopragmatics/bioregistry/tree/main/src/bioregistry/data

dhimmel commented 2 years ago

Fantastic! Changes look great and much clearer now what data is where. Looking forward to using exports/registry/registry.json.
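
For anyone landing here later, a minimal sketch of reading the consensus export once it has been downloaded locally. It assumes that exports/registry/registry.json is a single JSON object keyed by prefix and that each record may carry a `name` field; neither detail is spelled out in this thread, so check the README in the exports folder first. The helper names (`parse_registry`, `load_registry`, `field_by_prefix`) are hypothetical and not part of the bioregistry package.

```python
import json
from pathlib import Path


def parse_registry(text):
    """Parse registry JSON text.

    Assumption: the top level is an object mapping prefix -> record.
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object keyed by prefix")
    return data


def load_registry(path):
    """Read a locally downloaded copy of the export, e.g. registry.json."""
    return parse_registry(Path(path).read_text(encoding="utf-8"))


def field_by_prefix(registry, field="name"):
    """Map each prefix to one field of its record (None if the field is absent)."""
    return {
        prefix: record.get(field)
        for prefix, record in sorted(registry.items())
    }


if __name__ == "__main__":
    # Tiny inline sample standing in for the real file (illustrative data only).
    sample = '{"chebi": {"name": "ChEBI"}, "go": {"name": "Gene Ontology"}}'
    print(field_by_prefix(parse_registry(sample)))
```

With a real download, `field_by_prefix(load_registry("registry.json"))` would give a quick overview of which prefixes are present, provided the assumptions above hold.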