biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Proactively update Prefix.cc #1056

Closed cthoyt closed 3 months ago

cthoyt commented 3 months ago

References https://github.com/OBOFoundry/OBOFoundry.github.io/issues/1038

Prefix.cc is a website that allows for public curation of CURIE prefix/URI prefix pairs that are useful in semantic web applications. Users can submit new CURIE prefixes, add additional URI prefixes to existing CURIE prefixes, and up- or down-vote URI prefixes for a given CURIE prefix. This can all be done from the website at https://prefix.cc or by its secret API.
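To make the curation model concrete, here is a minimal in-memory sketch of CURIE prefix/URI prefix pairs with voting. It is not Prefix.cc's actual data model or API: the `PrefixStore` class and its method names are hypothetical, the vote counts are made up, and the second ChEBI URI prefix is a placeholder for illustration only.

```python
from collections import defaultdict


class PrefixStore:
    """Hypothetical model: each CURIE prefix maps to URI prefixes with a vote score."""

    def __init__(self):
        # curie prefix -> {uri prefix: net votes}
        self._votes = defaultdict(dict)

    def submit(self, curie_prefix, uri_prefix):
        """Register a URI prefix for a CURIE prefix, starting at zero votes."""
        self._votes[curie_prefix].setdefault(uri_prefix, 0)

    def vote(self, curie_prefix, uri_prefix, delta):
        """Up-vote (+1) or down-vote (-1) an already submitted pair."""
        if delta not in (1, -1):
            raise ValueError("delta must be +1 or -1")
        self._votes[curie_prefix][uri_prefix] += delta

    def expand(self, curie):
        """Expand a CURIE using the highest-scoring URI prefix, or return None."""
        prefix, _, local_id = curie.partition(":")
        candidates = self._votes.get(prefix)
        if not candidates:
            return None
        best = max(candidates, key=candidates.get)
        return best + local_id


store = PrefixStore()
store.submit("chebi", "http://purl.obolibrary.org/obo/CHEBI_")
store.submit("chebi", "https://example.org/chebi/")  # placeholder alternative
store.vote("chebi", "http://purl.obolibrary.org/obo/CHEBI_", +1)
expanded = store.expand("chebi:1234")
```

Under these assumptions, `expanded` is the OBO PURL for the identifier, because that URI prefix holds the only up-vote.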

To protect against abuse, these actions are limited both on the website and API to one creation and one vote per day. Interestingly, Prefix.cc has an RSS feed of when new content is added. It shows that there isn't typically more than one interaction with the underlying database per day. Prefix.cc nevertheless has vandalism, including URI prefixes that link to illicit websites.

Though it is a generic system, its content mostly reflects semantic web and information science, with a few life and natural sciences prefixes included. This falls within scope for the Bioregistry, but so far, we don't automatically ingest Prefix.cc because 1) it doesn't include any context to go with CURIE prefix/URI prefix pairs, 2) its scope doesn't overlap enough with the Bioregistry's, and 3) it includes a lot of vandalism, and since the Bioregistry copies data wholesale, this skeeves me out (e.g., I don't want the Bioregistry to include explicit references to pornographic sites in its cached data). That all being said, Prefix.cc does include a lot of interesting prefixes that could supplement the Bioregistry, and we could incrementally identify parts of Prefix.cc to curate more carefully.

For now, this PR automates updating Prefix.cc with content from the Bioregistry on a nightly basis in GitHub Actions, to better include life science content in it. Unfortunately, because of the rate limit of one request to Prefix.cc's creation endpoint per day, this is going to take a while!
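As a rough illustration of why this takes a while, the selection step of such a nightly job can be sketched as follows. This is not the code in `src/bioregistry/export/prefixcc.py`: the function names `next_missing` and `days_to_finish` are hypothetical, the small prefix map stands in for the Bioregistry's real content, and the actual upload request to Prefix.cc is deliberately left out.

```python
def next_missing(local, remote_prefixes):
    """Return the first (prefix, uri_prefix) pair not yet on the remote, or None.

    Sorting makes the choice deterministic, so consecutive nightly runs
    walk through the missing prefixes in a stable order.
    """
    for prefix in sorted(local):
        if prefix not in remote_prefixes:
            return prefix, local[prefix]
    return None


def days_to_finish(local, remote_prefixes, per_day=1):
    """Estimate how many daily runs are needed at `per_day` creations per day."""
    missing = sum(1 for prefix in local if prefix not in remote_prefixes)
    return -(-missing // per_day)  # ceiling division


# Stand-in data: three local prefixes, one of which already exists remotely.
local = {
    "go": "http://purl.obolibrary.org/obo/GO_",
    "chebi": "http://purl.obolibrary.org/obo/CHEBI_",
    "doid": "http://purl.obolibrary.org/obo/DOID_",
}
remote_prefixes = {"go"}

todo = next_missing(local, remote_prefixes)
remaining_days = days_to_finish(local, remote_prefixes)
```

With one creation allowed per day, the number of days needed equals the number of prefixes still missing, which is why a large backlog translates directly into a long rollout.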

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 0%, with 29 lines in your changes missing coverage. Please review.

Project coverage is 40.43%. Comparing base (2c12560) to head (44e5fe2). Report is 18 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| src/bioregistry/export/prefixcc.py | 0.00% | 29 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #1056      +/-   ##
==========================================
- Coverage   40.57%   40.43%   -0.15%
==========================================
  Files         148      149       +1
  Lines        8244     8273      +29
  Branches     1910     1916       +6
==========================================
  Hits         3345     3345
- Misses       4690     4719      +29
  Partials      209      209
```
