biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Refactor converter generation code #974

Closed: cthoyt closed this 10 months ago

cthoyt commented 10 months ago

This PR removes code that operates on lists of curies.Record objects and instead works directly with curies.Converter objects.
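To illustrate the difference, here is a minimal stdlib-only sketch. The `Record` and `ToyConverter` classes are hypothetical stand-ins written for this example, not the curies API. They only mimic the idea that a converter bundles a list of records into lookup tables once, so that downstream code expands and compresses through a single object instead of re-scanning the raw list.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a prefix record (not curies.Record)."""

    prefix: str
    uri_prefix: str
    prefix_synonyms: list[str] = field(default_factory=list)


class ToyConverter:
    """Hypothetical stand-in for a converter built from a list of records."""

    def __init__(self, records: list[Record]) -> None:
        self.records = records
        # Forward table: every canonical prefix and synonym -> URI prefix.
        self.prefix_map: dict[str, str] = {}
        for record in records:
            for p in [record.prefix, *record.prefix_synonyms]:
                self.prefix_map[p] = record.uri_prefix
        # Reverse table for compression; assumes URI prefixes are unique.
        self.reverse_map = {r.uri_prefix: r.prefix for r in records}

    def expand(self, curie: str) -> str | None:
        prefix, _, identifier = curie.partition(":")
        uri_prefix = self.prefix_map.get(prefix)
        return None if uri_prefix is None else uri_prefix + identifier

    def compress(self, uri: str) -> str | None:
        # Try the longest URI prefixes first so the most specific one wins.
        for uri_prefix in sorted(self.reverse_map, key=len, reverse=True):
            if uri.startswith(uri_prefix):
                return f"{self.reverse_map[uri_prefix]}:{uri[len(uri_prefix):]}"
        return None


converter = ToyConverter([Record("GO", "http://purl.obolibrary.org/obo/GO_", ["go"])])
expanded = converter.expand("go:0008150")
compressed = converter.compress("http://purl.obolibrary.org/obo/GO_0008150")
```

Because synonyms are folded into the forward table, `expand("go:0008150")` resolves through the lowercase synonym while `compress` always returns the canonical `GO` prefix.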

Along the way, this also surfaced data integrity issues in MIRIAM, N2T, and Prefix Commons affecting the TAIR resources (tair.gene and tair.protein), which all used non-specific, overlapping URLs. These entries therefore had to be cleaned out before being imported.
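Why overlapping URLs are a problem can be shown with a small sketch: if two prefixes share a URI prefix, or one URI prefix starts with another, a URI cannot be compressed back to a single prefix unambiguously. The helpers `find_overlaps` and `clean_prefix_map` are hypothetical illustrations, not the Bioregistry's actual import code, and the `example.org` URLs are made up rather than the real TAIR entries from MIRIAM, N2T, or Prefix Commons.

```python
from __future__ import annotations

from itertools import combinations


def find_overlaps(prefix_map: dict[str, str]) -> set[str]:
    """Return prefixes whose URI prefix equals, or starts with, another's (hypothetical helper)."""
    flagged: set[str] = set()
    for (p1, u1), (p2, u2) in combinations(prefix_map.items(), 2):
        if u1.startswith(u2) or u2.startswith(u1):
            flagged.update({p1, p2})
    return flagged


def clean_prefix_map(prefix_map: dict[str, str]) -> dict[str, str]:
    """Drop every prefix involved in an overlap before importing it."""
    flagged = find_overlaps(prefix_map)
    return {p: u for p, u in prefix_map.items() if p not in flagged}


# Made-up example: two resources share one non-specific URL pattern.
external = {
    "tair.gene": "https://example.org/tair?accession=",
    "tair.protein": "https://example.org/tair?accession=",
    "go": "http://purl.obolibrary.org/obo/GO_",
}
cleaned = clean_prefix_map(external)
```

This policy is deliberately strict: it also drops nested URI prefixes, which longest-match compression could in principle disambiguate, so a real pipeline might choose to reject only exact duplicates.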

Why do this? Working directly with converters lets us use the CURIE prefix reconciliation tooling to refactor the Bioregistry-to-Converter pipeline more cleanly. That pipeline is currently causing issues when adding prefix casing variants in the related PR #969.
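As a rough illustration of what prefix reconciliation means here, the sketch below renames canonical prefixes according to a remapping while keeping each old name as a synonym, so that casing variants still resolve. The `reconcile` and `lookup` functions are hypothetical stdlib-only helpers written for this example; the curies package provides its own reconciliation tooling, whose behavior may differ.

```python
from __future__ import annotations

# Hypothetical record shape: canonical prefix -> (URI prefix, set of synonyms).
Records = dict[str, tuple[str, set[str]]]


def reconcile(records: Records, remapping: dict[str, str]) -> Records:
    """Rename canonical prefixes, keeping each old name as a synonym (hypothetical helper)."""
    result: Records = {}
    for prefix, (uri_prefix, synonyms) in records.items():
        new_prefix = remapping.get(prefix, prefix)
        new_synonyms = set(synonyms)
        if new_prefix != prefix:
            new_synonyms.add(prefix)  # keep the old name resolvable
        new_synonyms.discard(new_prefix)  # a prefix is never its own synonym
        if new_prefix in result:
            raise ValueError(f"remapping collides on {new_prefix!r}")
        result[new_prefix] = (uri_prefix, new_synonyms)
    return result


def lookup(records: Records, prefix: str) -> str | None:
    """Resolve a canonical prefix or any synonym to its URI prefix."""
    for canonical, (uri_prefix, synonyms) in records.items():
        if prefix == canonical or prefix in synonyms:
            return uri_prefix
    return None


records: Records = {"go": ("http://purl.obolibrary.org/obo/GO_", {"GO"})}
reconciled = reconcile(records, {"go": "GO"})
```

After reconciliation, `GO` is canonical and `go` survives as a synonym, so both spellings resolve to the same URI prefix.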

codecov[bot] commented 10 months ago

Codecov Report

Attention: 25 lines in your changes are missing coverage. Please review.

| File | Coverage | Patch coverage | Δ |
|---|---|---|---|
| src/bioregistry/__init__.py | 100.00% | ø | ø |
| src/bioregistry/record_accumulator.py | 91.41% | 100.00% | +0.27% ↑ |
| src/bioregistry/uri_format.py | 91.66% | 100.00% | +5.00% ↑ |
| src/bioregistry/resource_manager.py | 75.15% | 66.66% | -0.08% ↓ |
| src/bioregistry/external/prefixcommons.py | 21.35% | 14.28% | +0.14% ↑ |
| src/bioregistry/external/miriam.py | 31.34% | 12.50% | -1.45% ↓ |
| src/bioregistry/external/n2t.py | 43.90% | 22.22% | -6.10% ↓ |
