biopragmatics / curies

🐸 Idiomatic conversion between URIs and compact URIs (CURIEs) in Python
https://curies.readthedocs.io
MIT License

Implement URI or CURIE functionality #92

Closed: cthoyt closed this pull request 11 months ago.

cthoyt commented 1 year ago

The Converter.expand function turns a CURIE into a URI. This PR implements Converter.expand_or_standardize, which works like Converter.expand, except that when it is given a URI instead of a CURIE, it standardizes the URI (rewriting a URI prefix synonym to the canonical URI prefix) and returns it. The "strict" and "passthrough" options behave the same way as in Converter.expand.
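To make the dispatch concrete, here is a minimal pure-Python sketch of the behavior described above. It is not the curies implementation: `SketchRecord` and the standalone `expand_or_standardize` function are hypothetical stand-ins, and the sketch takes the first matching record instead of doing the longest-prefix matching a real converter would need.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class SketchRecord(NamedTuple):
    """Hypothetical stand-in for curies.Record (illustration only)."""

    prefix: str
    uri_prefix: str
    prefix_synonyms: Tuple[str, ...] = ()
    uri_prefix_synonyms: Tuple[str, ...] = ()


def expand_or_standardize(
    records: Sequence[SketchRecord], curie_or_uri: str
) -> Optional[str]:
    """Expand a CURIE, or rewrite a URI onto its canonical URI prefix."""
    for record in records:
        # URI case: the input starts with the canonical URI prefix or a synonym.
        for uri_prefix in (record.uri_prefix, *record.uri_prefix_synonyms):
            if curie_or_uri.startswith(uri_prefix):
                return record.uri_prefix + curie_or_uri[len(uri_prefix):]
        # CURIE case: the text before the first colon is the prefix or a synonym.
        prefix, sep, local_id = curie_or_uri.partition(":")
        if sep and prefix in (record.prefix, *record.prefix_synonyms):
            return record.uri_prefix + local_id
    # Neither a known URI nor a known CURIE: mirror the non-strict default.
    return None


CHEBI = SketchRecord(
    prefix="CHEBI",
    uri_prefix="http://purl.obolibrary.org/obo/CHEBI_",
    prefix_synonyms=("chebi",),
    uri_prefix_synonyms=("https://identifiers.org/chebi:",),
)

print(expand_or_standardize([CHEBI], "chebi:138488"))
# http://purl.obolibrary.org/obo/CHEBI_138488
print(expand_or_standardize([CHEBI], "https://identifiers.org/chebi:138488"))
# http://purl.obolibrary.org/obo/CHEBI_138488
```

The URI check runs first so that a URI, which also contains a colon, is never misread as a CURIE whose prefix is its scheme.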

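This PR also implements Converter.compress_or_standardize as the counterpart to Converter.compress: it compresses a URI into a CURIE, and when it is given a CURIE instead, it standardizes the CURIE (rewriting a prefix synonym to the canonical prefix) and returns it.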
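The compression side mirrors expansion with the roles of prefix and URI prefix swapped. The sketch below is again a hypothetical pure-Python illustration (`SketchRecord` and the standalone `compress_or_standardize` are not the library's API), using first-match lookup for brevity.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class SketchRecord(NamedTuple):
    """Hypothetical stand-in for curies.Record (illustration only)."""

    prefix: str
    uri_prefix: str
    prefix_synonyms: Tuple[str, ...] = ()
    uri_prefix_synonyms: Tuple[str, ...] = ()


def compress_or_standardize(
    records: Sequence[SketchRecord], curie_or_uri: str
) -> Optional[str]:
    """Compress a URI, or rewrite a CURIE onto its canonical prefix."""
    for record in records:
        # URI case: strip a known URI prefix and attach the canonical prefix.
        for uri_prefix in (record.uri_prefix, *record.uri_prefix_synonyms):
            if curie_or_uri.startswith(uri_prefix):
                return f"{record.prefix}:{curie_or_uri[len(uri_prefix):]}"
        # CURIE case: swap a prefix synonym for the canonical prefix.
        prefix, sep, local_id = curie_or_uri.partition(":")
        if sep and prefix in (record.prefix, *record.prefix_synonyms):
            return f"{record.prefix}:{local_id}"
    # Unknown input: mirror the non-strict default.
    return None


CHEBI = SketchRecord(
    prefix="CHEBI",
    uri_prefix="http://purl.obolibrary.org/obo/CHEBI_",
    prefix_synonyms=("chebi",),
    uri_prefix_synonyms=("https://identifiers.org/chebi:",),
)

print(compress_or_standardize([CHEBI], "https://identifiers.org/chebi:138488"))
# CHEBI:138488
print(compress_or_standardize([CHEBI], "chebi:138488"))
# CHEBI:138488
```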

Demo

Both the expansion and compression demos use a minimal extended prefix map containing a single record:

from curies import Converter, Record

converter = Converter.from_extended_prefix_map([
    Record(
        prefix="CHEBI",
        prefix_synonyms=["chebi"],
        uri_prefix="http://purl.obolibrary.org/obo/CHEBI_",
        uri_prefix_synonyms=["https://identifiers.org/chebi:"],
    ),
])
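Conceptually, each record in an extended prefix map contributes every one of its prefixes and URI prefixes, canonical or synonym, to a lookup that resolves back to that record. The sketch below flattens records into two such dictionaries. `SketchRecord`, `build_lookup_tables`, and the duplicate check are hypothetical and are not meant to describe how curies stores records internally.

```python
from typing import Dict, NamedTuple, Sequence, Tuple


class SketchRecord(NamedTuple):
    """Hypothetical stand-in for curies.Record (illustration only)."""

    prefix: str
    uri_prefix: str
    prefix_synonyms: Tuple[str, ...] = ()
    uri_prefix_synonyms: Tuple[str, ...] = ()


def build_lookup_tables(
    records: Sequence[SketchRecord],
) -> Tuple[Dict[str, SketchRecord], Dict[str, SketchRecord]]:
    """Index every prefix and URI prefix (canonical or synonym) by its record."""
    by_prefix: Dict[str, SketchRecord] = {}
    by_uri_prefix: Dict[str, SketchRecord] = {}
    for record in records:
        for prefix in (record.prefix, *record.prefix_synonyms):
            # Assumed rule for this sketch: a prefix may belong to one record only.
            if prefix in by_prefix:
                raise ValueError(f"duplicate prefix: {prefix}")
            by_prefix[prefix] = record
        for uri_prefix in (record.uri_prefix, *record.uri_prefix_synonyms):
            if uri_prefix in by_uri_prefix:
                raise ValueError(f"duplicate URI prefix: {uri_prefix}")
            by_uri_prefix[uri_prefix] = record
    return by_prefix, by_uri_prefix


CHEBI = SketchRecord(
    prefix="CHEBI",
    uri_prefix="http://purl.obolibrary.org/obo/CHEBI_",
    prefix_synonyms=("chebi",),
    uri_prefix_synonyms=("https://identifiers.org/chebi:",),
)

by_prefix, by_uri_prefix = build_lookup_tables([CHEBI])
print(sorted(by_prefix))  # ['CHEBI', 'chebi']
print(by_uri_prefix["https://identifiers.org/chebi:"].uri_prefix)
# http://purl.obolibrary.org/obo/CHEBI_
```

In this sketch, standardizing either form is a dictionary lookup followed by reading the canonical field off the matched record.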

Expansion

# Expand CURIEs
>>> converter.expand_or_standardize("CHEBI:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.expand_or_standardize("chebi:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'

# Standardize URIs
>>> converter.expand_or_standardize("http://purl.obolibrary.org/obo/CHEBI_138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'
>>> converter.expand_or_standardize("https://identifiers.org/chebi:138488")
'http://purl.obolibrary.org/obo/CHEBI_138488'

# Inputs the converter can't handle return None, so nothing is printed
>>> converter.expand_or_standardize("missing:0000000")
>>> converter.expand_or_standardize("https://example.com/missing:0000000")
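These calls leave "strict" and "passthrough" at their defaults, so unrecognized inputs come back as None. The sketch below shows one way the two flags could wrap any resolver. `apply_fallbacks`, the toy `PREFIX_MAP` and `expand_only` resolver, the use of `ValueError` as a placeholder exception, and the assumption that "strict" is checked before "passthrough" are all hypothetical, not taken from curies.

```python
from typing import Callable, Optional

# Hypothetical minimal prefix map: the canonical prefix plus one synonym.
PREFIX_MAP = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "chebi": "http://purl.obolibrary.org/obo/CHEBI_",
}


def expand_only(curie: str) -> Optional[str]:
    """Toy resolver: expand a CURIE with PREFIX_MAP, or return None."""
    prefix, sep, local_id = curie.partition(":")
    if sep and prefix in PREFIX_MAP:
        return PREFIX_MAP[prefix] + local_id
    return None


def apply_fallbacks(
    resolve: Callable[[str], Optional[str]],
    value: str,
    *,
    strict: bool = False,
    passthrough: bool = False,
) -> Optional[str]:
    """Run a resolver, then apply strict/passthrough if it fails."""
    result = resolve(value)
    if result is not None:
        return result
    if strict:
        # Placeholder exception type for this sketch.
        raise ValueError(f"could not resolve {value!r}")
    if passthrough:
        # Hand the unrecognized input back unchanged.
        return value
    return None


print(apply_fallbacks(expand_only, "missing:0000000"))  # None
print(apply_fallbacks(expand_only, "missing:0000000", passthrough=True))
# missing:0000000
```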

Compression

# Compress URIs
>>> converter.compress_or_standardize("http://purl.obolibrary.org/obo/CHEBI_138488")
'CHEBI:138488'
>>> converter.compress_or_standardize("https://identifiers.org/chebi:138488")
'CHEBI:138488'

# Standardize CURIEs
>>> converter.compress_or_standardize("CHEBI:138488")
'CHEBI:138488'
>>> converter.compress_or_standardize("chebi:138488")
'CHEBI:138488'

# Inputs the converter can't handle return None, so nothing is printed
>>> converter.compress_or_standardize("missing:0000000")
>>> converter.compress_or_standardize("https://example.com/missing:0000000")

Known Use Cases

TODO