clnsmth / soso

For creating Science On Schema.Org (SOSO) markup in dataset landing pages to improve data discovery through search engines.
https://soso.readthedocs.io/
MIT License

soso

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Installation

soso is currently available only on GitHub. Installing it requires pip. With pip available, run the following command in your terminal:

$ pip install git+https://github.com/clnsmth/soso.git@main

Metadata Conversion

The primary function, convert, turns metadata records into SOSO markup. To perform a conversion, specify the file path of the metadata and the desired conversion strategy. Each metadata standard corresponds to a specific strategy.

>>> from soso.main import convert
>>> r = convert(file='metadata.xml', strategy='EML')
>>> r
'{"@context": {"@vocab": "https://schema.org/", "prov": "http://www. ...}'
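
Because the result is a JSON-LD string, a landing page can carry it inside a script element of type application/ld+json, which is the form search engines read. The following is a minimal sketch of that step and is not part of soso: the helper name to_script_tag and the sample markup string are assumptions for illustration, and in practice the string would come from convert.

```python
import json


def to_script_tag(markup: str) -> str:
    """Wrap a JSON-LD string in a <script> element for an HTML page.

    Hypothetical helper for illustration; not part of soso.
    """
    # Round-trip through json so malformed markup fails early.
    data = json.loads(markup)
    body = json.dumps(data, indent=2)
    # Escape "</" so the payload cannot close the script element early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Stand-in for the string returned by convert(); the content is illustrative.
sample = '{"@context": {"@vocab": "https://schema.org/"}, "@type": "Dataset"}'
print(to_script_tag(sample))
```

The returned string can then be placed in the head or body of the dataset landing page by whatever templating the repository already uses.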

Some SOSO properties cannot be derived from metadata records alone. In such cases, additional information can be provided via kwargs, where each key is a property name and each value is that property's value.

For example, the url property, which represents the landing page URL, does not exist in an EML metadata record, but it is known to the repository hosting the dataset.

>>> kwargs = {'url': 'https://sample-data-repository.org/dataset/472032'}
>>> r = convert(file='metadata.xml', strategy='EML', **kwargs)
>>> r
'{"@context": {"@vocab": "https://schema.org/", "prov": "http://www. ...}'
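
Since the result is a JSON string, the standard json module can confirm that a supplied property made it into the markup. This is a small sketch using a hand-written stand-in for the convert output; the "@type" value and the overall shape of the stand-in are assumptions, not actual soso output.

```python
import json

# Stand-in for the string returned by convert(..., **kwargs); illustrative only.
r = (
    '{"@context": {"@vocab": "https://schema.org/"}, '
    '"@type": "Dataset", '
    '"url": "https://sample-data-repository.org/dataset/472032"}'
)

# Parse the JSON-LD string into a dict so individual properties can be read.
markup = json.loads(r)
print(markup.get("url"))
```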

The kwargs approach is not limited to supplying unmappable properties; it can also override any top-level SOSO property.
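
One way to picture the override behavior is as a dictionary update in which supplied values win over derived ones. The sketch below is only a conceptual model with made-up values. It does not show how soso implements this internally, and the names derived and overrides are hypothetical.

```python
# Properties as they might be derived from a metadata record (made-up values).
derived = {"@type": "Dataset", "name": "Title from the metadata record"}

# Properties supplied by the caller, as they would be passed via kwargs.
overrides = {
    "name": "Title chosen by the repository",
    "url": "https://sample-data-repository.org/dataset/472032",
}

# Later keys win: "name" is replaced, "url" is added, "@type" is untouched.
result = {**derived, **overrides}
print(result)
```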

Unmappable properties are listed in the strategy documentation.

API Reference and User Guide

The API reference and user guide are available on Read the Docs.