Helsinki-NLP / OPUS-API

API for searching corpora from OPUS

API endpoint to get dataset by id #3

Open jelmervdl opened 1 year ago

Hey there,

Could there be an API endpoint that returns the data and metadata for a dataset given only its ID? Internally I use the 'id' value that the API returns to uniquely identify a result. Right now I have to keep the API responses around in memory to be able to do id-based lookups (e.g. I have the id of the dataset I want to download and now need its url), but those responses are lost between restarts.

Ideally I would also be able to query the API with just that id and get back only the corpus it identifies. E.g.

https://opus.nlpl.eu/opusapi/?id=9589

would return something like:

{
    "corpora":
    [
        {
            "alignment_pairs": 58008,
            "corpus": "bible-uedin",
            "documents": 2,
            "id": 9589,
            "latest": "True",
            "preprocessing": "smt",
            "size": 22727,
            "source": "en",
            "source_tokens": 1682568,
            "target": "nl",
            "target_tokens": 1820195,
            "url": "https://object.pouta.csc.fi/OPUS-bible-uedin/v1/smt/en-nl.zip",
            "version": "v1"
        }
    ]
}
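
Until something like that exists, the workaround described above (keeping responses around for id-based lookups) can be made to survive restarts by persisting them to disk. Below is a minimal sketch of that idea, not part of the OPUS API: the `CorpusIndex` class, its method names, and the JSON file layout are all hypothetical, and it only assumes that responses have the `"corpora"` list shape shown above.

```python
import json
from pathlib import Path


class CorpusIndex:
    """Hypothetical disk-backed lookup table from corpus id to its API record."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload whatever an earlier run saved. JSON object keys are always
        # strings, so ids are stored and looked up as strings throughout.
        if self.path.exists():
            self.records = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.records = {}

    def add_response(self, response):
        """Index every entry in the "corpora" list of one API response."""
        for record in response.get("corpora", []):
            self.records[str(record["id"])] = record
        # Rewrite the whole file after each response; simple, and fine for a
        # modest number of records.
        self.path.write_text(json.dumps(self.records), encoding="utf-8")

    def get(self, corpus_id):
        """Return the stored record for corpus_id, or None if it was never seen."""
        return self.records.get(str(corpus_id))
```

With this, every search response would be passed to `add_response`, and a later `get(9589)` returns the stored record (including its `url`) even after a restart. It can still only resolve ids that were seen in an earlier response, which is why a server-side lookup by id would be preferable.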