JeroenDeDauw / QueryrAPI

🖹 REST API for Wikibase data
https://wikibase.consulting

Labels vs ids in the item response format #36

Closed: JeroenDeDauw closed this issue 8 years ago

JeroenDeDauw commented 8 years ago

Preliminary item response format documentation: http://queryr.wmflabs.org/about/docs/item

The premise is that, for users who want to do something simple "with Wikipedia data" and who do not know about Wikidata, it is nice to have a way to work with the data that does not force them to deal with item and property ids. At the same time, people will also want the stability that using ids provides.

A first question is whether this premise is correct: does providing access by labels add real value? Assuming it does, both classes of use cases can be served. Even so, it is useful to know which class is more important, since that determines which one is served by default wherever such a choice has to be made.

Below are different approaches to serving both classes of use cases:

Two formats

This is pretty straightforward: one format like the current one, and one in which IDs are used (both for the properties and for item references).
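If both formats are generated from one internal representation, serving two of them mostly comes down to a switch at render time. A minimal Python sketch, assuming a hypothetical internal representation in which every statement carries both the property id and its label, and item-valued statements carry both the referenced item's id and label. `render_item`, `STATEMENTS`, the field names and the placeholder property ids are illustrative, not part of QueryrAPI.

```python
from typing import Any

# Hypothetical internal representation: each statement knows both ids and labels.
# Property ids here are placeholders, not real Wikidata properties.
STATEMENTS: list[dict[str, Any]] = [
    {"property_id": "P123", "property_label": "population",
     "type": "number", "value": 9001},
    {"property_id": "P456", "property_label": "capital city",
     "type": "item", "value": {"id": "Q64", "label": "Berlin"}},
]


def render_item(statements: list[dict[str, Any]], key_by: str) -> dict[str, Any]:
    """Render statements in either the label-based or the id-based format.

    key_by is "label" or "id"; it selects both the key used for each
    property and how item references are represented.
    """
    if key_by not in ("label", "id"):
        raise ValueError("key_by must be 'label' or 'id'")
    data: dict[str, Any] = {}
    for statement in statements:
        key = statement["property_" + key_by]
        value = statement["value"]
        if statement["type"] == "item":
            # Item references become either the item's label or its id.
            value = value[key_by]
        data[key] = {"value": value, "type": statement["type"]}
    return {"data": data}
```

With this setup `render_item(STATEMENTS, "id")` keys the capital under `P456` with value `Q64`, while `render_item(STATEMENTS, "label")` keys it under `capital city` with value `Berlin`. The trade-off is that a client has to pick one format up front.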

Embedded labels/ids

Have label keys and embed the ids

data: {
    "population": {
        "property_id": "P123",
        "value": 9001,
        "type": "number"
    }
}
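Because the id travels with each entry, a client that prefers stable ids can re-key such a response itself without losing the label. A minimal Python sketch; the helper name `index_by_property_id` is hypothetical:

```python
from typing import Any

# Label-keyed payload with embedded property ids (the shape above).
LABEL_KEYED: dict[str, Any] = {
    "data": {
        "population": {"property_id": "P123", "value": 9001, "type": "number"},
    }
}


def index_by_property_id(payload: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Re-key label-keyed data by the embedded property id.

    The label moves into each entry, so no information is lost.
    """
    indexed: dict[str, dict[str, Any]] = {}
    for label, entry in payload["data"].items():
        rest = {k: v for k, v in entry.items() if k != "property_id"}
        indexed[entry["property_id"]] = {"label": label, **rest}
    return indexed
```

The result, `{"P123": {"label": "population", "value": 9001, "type": "number"}}`, is exactly the id-keyed variant, which suggests the two embedded variants carry the same information and differ only in which lookup is cheap.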

Or have id keys and embed the labels

data: {
    "P123": {
        "label": "population",
        "value": 9001,
        "type": "number"
    }
}
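Going in the other direction is the one that can fail: ids are unique keys by construction, while nothing in this format guarantees that labels are. A minimal Python sketch (the name `index_by_label` is hypothetical) that re-keys by label and refuses to silently drop an entry on a clash:

```python
from typing import Any

# Id-keyed payload with embedded labels (the shape above).
ID_KEYED: dict[str, Any] = {
    "data": {
        "P123": {"label": "population", "value": 9001, "type": "number"},
    }
}


def index_by_label(payload: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Re-key id-keyed data by the embedded label.

    Raises ValueError if two ids share a label, since a label-keyed
    mapping could then hold only one of them.
    """
    indexed: dict[str, dict[str, Any]] = {}
    for property_id, entry in payload["data"].items():
        label = entry["label"]
        if label in indexed:
            raise ValueError(f"label {label!r} is used by more than one property")
        rest = {k: v for k, v in entry.items() if k != "label"}
        indexed[label] = {"property_id": property_id, **rest}
    return indexed
```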

One could of course also drop the keys altogether and embed both, as chantek does.
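With no keys, the response is a plain list and each client builds whichever index it needs. A minimal Python sketch; the field names below are assumptions for illustration and are not taken from chantek's actual format, and `build_indexes` is a hypothetical helper:

```python
from typing import Any

Index = dict[str, dict[str, Any]]

# Keyless shape: a list of entries that each embed both the id and the label.
KEYLESS: list[dict[str, Any]] = [
    {"property_id": "P123", "label": "population", "value": 9001, "type": "number"},
]


def build_indexes(entries: list[dict[str, Any]]) -> tuple[Index, Index]:
    """Return (by_id, by_label) lookup tables over the same entry objects.

    If two entries share a label, by_label keeps only the last one.
    """
    by_id = {entry["property_id"]: entry for entry in entries}
    by_label = {entry["label"]: entry for entry in entries}
    return by_id, by_label
```

Both tables point at the same entry objects, so neither lookup style is favoured by the format itself; the cost is that every client repeats this indexing step.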

Map from label to id

Use labels as keys and provide a map from label to id

data: {
    "capital city": {
        "value": "Berlin",
        "type": "string"
        // Something here would need to change so the user knows this is an item id
    }
},
item_ids_by_label: {
    "Berlin": "Q64"
},
property_ids_by_label: {
    "capital city": "P123"
}
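A client that wants ids can translate this response using the two maps. A minimal Python sketch; it assumes a hypothetical `"type": "item"` marker for item references (one possible answer to the open question in the comment above), and `to_ids` is an illustrative name:

```python
from typing import Any

# Label-keyed response plus the two lookup maps from the example above.
# The "item" type marking an item reference is a hypothetical choice.
RESPONSE: dict[str, Any] = {
    "data": {
        "capital city": {"value": "Berlin", "type": "item"},
    },
    "item_ids_by_label": {"Berlin": "Q64"},
    "property_ids_by_label": {"capital city": "P123"},
}


def to_ids(response: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Translate property labels and item-valued labels into ids."""
    property_ids = response["property_ids_by_label"]
    item_ids = response["item_ids_by_label"]
    resolved: dict[str, dict[str, Any]] = {}
    for label, entry in response["data"].items():
        value = entry["value"]
        if entry["type"] == "item":
            # Only item references go through the item map.
            value = item_ids[value]
        resolved[property_ids[label]] = {"value": value, "type": entry["type"]}
    return resolved
```

This keeps the `data` section as lean as a label-only format, at the price of an extra lookup per value. Like any label-keyed map, it can hold only one id per label.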

JeroenDeDauw commented 8 years ago

These are presumably not the only approaches, and the ones listed can be combined in various ways. What do you think is the best combination of trade-offs, and why?

Somewhat related: How lean should the item and property endpoints be?

jane023 commented 8 years ago

What about the links to modelling items, such as those in this item for "calendar model" and "units"? Do these need labels? It seems obvious from the data returned, but I am not sure. http://queryr.wmflabs.org/api/items/Q185372

JeroenDeDauw commented 8 years ago

The possible format demoed here uses the embedding approach. Comments welcome.

JeroenDeDauw commented 8 years ago

> What about the links to modelling items such as in this item for "calendar model" and "units"? Do these need labels? It seems obvious from the data returned, but I am not sure. http://queryr.wmflabs.org/api/items/Q185372

I have not yet put any thought into how best to deal with values of type time, quantity and coordinate. This seems fairly isolated, and is something best done once the format they reside in has been hammered out further.

Are you saying you think these should or should not include labels of linked items?

jane023 commented 8 years ago

It's hard for me to understand all of your questions, but I suppose I would use such an API for quick lookups, both to retrieve Q numbers for topics/titles and to see whether such-and-such a property exists yet. The problem with labels in general is twofold: spelling issues and language issues. As far as I know, there is currently no way to find labels with exactly the same spelling across languages. Even if you enable "fall-back languages", that adds only one more language, while we have hundreds of them now.
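Both problems show up in even the simplest label lookup: an exact-match search only finds a label in the languages it is told to search, a fallback chain just lengthens that list, and any spelling variant is missed. A minimal Python sketch over a hypothetical in-memory label store (`LABELS` and `find_ids` are illustrative; this is not how QueryrAPI or Wikibase implements search):

```python
# Hypothetical label store: entity id -> {language code -> label}.
# Illustrative sample data, abridged on purpose; not a copy of Wikidata.
LABELS: dict[str, dict[str, str]] = {
    "Q64": {"en": "Berlin", "de": "Berlin", "fr": "Berlin"},
    "Q90": {"en": "Paris", "fr": "Paris"},
}


def find_ids(text: str, languages: list[str]) -> list[str]:
    """Return ids whose label exactly matches text in any of the given languages.

    Matching is exact and case-sensitive, so spelling variants are missed,
    and only the listed languages are searched: a fallback chain widens the
    search by exactly as many languages as it names.
    """
    matches: list[str] = []
    for entity_id, labels in LABELS.items():
        if any(labels.get(lang) == text for lang in languages):
            matches.append(entity_id)
    return matches
```

Here `find_ids("Paris", ["de"])` finds nothing, adding `"fr"` as a fallback finds `Q90`, and `find_ids("berlin", ["en"])` still finds nothing because of the lowercase spelling.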