ad-freiburg / osm2rdf

Convert OpenStreetMap (OSM) data to RDF Turtle, including the object geometries and the predicates geo:sfContains and geo:sfIntersects. Weekly updated downloads are available for the whole planet (~40 billion triples) and per country.
https://osm2rdf.cs.uni-freiburg.de
GNU General Public License v3.0

Extend the value of <https://www.openstreetmap.org/wiki/Key:wikidata> to become a proper URI #49

Open l00mi opened 1 year ago

l00mi commented 1 year ago

The value of <https://www.openstreetmap.org/wiki/Key:wikidata> seems to be simply the Wikidata Q-number, stored as a literal. In Wikidata's RDF, items are identified by IRIs such as http://www.wikidata.org/entity/Q116819199 (the corresponding web page is https://www.wikidata.org/wiki/Q116819199).

To make it easier to connect and query OSM entities together with Wikidata, it would be great to create proper IRIs (NamedNodes) for such values instead of literals.
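A minimal Python sketch of the requested conversion, assuming that only values consisting of exactly one well-formed Q-number should become IRIs, and that the target is Wikidata's entity namespace http://www.wikidata.org/entity/. The function name `qid_to_iri` is hypothetical and not part of osm2rdf.

```python
import re
from typing import Optional

# Assumption: a Wikidata item ID is "Q" followed by a positive integer
# without leading zeros (e.g. "Q116819199").
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")

# Namespace of Wikidata entity IRIs (the "wd:" prefix in Wikidata's RDF).
WD_ENTITY = "http://www.wikidata.org/entity/"


def qid_to_iri(value: str) -> Optional[str]:
    """Map a single Q-number literal to a Wikidata entity IRI.

    Returns None for anything that is not exactly one well-formed Q-number,
    e.g. a semicolon-separated list or a typo, so the caller can keep the
    original literal instead.
    """
    candidate = value.strip()
    if QID_PATTERN.fullmatch(candidate):
        return WD_ENTITY + candidate
    return None


print(qid_to_iri("Q116819199"))  # http://www.wikidata.org/entity/Q116819199
print(qid_to_iri("Q1;Q2"))       # None
```

Returning `None` rather than guessing keeps the decision about lists and malformed values with the caller, which matters for the multi-valued tags discussed later in this thread.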

l00mi commented 1 year ago

As a workaround, the following federated query works:

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  ?relation <https://www.openstreetmap.org/wiki/Key:place> "country" .
  ?relation <https://www.openstreetmap.org/wiki/Key:name:en> ?name .
  ?relation <https://www.openstreetmap.org/wiki/Key:wikidata> ?wdValue .
  # Turn the Q-number literal into a Wikidata entity IRI
  BIND(URI(CONCAT("http://www.wikidata.org/entity/", ?wdValue)) AS ?wd)

  SERVICE <https://query.wikidata.org/sparql> {
    ?wd wdt:P31 ?type .
    ?type rdfs:label ?typeName .
    FILTER(LANG(?typeName) = "en")
  }
}
```

hannahbast commented 1 year ago

There are two predicates in the datasets produced by osm2rdf:

<https://www.openstreetmap.org/wiki/Key:wikidata>, which has an object of the form Q116819199

<https://www.openstreetmap.org/wikidata>, which has an object of the form <https://www.wikidata.org/wiki/Q116819199>

The reason for the distinction is that the first predicate reflects how the information is stored in the original data, while the second predicate is more useful. I personally would be in favor of having only one predicate.
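A Python sketch of how a converter might emit both predicates side by side for one tag. The predicate IRIs and the object form are the ones quoted above; the function names, the example subject IRI, and the rule that only a single well-formed Q-number gets the linked form are assumptions for illustration, not osm2rdf's actual (C++) code.

```python
import re
from typing import List

# Predicate IRIs and object namespace as quoted in this thread.
KEY_WIKIDATA = "<https://www.openstreetmap.org/wiki/Key:wikidata>"
OSM_WIKIDATA = "<https://www.openstreetmap.org/wikidata>"
WD_PAGE = "https://www.wikidata.org/wiki/"
QID = re.compile(r"Q[1-9][0-9]*")


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def wikidata_tag_triples(subject: str, value: str) -> List[str]:
    """Always emit the raw tag value; add the linked form for a single Q-number."""
    triples = [f"{subject} {KEY_WIKIDATA} {turtle_literal(value)} ."]
    if QID.fullmatch(value):
        triples.append(f"{subject} {OSM_WIKIDATA} <{WD_PAGE}{value}> .")
    return triples


# Hypothetical subject IRI, for illustration only.
for line in wikidata_tag_triples("<https://www.openstreetmap.org/node/1>", "Q116819199"):
    print(line)
```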

@lehmann-4178656ch @patrickbr What do you think?

l00mi commented 1 year ago

Thank you for this pointer, this is good to know. Having it only once, but with the entity URI, would make it easier to find.

lehmann-4178656ch commented 1 year ago

One of the goals @patrickbr and I set when we started working on osm2rdf was to provide access to the raw data, i.e. all information provided by OSM, wherever possible.

If I'm not mistaken, collapsing both representations into one would only work for single-value (.*:)wikidata entries.

Transforming every entry without retaining the original would break the goal of retaining all information. We would have to split values and introduce intermediate nodes when lists are provided, e.g.:

```turtle
osmnode:1080146569 osmkey:brand:wikidata "Q17412635;Q796364;Q36008;Q17412684;Q6686;Q246" .
```
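Splitting such a value into its individual Q-numbers is the first step of any list representation. A minimal Python sketch, assuming `;` is the only separator (as in the example); the function name is hypothetical.

```python
from typing import List


def split_osm_values(value: str) -> List[str]:
    """Split a multi-valued OSM tag on ";" and drop empty parts.

    Assumption: ";" is the separator, as in the brand:wikidata example.
    """
    return [part.strip() for part in value.split(";") if part.strip()]


print(split_osm_values("Q17412635;Q796364;Q36008;Q17412684;Q6686;Q246"))
```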

Currently we simply add a single statement for the first entry in the list. This may not be sufficient for lists, but it keeps the graph relatively small, and it should provide the most important information if the values are ordered accordingly in OSM:

```turtle
osmnode:1080146569 osm:brand:wikidata wd:Q17412635 .
```
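The current first-entry behavior, as a Python sketch. The prefixed names are the ones from the example and assume matching `@prefix` declarations elsewhere in the Turtle output; the function itself is hypothetical, not osm2rdf's C++ implementation.

```python
import re
from typing import Optional

QID = re.compile(r"Q[1-9][0-9]*")


def first_entry_triple(subject: str, predicate: str, value: str) -> Optional[str]:
    """Link only the first Q-number of a ";"-separated list.

    Returns None if the first entry is not a well-formed Q-number.
    """
    first = value.split(";", 1)[0].strip()
    if not QID.fullmatch(first):
        return None
    return f"{subject} {predicate} wd:{first} ."


print(first_entry_triple("osmnode:1080146569", "osm:brand:wikidata",
                         "Q17412635;Q796364;Q36008;Q17412684;Q6686;Q246"))
# osmnode:1080146569 osm:brand:wikidata wd:Q17412635 .
```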

We could introduce an entry for every Q-value in the list. This would result in something like the following:

```turtle
osmnode:1080146569 osm:brand:wikidata wd:Q17412635 .
osmnode:1080146569 osm:brand:wikidata wd:Q796364 .
osmnode:1080146569 osm:brand:wikidata wd:Q36008 .
osmnode:1080146569 osm:brand:wikidata wd:Q17412684 .
osmnode:1080146569 osm:brand:wikidata wd:Q6686 .
osmnode:1080146569 osm:brand:wikidata wd:Q246 .
```
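A Python sketch of this one-triple-per-value variant, using the same prefixed names as the example (assumed to be declared via `@prefix` elsewhere); the function is hypothetical. Because an RDF graph is a set of triples, this variant cannot express the order of the list, and a repeated Q-number collapses into a single triple.

```python
import re
from typing import List

QID = re.compile(r"Q[1-9][0-9]*")


def all_entry_triples(subject: str, predicate: str, value: str) -> List[str]:
    """One triple per well-formed Q-number in a ";"-separated value.

    Entries that are not Q-numbers are skipped. Once loaded into an RDF
    store, the order of these triples carries no meaning.
    """
    triples = []
    for part in value.split(";"):
        qid = part.strip()
        if QID.fullmatch(qid):
            triples.append(f"{subject} {predicate} wd:{qid} .")
    return triples


for line in all_entry_triples("osmnode:1080146569", "osm:brand:wikidata",
                              "Q17412635;Q796364;Q36008;Q17412684;Q6686;Q246"):
    print(line)
```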

Additionally, we would need to add intermediate nodes (as mentioned before) to preserve the order of the values. This would increase the graph size and introduce an alternative structure whenever lists are involved. Something like the following could represent arbitrary wikidata list entries; it would retain all the information (both representations of each entry and their order), but it would also enlarge the overall graph.

```turtle
osmnode:1080146569 osm2rdf:wikidataListEntry _:0 .
osmnode:1080146569 osm2rdf:wikidataListEntry _:1 .
osmnode:1080146569 osm2rdf:wikidataListEntry _:2 .
osmnode:1080146569 osm2rdf:wikidataListEntry _:3 .
osmnode:1080146569 osm2rdf:wikidataListEntry _:4 .
osmnode:1080146569 osm2rdf:wikidataListEntry _:5 .

_:0 osm2rdf:key osmkey:brand:wikidata .
_:0 osm2rdf:pos 1 .
_:0 osm2rdf:value "Q17412635" .
_:0 osm:brand:wikidata wd:Q17412635 .
_:1 osm2rdf:key osmkey:brand:wikidata .
_:1 osm2rdf:pos 2 .
_:1 osm2rdf:value "Q796364" .
_:1 osm:brand:wikidata wd:Q796364 .
_:2 osm2rdf:key osmkey:brand:wikidata .
_:2 osm2rdf:pos 3 .
_:2 osm2rdf:value "Q36008" .
_:2 osm:brand:wikidata wd:Q36008 .
_:3 osm2rdf:key osmkey:brand:wikidata .
_:3 osm2rdf:pos 4 .
_:3 osm2rdf:value "Q17412684" .
_:3 osm:brand:wikidata wd:Q17412684 .
_:4 osm2rdf:key osmkey:brand:wikidata .
_:4 osm2rdf:pos 5 .
_:4 osm2rdf:value "Q6686" .
_:4 osm:brand:wikidata wd:Q6686 .
_:5 osm2rdf:key osmkey:brand:wikidata .
_:5 osm2rdf:pos 6 .
_:5 osm2rdf:value "Q246" .
_:5 osm:brand:wikidata wd:Q246 .
```
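A Python sketch that generates this intermediate-node layout for any `;`-separated tag. The `osm2rdf:` vocabulary is the one proposed in this comment, not an existing osm2rdf feature, and the function is hypothetical. One detail the example glosses over: blank node labels such as `_:0` must be unique within a Turtle document, so the sketch draws them from a counter shared across the whole output.

```python
import itertools
from typing import Iterator, List


def list_entry_triples(subject: str, key: str, value: str,
                       bnode_ids: Iterator[int]) -> List[str]:
    """Emit the proposed intermediate-node layout for a ";"-separated tag.

    bnode_ids must be shared across the whole output (e.g. itertools.count())
    so that blank node labels stay unique within the file.
    Assumes every entry is a Q-number; no validation is done here.
    """
    entries = [p.strip() for p in value.split(";") if p.strip()]
    labels = [f"_:{next(bnode_ids)}" for _ in entries]
    # One link from the OSM object to each list entry node.
    triples = [f"{subject} osm2rdf:wikidataListEntry {label} ." for label in labels]
    # Four triples per entry: key, 1-based position, raw value, linked IRI.
    for pos, (label, qid) in enumerate(zip(labels, entries), start=1):
        triples.append(f"{label} osm2rdf:key osmkey:{key} .")
        triples.append(f"{label} osm2rdf:pos {pos} .")
        triples.append(f'{label} osm2rdf:value "{qid}" .')
        triples.append(f"{label} osm:{key} wd:{qid} .")
    return triples


ids = itertools.count()
for line in list_entry_triples("osmnode:1080146569", "brand:wikidata",
                               "Q17412635;Q796364", ids):
    print(line)
```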

This raises the question of whether single entries should always be treated as lists and therefore state this information explicitly. Treating every single value as a list would make the representation uniform, but it would add overhead to many entries, since lists in wikidata fields are far less common than single values.
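To make the trade-off concrete, a small Python sketch that counts triples under the intermediate-node layout. The cost of 5 triples per entry follows from the layout proposed above (one `wikidataListEntry` link plus key, pos, value and the linked IRI); the tag distribution in the usage lines is invented purely for illustration.

```python
from typing import Dict


def layout_triples(value_counts: Dict[int, int], uniform_lists: bool) -> int:
    """Count triples needed by the proposed intermediate-node layout.

    value_counts maps "number of Q-values in a tag" to "number of such tags".
    Each list entry costs 5 triples; a single value costs 1 direct triple
    unless uniform_lists is True. Raw literal triples are not counted.
    """
    total = 0
    for n_values, n_tags in value_counts.items():
        if n_values > 1 or uniform_lists:
            total += n_tags * 5 * n_values
        else:
            total += n_tags * n_values
    return total


# Hypothetical distribution: 100 single-valued tags and one 6-element list.
print(layout_triples({1: 100, 6: 1}, uniform_lists=False))  # 130
print(layout_triples({1: 100, 6: 1}, uniform_lists=True))   # 530
```

Under the uniform treatment every single-valued tag costs five triples instead of one, so the overhead grows with the share of single values, which is exactly the common case.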

I'm open to suggestions that allow us to keep all information found in the original data without making it hard to find. I'll also try to talk to @patrickbr about this next week.

l00mi commented 1 year ago

For this specific key, I would argue that this still maintains the original information; you just adapt the format to the medium you convert into.

Regarding the lists: does the order actually carry meaning for some keys? If so, you should also consider good ol' RDF lists: https://www.w3.org/TR/rdf-schema/#ch_list.
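For comparison with the intermediate-node layout, a Python sketch that encodes an ordered list as a standard RDF collection with `rdf:first` / `rdf:rest`, terminated by `rdf:nil`. The label prefix scheme and the predicate `osm2rdf:brandWikidataList` in the usage lines are hypothetical. In Turtle, the abbreviation `( wd:Q17412635 wd:Q796364 )` denotes the same structure.

```python
from typing import List, Tuple


def rdf_list_triples(label_prefix: str, items: List[str]) -> Tuple[str, List[str]]:
    """Encode items as an RDF collection using rdf:first / rdf:rest.

    Returns the head node (rdf:nil for an empty list) and the triples.
    label_prefix is a hypothetical scheme to keep blank node labels unique.
    """
    if not items:
        return "rdf:nil", []
    nodes = [f"_:{label_prefix}{i}" for i in range(len(items))]
    triples = []
    for i, (node, item) in enumerate(zip(nodes, items)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else "rdf:nil"
        triples.append(f"{node} rdf:first {item} .")
        triples.append(f"{node} rdf:rest {rest} .")
    return nodes[0], triples


head, triples = rdf_list_triples("brand", ["wd:Q17412635", "wd:Q796364"])
print(f"osmnode:1080146569 osm2rdf:brandWikidataList {head} .")
for line in triples:
    print(line)
```

An RDF collection preserves order with two triples per entry, but querying "all brands of this node" then needs a property path such as `rdf:rest*/rdf:first`.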

patrickbr commented 1 year ago

Thanks for the suggestion, @l00mi! As @hannahbast said, we already create <https://www.openstreetmap.org/wikidata> predicates linking to a URI, albeit not yet for lists of Wikidata IDs (like in the example given by @lehmann-4178656ch). We should not forget, however, that OSM attributes are free strings, and that some users might expect them to be free strings in the RDF dump. Our philosophy so far has been to keep these free strings, but add "semantically polished" versions of attribute values where possible.

We are currently discussing whether we should add an option to completely drop the free string representations for OSM attribute values handled like this. Let's keep this issue open until we have arrived at a conclusion :)
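A Python sketch of what such an option could look like. `keep_literal` is a hypothetical switch, not an existing osm2rdf flag; the prefixes `osmkey:`, `osm:` and `wd:` follow the examples in this thread and are assumed to be declared elsewhere. Following the earlier observation that collapsing only works for single values, the sketch drops the free string only when the value is exactly one well-formed Q-number, so list values keep their original representation.

```python
import re
from typing import List

QID = re.compile(r"Q[1-9][0-9]*")


def wikidata_value_triples(subject: str, key: str, value: str,
                           keep_literal: bool = True) -> List[str]:
    """Emit a wikidata-style tag, optionally without the raw free string.

    The literal is dropped only if keep_literal is False AND the value is a
    single Q-number; anything else keeps its free-string representation.
    """
    is_single_qid = QID.fullmatch(value) is not None
    triples = []
    if keep_literal or not is_single_qid:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} osmkey:{key} "{escaped}" .')
    if is_single_qid:
        triples.append(f"{subject} osm:{key} wd:{value} .")
    return triples


print(wikidata_value_triples("osmnode:1", "wikidata", "Q42", keep_literal=False))
```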