baskaufs closed 2 years ago
Some background research:
There doesn't seem to be any way in the W3C Generating RDF from Tabular Data on the Web Recommendation for specifying that a value is a blank node. So it's going to have to be a hack.
Here's the JSON to send to the API to create a "somevalue" blank node value:
{
  "action": "wbcreateclaim",
  "format": "json",
  "entity": "Q346",
  "snaktype": "somevalue",
  "property": "P61",
  "token": "+\\"
}
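As a rough sketch, the same parameters could be assembled in Python like this. The helper names (`build_somevalue_claim`, `post_claim`), the endpoint URL, and the User-Agent string are my own assumptions for illustration, not taken from VanderBot. Note that no `value` parameter is included when the snak type is `somevalue`. A real edit would presumably need an authenticated session and a CSRF token in place of the anonymous `+\` token.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for the Wikidata API; adjust for a test instance.
API_URL = "https://www.wikidata.org/w/api.php"


def build_somevalue_claim(entity: str, prop: str, token: str) -> dict:
    """Build wbcreateclaim parameters for an unknown-value ("somevalue") claim."""
    return {
        "action": "wbcreateclaim",
        "format": "json",
        "entity": entity,
        "snaktype": "somevalue",  # no "value" parameter is sent for somevalue
        "property": prop,
        "token": token,
    }


def post_claim(params: dict, api_url: str = API_URL) -> dict:
    """POST the parameters as form data and return the decoded JSON reply.

    Hypothetical helper: it is not called here, and a real write would need
    login cookies, which this sketch does not handle.
    """
    data = urllib.parse.urlencode(params).encode("utf-8")
    request = urllib.request.Request(
        api_url,
        data=data,
        headers={"User-Agent": "somevalue-sketch/0.1 (illustrative only)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# "+\\" in Python source is the two-character anonymous token +\
params = build_somevalue_claim("Q346", "P61", "+\\")
print(json.dumps(params, indent=2))
```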
When a query is made to the Query Service, the blank nodes are identified using Skolem IRIs:
<http://www.wikidata.org/.well-known/genid/86c4ed0e862509f61bba3ad98a1d5840>
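To tell these apart from ordinary entity IRIs in query results, something like the following could be used. This is a minimal sketch that assumes every Skolem IRI starts with the `.well-known/genid/` prefix shown above. The function name `skolem_hash` is mine.

```python
from typing import Optional

# Prefix taken from the example IRI above; assumed to be the same for all
# Skolem IRIs returned by the Query Service.
GENID_PREFIX = "http://www.wikidata.org/.well-known/genid/"


def skolem_hash(iri: str) -> Optional[str]:
    """Return the hash part of a Skolem IRI, or None for any other IRI."""
    iri = iri.strip("<>")  # tolerate the angle-bracket form used above
    if iri.startswith(GENID_PREFIX):
        return iri[len(GENID_PREFIX):]
    return None


example = "<http://www.wikidata.org/.well-known/genid/86c4ed0e862509f61bba3ad98a1d5840>"
print(skolem_hash(example))
print(skolem_hash("http://www.wikidata.org/entity/Q346"))
```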
Oddly, the blank node identifier values are different for the direct (truthy) statement using the wdt: property and the corresponding value of the indirect path through the statement node using the ps: property. See the results of this query for example.
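A comparison along these lines can be sketched by building a query that fetches both values side by side. This is not the query linked above, just an illustrative one. The function name is hypothetical, and full IRIs are used instead of the wdt:, p:, and ps: prefixes so the string does not depend on prefix declarations.

```python
# Base namespace shared by the wdt:, p:, and ps: prefixes.
WD = "http://www.wikidata.org/"


def direct_vs_statement_query(qid: str, pid: str) -> str:
    """Build a SPARQL query returning the wdt: value and the ps: value together."""
    return (
        "SELECT ?direct ?viaStatement WHERE {\n"
        f"  <{WD}entity/{qid}> <{WD}prop/direct/{pid}> ?direct .\n"
        f"  <{WD}entity/{qid}> <{WD}prop/{pid}> ?statement .\n"
        f"  ?statement <{WD}prop/statement/{pid}> ?viaStatement .\n"
        "}"
    )


query = direct_vs_statement_query("Q15397819", "P170")
print(query)
```

For a somevalue statement, the two columns would be expected to hold different genid IRIs, per the observation above.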
The blank node values seem to always be "dead ends". I haven't seen any cases where they are used as subjects of other triples, although they could be.
The best approach seems to be to put in a placeholder blank node identifier of the form:
_:95664f3e04a3e885d2e5de8f912f0669
where the hexadecimal string is the hash used in the Skolem IRI. Unfortunately, this hash isn't provided in the JSON returned by the API, which looks like this:
{
  "entity": {
    "type": "item",
    "id": "Q15397819",
    "labels": {
      "en": {
        "language": "en",
        "value": "Wikidata Sandbox 3"
      }
    },
    "descriptions": {
      "en": {
        "language": "en",
        "value": "test item"
      }
    },
    "aliases": {},
    "claims": {
      "P170": [
        {
          "mainsnak": {
            "snaktype": "somevalue",
            "property": "P170",
            "hash": "d3550e860f988c6675fff913440993f58f5c40c5",
            "datatype": "wikibase-item"
          },
          "type": "statement",
          "qualifiers": {
            "P3831": [
              {
                "snaktype": "value",
                "property": "P3831",
                "hash": "85949230fce9fa2d3d310429b4ae408f90b65ea1",
                "datavalue": {
                  "value": {
                    "entity-type": "item",
                    "numeric-id": 4233718,
                    "id": "Q4233718"
                  },
                  "type": "wikibase-entityid"
                },
                "datatype": "wikibase-item"
              }
            ]
          },
          "qualifiers-order": [
            "P3831"
          ],
          "id": "Q15397819$114325cc-45c5-d092-3d28-ec38af53a627",
          "rank": "normal",
          "references": [
            {
              "hash": "639df5bed078b55446ef58363518a67844e1ec73",
              "snaks": {
                "P813": [
                  {
                    "snaktype": "value",
                    "property": "P813",
                    "hash": "6c9fe1acb4fa83475e848a689d5210b6fd31db07",
                    "datavalue": {
                      "value": {
                        "time": "+2022-01-12T00:00:00Z",
                        "timezone": 0,
                        "before": 0,
                        "after": 0,
                        "precision": 11,
                        "calendarmodel": "http://www.wikidata.org/entity/Q1985727"
                      },
                      "type": "time"
                    },
                    "datatype": "time"
                  }
                ],
                "P854": [
                  {
                    "snaktype": "value",
                    "property": "P854",
                    "hash": "62673d7ea18105e7189ab79618569c59fa3eaa6a",
                    "datavalue": {
                      "value": "https://example.org/",
                      "type": "string"
                    },
                    "datatype": "url"
                  }
                ]
              },
              "snaks-order": [
                "P813",
                "P854"
              ]
            },
            {
              "hash": "c916fcb7b2055e8245c2b46406ecdf1c66998747",
              "snaks": {
                "P854": [
                  {
                    "snaktype": "value",
                    "property": "P854",
                    "hash": "10832471104971865db325a3e29aafc6930dd029",
                    "datavalue": {
                      "value": "http://vanderbilt.edu/",
                      "type": "string"
                    },
                    "datatype": "url"
                  }
                ],
                "P813": [
                  {
                    "snaktype": "value",
                    "property": "P813",
                    "hash": "6c9fe1acb4fa83475e848a689d5210b6fd31db07",
                    "datavalue": {
                      "value": {
                        "time": "+2022-01-12T00:00:00Z",
                        "timezone": 0,
                        "before": 0,
                        "after": 0,
                        "precision": 11,
                        "calendarmodel": "http://www.wikidata.org/entity/Q1985727"
                      },
                      "type": "time"
                    },
                    "datatype": "time"
                  }
                ]
              },
              "snaks-order": [
                "P854",
                "P813"
              ]
            }
          ]
        }
      ]
    },
    "lastrevid": 1561225835
  },
  "success": 1
}
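Since the response doesn't carry the Skolem hash, about the most a script can pull out of it is which statements have a somevalue main snak and what their statement IDs are. Here's a minimal sketch, assuming a response shaped like the one above. The function name and the trimmed `sample` dictionary are illustrative only.

```python
def find_somevalue_statements(response: dict) -> list:
    """Return (property, statement_id) pairs whose main snak is 'somevalue'."""
    found = []
    claims = response.get("entity", {}).get("claims", {})
    for prop, statements in claims.items():
        for statement in statements:
            if statement.get("mainsnak", {}).get("snaktype") == "somevalue":
                found.append((prop, statement.get("id")))
    return found


# Trimmed copy of the response above, keeping only the fields that are read.
sample = {
    "entity": {
        "id": "Q15397819",
        "claims": {
            "P170": [
                {
                    "mainsnak": {"snaktype": "somevalue", "property": "P170"},
                    "type": "statement",
                    "id": "Q15397819$114325cc-45c5-d092-3d28-ec38af53a627",
                }
            ]
        },
    },
    "success": 1,
}

print(find_somevalue_statements(sample))
```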
So some temporary placeholder (e.g. a UUID) will have to be inserted, in the same manner as was done in VanderBot for value nodes. The actual hash can then be retrieved using a SPARQL query.
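The placeholder-then-lookup idea could look roughly like this. It is a sketch under assumptions, not the VanderBot implementation: the function names are hypothetical, and I'm assuming the statement node IRI is the statement ID with `$` replaced by `-` under the `entity/statement/` namespace.

```python
import uuid

GENID_PREFIX = "http://www.wikidata.org/.well-known/genid/"
WDS = "http://www.wikidata.org/entity/statement/"  # wds: namespace
PS = "http://www.wikidata.org/prop/statement/"     # ps: namespace


def placeholder_label() -> str:
    """Temporary blank node label written at upload time (random UUID hex)."""
    return "_:" + uuid.uuid4().hex


def statement_value_query(statement_id: str, pid: str) -> str:
    """Build a SPARQL query for the ps: value of one statement node."""
    node = statement_id.replace("$", "-", 1)  # assumed statement IRI convention
    return f"SELECT ?value WHERE {{\n  <{WDS}{node}> <{PS}{pid}> ?value .\n}}"


def label_from_skolem_iri(iri: str) -> str:
    """Turn a returned Skolem IRI into the _:hash form used in the CSV."""
    if not iri.startswith(GENID_PREFIX):
        raise ValueError("not a Skolem IRI: " + iri)
    return "_:" + iri[len(GENID_PREFIX):]


temp = placeholder_label()
query = statement_value_query(
    "Q15397819$114325cc-45c5-d092-3d28-ec38af53a627", "P170"
)
print(temp)
print(query)
print(label_from_skolem_iri(GENID_PREFIX + "95664f3e04a3e885d2e5de8f912f0669"))
```

Once the Query Service has caught up with the edit, the placeholder in the table would be swapped for the label built from the query result.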
The VanderBot script and the acquire_wikidata_metadata.py script were modified to use somevalue snaks and blank nodes (i.e. to handle anonymous works) in https://github.com/HeardLibrary/linked-data/commit/25d2114ac135e35f2e4d7baa10ae4ffad34a3930 and earlier commits.
As of 2021-11-16, the qid column of the output CSV has "anon" for anonymous works. Do we just put in the Q ID for "anonymous" and let a bot fix it, or do I try to fix the VanderBot script to handle "some value" (blank nodes)?