HeardLibrary / vandycite


Decide how to handle anonymous artists #21

Closed: baskaufs closed this issue 2 years ago

baskaufs commented 2 years ago

As of 2021-11-16, the qid column of the output CSV contains "anon" for anonymous works. Do we just put in the QID for "anonymous" and let a bot fix it, or do I try to modify the VanderBot script to handle "some value" (blank nodes)?

baskaufs commented 2 years ago

Some background research:

There doesn't seem to be any way in the W3C Generating RDF from Tabular Data on the Web Recommendation for specifying that a value is a blank node. So it's going to have to be a hack.

Here's the JSON to send to the API to create a "somevalue" (blank node) value:

{
    "action": "wbcreateclaim",
    "format": "json",
    "entity": "Q346",
    "snaktype": "somevalue",
    "property": "P61",
    "token": "+\\"
}
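For reference, the same request can be assembled in Python. This is a minimal sketch: `https://www.wikidata.org/w/api.php` is the standard Wikidata API endpoint, but the helper name `build_somevalue_claim_params` is hypothetical. Note that a somevalue snak has no `value` parameter at all, and a real edit needs a CSRF token from a logged-in session rather than the anonymous `+\` token.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"  # standard Wikidata Action API endpoint

def build_somevalue_claim_params(entity_id: str, property_id: str, csrf_token: str) -> dict:
    """Hypothetical helper: POST parameters for a wbcreateclaim call that adds
    an 'unknown value' (somevalue) claim. No 'value' parameter is sent."""
    return {
        "action": "wbcreateclaim",
        "format": "json",
        "entity": entity_id,
        "snaktype": "somevalue",  # unknown value instead of a concrete item
        "property": property_id,
        "token": csrf_token,      # "+\\" is the anonymous token; real edits need a session token
    }

params = build_somevalue_claim_params("Q346", "P61", "+\\")
body = urlencode(params)  # form-encoded body to POST to API_URL
```

The encoded `body` can then be POSTed with any HTTP client that carries the authenticated session cookies.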

When a query is made to the Query Service, the blank nodes are identified using Skolem IRIs:

<http://www.wikidata.org/.well-known/genid/86c4ed0e862509f61bba3ad98a1d5840>
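When processing Query Service results, a script has to tell these Skolem IRIs apart from ordinary entity IRIs. A minimal sketch, assuming every skolemized value begins with the `.well-known/genid/` prefix shown above; `skolem_hash` is a hypothetical helper name.

```python
from typing import Optional

# Prefix of skolemized blank nodes, taken from the example IRI above
GENID_PREFIX = "http://www.wikidata.org/.well-known/genid/"

def skolem_hash(iri: str) -> Optional[str]:
    """Return the hash part of a Wikidata Skolem IRI, or None for any other IRI."""
    if iri.startswith(GENID_PREFIX):
        return iri[len(GENID_PREFIX):]
    return None
```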

Oddly, the blank node identifiers differ between the direct (truthy) statement using the wdt: property and the corresponding value reached by the indirect path through the statement node using the ps: property. See the results of this query for an example.
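One way to check this is to request both values in a single query. The sketch below is not the query referred to above; it uses the standard `wd:`, `wdt:`, `p:` and `ps:` prefixes, takes `Q15397819`/`P170` from the sandbox example later in this thread, and the function names and User-Agent string are hypothetical. If the observation holds, `?truthy` and `?statementValue` come back as different genid IRIs. For items with several values for the property, the two patterns form a cross product, so this is only meant for a quick inspection.

```python
import json
import urllib.parse
import urllib.request

def build_comparison_query(qid: str, pid: str) -> str:
    """Hypothetical helper: SPARQL returning the truthy (wdt:) value next to the
    statement (p:/ps:) value, keeping only skolemized (genid) values."""
    return f"""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
SELECT ?truthy ?statement ?statementValue WHERE {{
  wd:{qid} wdt:{pid} ?truthy .
  wd:{qid} p:{pid} ?statement .
  ?statement ps:{pid} ?statementValue .
  FILTER(STRSTARTS(STR(?truthy), "http://www.wikidata.org/.well-known/genid/")
      && STRSTARTS(STR(?statementValue), "http://www.wikidata.org/.well-known/genid/"))
}}
"""

def run_query(query: str) -> list:
    """Send the query to the public Query Service and return the JSON result bindings."""
    url = "https://query.wikidata.org/sparql?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "genid-comparison-example/0.1 (hypothetical)",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```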

The blank node values always seem to be "dead ends": I haven't seen any case where they are used as subjects of other triples, although they could be.

baskaufs commented 2 years ago

The best approach seems to be to put in a placeholder blank node identifier of the form:

_:95664f3e04a3e885d2e5de8f912f0669

where the hexadecimal string is the hash used to generate the Skolem IRI. Unfortunately, this hash isn't provided in the JSON returned by the API, which looks like this:

{
    "entity": {
        "type": "item",
        "id": "Q15397819",
        "labels": {
            "en": {
                "language": "en",
                "value": "Wikidata Sandbox 3"
            }
        },
        "descriptions": {
            "en": {
                "language": "en",
                "value": "test item"
            }
        },
        "aliases": {},
        "claims": {
            "P170": [
                {
                    "mainsnak": {
                        "snaktype": "somevalue",
                        "property": "P170",
                        "hash": "d3550e860f988c6675fff913440993f58f5c40c5",
                        "datatype": "wikibase-item"
                    },
                    "type": "statement",
                    "qualifiers": {
                        "P3831": [
                            {
                                "snaktype": "value",
                                "property": "P3831",
                                "hash": "85949230fce9fa2d3d310429b4ae408f90b65ea1",
                                "datavalue": {
                                    "value": {
                                        "entity-type": "item",
                                        "numeric-id": 4233718,
                                        "id": "Q4233718"
                                    },
                                    "type": "wikibase-entityid"
                                },
                                "datatype": "wikibase-item"
                            }
                        ]
                    },
                    "qualifiers-order": [
                        "P3831"
                    ],
                    "id": "Q15397819$114325cc-45c5-d092-3d28-ec38af53a627",
                    "rank": "normal",
                    "references": [
                        {
                            "hash": "639df5bed078b55446ef58363518a67844e1ec73",
                            "snaks": {
                                "P813": [
                                    {
                                        "snaktype": "value",
                                        "property": "P813",
                                        "hash": "6c9fe1acb4fa83475e848a689d5210b6fd31db07",
                                        "datavalue": {
                                            "value": {
                                                "time": "+2022-01-12T00:00:00Z",
                                                "timezone": 0,
                                                "before": 0,
                                                "after": 0,
                                                "precision": 11,
                                                "calendarmodel": "http://www.wikidata.org/entity/Q1985727"
                                            },
                                            "type": "time"
                                        },
                                        "datatype": "time"
                                    }
                                ],
                                "P854": [
                                    {
                                        "snaktype": "value",
                                        "property": "P854",
                                        "hash": "62673d7ea18105e7189ab79618569c59fa3eaa6a",
                                        "datavalue": {
                                            "value": "https://example.org/",
                                            "type": "string"
                                        },
                                        "datatype": "url"
                                    }
                                ]
                            },
                            "snaks-order": [
                                "P813",
                                "P854"
                            ]
                        },
                        {
                            "hash": "c916fcb7b2055e8245c2b46406ecdf1c66998747",
                            "snaks": {
                                "P854": [
                                    {
                                        "snaktype": "value",
                                        "property": "P854",
                                        "hash": "10832471104971865db325a3e29aafc6930dd029",
                                        "datavalue": {
                                            "value": "http://vanderbilt.edu/",
                                            "type": "string"
                                        },
                                        "datatype": "url"
                                    }
                                ],
                                "P813": [
                                    {
                                        "snaktype": "value",
                                        "property": "P813",
                                        "hash": "6c9fe1acb4fa83475e848a689d5210b6fd31db07",
                                        "datavalue": {
                                            "value": {
                                                "time": "+2022-01-12T00:00:00Z",
                                                "timezone": 0,
                                                "before": 0,
                                                "after": 0,
                                                "precision": 11,
                                                "calendarmodel": "http://www.wikidata.org/entity/Q1985727"
                                            },
                                            "type": "time"
                                        },
                                        "datatype": "time"
                                    }
                                ]
                            },
                            "snaks-order": [
                                "P854",
                                "P813"
                            ]
                        }
                    ]
                }
            ]
        },
        "lastrevid": 1561225835
    },
    "success": 1
}
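A script can still locate the somevalue statements in such a response, but all it gets is the statement ID and the 40-character snak `hash`, which is not the 32-character hash that appears in the Skolem IRI. A minimal sketch, assuming a response shaped like the one above; `somevalue_statements` is a hypothetical helper name.

```python
from typing import Iterator, Tuple

def somevalue_statements(response: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (property, statement ID, snak hash) for every 'somevalue' main snak
    in an API response shaped like the one above."""
    for prop, statements in response["entity"]["claims"].items():
        for statement in statements:
            snak = statement["mainsnak"]
            if snak["snaktype"] == "somevalue":
                # There is no 'datavalue' key here, so no Skolem hash to read
                yield prop, statement["id"], snak["hash"]
```

The statement ID is the useful part: it is what later ties the statement to the Skolem IRI returned by the Query Service.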

So a temporary placeholder (e.g. a UUID) will have to be inserted, in the same way that VanderBot already handles value nodes. The actual hash can then be retrieved with a SPARQL query.
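The placeholder round trip might look like the sketch below. It assumes one CSV column for the value and one for the statement ID; the column names `creator` and `creator_statement` and all function names are hypothetical and do not reflect VanderBot's actual schema. The UUID is only a stand-in until the statement exists; afterwards the Skolem IRI returned by a SPARQL query is converted to the `_:<hash>` form.

```python
import uuid
from typing import Dict, List

GENID_PREFIX = "http://www.wikidata.org/.well-known/genid/"

def new_placeholder() -> str:
    """Temporary blank-node label written to the CSV before the real hash is known."""
    return "_:" + uuid.uuid4().hex

def skolem_to_blank_node(iri: str) -> str:
    """Turn a Query Service Skolem IRI into the '_:<hash>' form stored in the CSV."""
    if not iri.startswith(GENID_PREFIX):
        raise ValueError(f"not a Skolem IRI: {iri}")
    return "_:" + iri[len(GENID_PREFIX):]

def resolve_placeholders(rows: List[Dict[str, str]], value_column: str,
                         statement_column: str, skolem_by_statement: Dict[str, str]) -> None:
    """Replace placeholders in place, matching rows to SPARQL results by statement ID."""
    for row in rows:
        statement_id = row.get(statement_column, "")
        if row.get(value_column, "").startswith("_:") and statement_id in skolem_by_statement:
            row[value_column] = skolem_to_blank_node(skolem_by_statement[statement_id])
```

Keying on the statement ID rather than on the item alone keeps the mapping unambiguous when an item has more than one unknown-value statement.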

baskaufs commented 2 years ago

The VanderBot and acquire_wikidata_metadata.py scripts were modified to use somevalue snaks and blank nodes (i.e. to handle anonymous artists) in https://github.com/HeardLibrary/linked-data/commit/25d2114ac135e35f2e4d7baa10ae4ffad34a3930 and earlier commits.