ffdev-info / wikidp-issues

An issues repository for resolving issues with Wikidata records relating to digital preservation.
GNU General Public License v3.0

Should we combine the Wikimedia API with the SPARQL to aid with versioning by using it to generate permalinks? #6

Closed: ross-spencer closed this issue 2 years ago

ross-spencer commented 3 years ago

Description of problem

For discussion: the Wikidata website's back-end is MediaWiki/Wikibase, which also provides a REST API that can be used alongside SPARQL: see docs.

This API can be used to generate permalinks. Permalinks might complement signature versioning and record versioning: between them, we will always be able to arrive back at the exact version of a record used to create an identification within Siegfried's signature file.

(Is it fair to say that the Wikidata page is the fourth value of a "quad"-like structure that helps us reflect versions accurately? Is there an equivalent in SPARQL?)

Data about a page can be retrieved from the API in JSON:

E.g. for Portable Network Graphics (PNG):

Within this data structure we have access to the last revision ID (lastrevid), which uniquely identifies a single revision (edit) across the whole platform:

    "lastrevid": 784082439,
    "modified": "2018-11-07T16:26:11Z",
    "ns": 0,
    "pageid": 29052990,
    "sitelinks": {},
    "title": "Q27229608",
    "type": "item"

It can then be used to construct a permalink to that revision of the record:

So, again for PNG, we can link back to the record used "today" to download information as follows:
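A minimal Go sketch of that step, assuming the entity JSON carries the fields shown above and that a permalink is the standard MediaWiki form `index.php?oldid=<revision>&title=<title>`. The `entityInfo` type and `permalink` function are hypothetical names for illustration, not part of any Wikidata library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// entityInfo mirrors a subset of the fields in the JSON above.
// Any other fields (ns, sitelinks, ...) are ignored by encoding/json.
type entityInfo struct {
	LastRevID int64  `json:"lastrevid"`
	Modified  string `json:"modified"`
	PageID    int64  `json:"pageid"`
	Title     string `json:"title"`
	Type      string `json:"type"`
}

// permalink returns a MediaWiki-style permalink for one revision of a page:
// <base>/w/index.php?oldid=<revision>&title=<title>.
// url.Values.Encode sorts keys alphabetically and percent-encodes values.
func permalink(base string, info entityInfo) string {
	q := url.Values{}
	q.Set("oldid", strconv.FormatInt(info.LastRevID, 10))
	q.Set("title", info.Title)
	return base + "/w/index.php?" + q.Encode()
}

func main() {
	raw := []byte(`{
		"lastrevid": 784082439,
		"modified": "2018-11-07T16:26:11Z",
		"ns": 0,
		"pageid": 29052990,
		"sitelinks": {},
		"title": "Q27229608",
		"type": "item"
	}`)
	var info entityInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		panic(err)
	}
	fmt.Println(permalink("https://www.wikidata.org", info))
}
```

Run as-is, this prints `https://www.wikidata.org/w/index.php?oldid=784082439&title=Q27229608`. Because the title is query-escaped, a namespaced title such as `Item:Q7` comes out as `Item%3AQ7`.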

ross-spencer commented 3 years ago

I have created a demo app which uses the Query API to do some of the heavy lifting here. We should be able to splice the structs it creates into the SPARQL we generate for the Wikidata signature file, but we'll need to sit down and look at what shape that will take.

Sample app: https://github.com/ross-spencer/wikiprov

There is a recording of it working here: https://asciinema.org/a/382378

The data might look as follows for a given QID:

ross-spencer:~/git/ross-spencer/wikiprov/cmd/wikiprov$ ./wikiprov -qid Q27229608 -history 5
{
  "Title": "Q27229608",
  "Revision": 784082439,
  "Modified": "2018-11-07T16:26:11Z",
  "Permalink": "https://www.wikidata.org/w/index.php?format=json&oldid=784082439&title=Q27229608",
  "History": [
    "2018-11-07T16:26:11Z (oldid: 784082439): 'Zyksnowy' edited: '/* wbsetdescription-add:1|zh */ 檔案格式'",
    "2018-11-07T16:17:17Z (oldid: 784074292): 'Zyksnowy' edited: '/* wbsetlabel-set:1|zh */ 便攜式網絡圖形,版本1.1'",
    "2018-11-07T16:17:13Z (oldid: 784074227): 'Zyksnowy' edited: '/* wbsetlabel-add:1|zh */ 便攜式網絡圖形,版本1.0'",
    "2018-10-01T07:33:05Z (oldid: 755658810): 'Escudero' edited: '/* wbremoveclaims-remove:1| */ [[Property:P2748]]: 12'",
    "2018-10-01T07:32:30Z (oldid: 755658633): 'Escudero' edited: '/* wbcreateclaim-create:1| */ [[Property:P2748]]: 12, Matched to [[:toollabs:mix-n-match/#/entry/60074843|Portable Network Graphics, version 1.1 (#60074843)]] #mix'n'match'"
  ]
}
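The output above can be modelled with two small Go types. This is a hypothetical sketch, not wikiprov's actual code: the type and function names are assumptions, and it assumes the revisions arrive newest first so that the first one is the revision the permalink points at.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// revision holds the per-edit fields that make up one History entry.
type revision struct {
	RevID     int64
	Timestamp string
	User      string
	Comment   string
}

// provenance uses the same field names as the JSON printed above, so
// encoding/json reproduces those keys without struct tags.
type provenance struct {
	Title     string
	Revision  int64
	Modified  string
	Permalink string
	History   []string
}

// historyLine renders one revision in the layout used by the History entries.
func historyLine(r revision) string {
	return fmt.Sprintf("%s (oldid: %d): '%s' edited: '%s'",
		r.Timestamp, r.RevID, r.User, r.Comment)
}

// newProvenance assumes revs is ordered newest first, so revs[0] is the
// revision the permalink points at.
func newProvenance(title, permalink string, revs []revision) provenance {
	p := provenance{Title: title, Permalink: permalink}
	if len(revs) > 0 {
		p.Revision = revs[0].RevID
		p.Modified = revs[0].Timestamp
	}
	for _, r := range revs {
		p.History = append(p.History, historyLine(r))
	}
	return p
}

func main() {
	revs := []revision{
		{784082439, "2018-11-07T16:26:11Z", "Zyksnowy", "/* wbsetdescription-add:1|zh */ 檔案格式"},
		{784074292, "2018-11-07T16:17:17Z", "Zyksnowy", "/* wbsetlabel-set:1|zh */ 便攜式網絡圖形,版本1.1"},
	}
	p := newProvenance("Q27229608",
		"https://www.wikidata.org/w/index.php?format=json&oldid=784082439&title=Q27229608", revs)

	// Disable HTML escaping so "&" in the permalink isn't written as \u0026.
	enc := json.NewEncoder(os.Stdout)
	enc.SetEscapeHTML(false)
	enc.SetIndent("", "  ")
	if err := enc.Encode(p); err != nil {
		panic(err)
	}
}
```

Keeping History as pre-rendered strings matches the output shown; a caller that needs structured access could keep the `revision` values instead.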

I also tried the wbgetentities endpoint, specifically the info call, and that worked well but didn't give us as much information. The query endpoint is slightly different (I don't know how the different endpoints are categorized), but the docs for the revisions module(?) are here for reference: query revisions.

ross-spencer commented 3 years ago

RE: values like wbsetlabel-set and wbsetdescription-add, it looks like the following file provides a mapping we can use to prettify the wikiprov output: https://github.com/wikimedia/Wikibase/blob/master/lib/i18n/en.json NB. To do this in the most i18n-friendly way, we would need the equivalent files for all languages.
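A rough sketch of the prettifying step, assuming edit summaries keep the `/* action:arg1|arg2 */ value` shape seen in the history above and that messages use MediaWiki-style `$1`, `$2` placeholders. The message text in the example is an illustrative placeholder, not copied from en.json; the real keys there may be namespaced differently. `prettify` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// autocomment matches the machine-readable prefix in an edit summary,
// e.g. "/* wbsetlabel-set:1|zh */ some value".
var autocomment = regexp.MustCompile(`^/\* ([^:*]+):([^*]*) \*/ ?(.*)$`)

// prettify replaces the autocomment with a human-readable message looked up
// in messages (keyed by action, e.g. "wbsetlabel-set"). $1, $2, ... are
// replaced by the |-separated arguments (fine for fewer than ten arguments).
// Comments that don't match, or whose action has no message, are unchanged.
func prettify(comment string, messages map[string]string) string {
	m := autocomment.FindStringSubmatch(comment)
	if m == nil {
		return comment
	}
	msg, ok := messages[m[1]]
	if !ok {
		return comment
	}
	for i, arg := range strings.Split(m[2], "|") {
		msg = strings.ReplaceAll(msg, fmt.Sprintf("$%d", i+1), arg)
	}
	if m[3] == "" {
		return msg
	}
	return msg + ": " + m[3]
}

func main() {
	// Illustrative message text only; check en.json for the real keys and wording.
	messages := map[string]string{"wbsetlabel-set": "Changed [$2] label"}
	fmt.Println(prettify("/* wbsetlabel-set:1|zh */ 便攜式網絡圖形,版本1.1", messages))
}
```

Passing the message map in, rather than hard-coding English, leaves room to load a different language file per user, which is the i18n concern raised above.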

ross-spencer commented 2 years ago

This will be available in the next Siegfried release, from commit: https://github.com/richardlehane/siegfried/commit/9d8dc3eb422847718a44d87e80278c6a5ab0c075

The change adds a permalink to all results identified through Wikidata. Inspect will also show a revision history: the five most recent changes to the item record associated with a format.

filename : 'ba53ba11.ff2'
filesize : 10
modified : 2021-11-20T23:21:48+01:00
errors   : 
matches  :
  - ns        : 'wikidata'
    id        : 'Q7'
    format    : 'FFIIFF'
    URI       : 'http://wikidata.org/entity/Q7'
    permalink : 'http://wikidata.org/w/index.php?oldid=45&title=Item%3AQ7'
    mime      : 'x-application/format-two'
    basis     : 'extension match ff2; byte match at [[0 2] [8 2]] (another format registry (source date: 1970-01-02))'
    warning   : 
Format info: Name: 'FFIIFF'
MIMEType: 'x-application/format-two'
Sources: 'another format registry (source date: 1970-01-02)' 
Revision History: {
  "Title": "Item:Q7",
  "Revision": 45,
  "Modified": "2021-11-20T18:49:46Z",
  "Permalink": "http://wikidata.org/w/index.php?oldid=45&title=Item%3AQ7",
  "History": [
    "2021-11-20T18:49:46Z (oldid: 45): 'Admin' edited: '/* wbsetclaim-create:2||1 */ [[Property:P8]]: BA11'",
    "2021-11-20T18:49:02Z (oldid: 44): 'Admin' edited: '/* wbsetclaim-create:2||1 */ [[Property:P8]]: BA5E'",
    "2021-11-20T18:47:29Z (oldid: 43): 'Admin' edited: '/* wbsetclaim-create:2||1 */ [[Property:P6]]: ff2'",
    "2021-11-20T18:47:17Z (oldid: 42): 'Admin' edited: '/* wbsetclaim-create:2||1 */ [[Property:P7]]: x-application/format-two'",
    "2021-11-20T18:46:59Z (oldid: 41): 'Admin' edited: '/* wbsetclaim-create:2||1 */ [[Property:P9]]: [[Item:Q1]]'"
  ]
}
---
QID: (Q7)
globs: *.ff2
sigs: (B:0 seq ba5e | E:0 seq ba11)
superiors: none
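Because the permalink carries both the title and the revision ID, a consumer of this output can recover the exact record revision behind an identification. A small hypothetical Go helper, assuming the permalink keeps the `index.php?oldid=…&title=…` form shown above:

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// revisionRef identifies one exact revision of a Wikidata (or Wikibase) page.
type revisionRef struct {
	Title string
	OldID int64
}

// parsePermalink extracts the title and oldid query parameters from a
// permalink. Query values are percent-decoded, so "Item%3AQ7" becomes "Item:Q7".
func parsePermalink(link string) (revisionRef, error) {
	u, err := url.Parse(link)
	if err != nil {
		return revisionRef{}, err
	}
	q := u.Query()
	id, err := strconv.ParseInt(q.Get("oldid"), 10, 64)
	if err != nil {
		return revisionRef{}, fmt.Errorf("permalink %q: bad oldid: %w", link, err)
	}
	return revisionRef{Title: q.Get("title"), OldID: id}, nil
}

func main() {
	ref, err := parsePermalink("http://wikidata.org/w/index.php?oldid=45&title=Item%3AQ7")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s @ revision %d\n", ref.Title, ref.OldID)
}
```

A missing or non-numeric `oldid` is reported as an error rather than silently mapped to revision 0, since a permalink without a revision no longer pins a version.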