Closed by ross-spencer 2 years ago
I have created a demo app that uses the Query API to do some of the heavy lifting here. We should be able to splice the structs it creates into the SPARQL we generate for the Wikidata signature file, but we'll need to sit down and look at that briefly to see what shape it will take.
Sample app: https://github.com/ross-spencer/wikiprov
There is a recording of it working here: https://asciinema.org/a/382378
The data might look as follows for a given QID:
ross-spencer:~/git/ross-spencer/wikiprov/cmd/wikiprov$ ./wikiprov -qid Q27229608 -history 5
{
"Title": "Q27229608",
"Revision": 784082439,
"Modified": "2018-11-07T16:26:11Z",
"Permalink": "https://www.wikidata.org/w/index.php?format=json&oldid=784082439&title=Q27229608",
"History": [
"2018-11-07T16:26:11Z (oldid: 784082439): 'Zyksnowy' edited: '/* wbsetdescription-add:1|zh */ 檔案格式'",
"2018-11-07T16:17:17Z (oldid: 784074292): 'Zyksnowy' edited: '/* wbsetlabel-set:1|zh */ 便攜式網絡圖形,版本1.1'",
"2018-11-07T16:17:13Z (oldid: 784074227): 'Zyksnowy' edited: '/* wbsetlabel-add:1|zh */ 便攜式網絡圖形,版本1.0'",
"2018-10-01T07:33:05Z (oldid: 755658810): 'Escudero' edited: '/* wbremoveclaims-remove:1| */ [[Property:P2748]]: 12'",
"2018-10-01T07:32:30Z (oldid: 755658633): 'Escudero' edited: '/* wbcreateclaim-create:1| */ [[Property:P2748]]: 12, Matched to [[:toollabs:mix-n-match/#/entry/60074843|Portable Network Graphics, version 1.1 (#60074843)]] #mix'n'match'"
]
}
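As a rough illustration of the kind of struct that could carry this data into the signature file, here is a minimal sketch that decodes the JSON above. The `Provenance` type and `parseProvenance` function are names assumed for this example; they are not taken from wikiprov itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Provenance mirrors the JSON printed by the demo app above. The type and
// field names are assumptions made for this sketch, not wikiprov's own types.
type Provenance struct {
	Title     string
	Revision  int
	Modified  time.Time // decoded from the RFC 3339 timestamp string
	Permalink string
	History   []string
}

// parseProvenance decodes one provenance record from its JSON form.
func parseProvenance(data []byte) (Provenance, error) {
	var p Provenance
	err := json.Unmarshal(data, &p)
	return p, err
}

func main() {
	// Sample trimmed from the output shown above (History left empty).
	sample := []byte(`{
  "Title": "Q27229608",
  "Revision": 784082439,
  "Modified": "2018-11-07T16:26:11Z",
  "Permalink": "https://www.wikidata.org/w/index.php?format=json&oldid=784082439&title=Q27229608",
  "History": []
}`)
	p, err := parseProvenance(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Title, p.Revision, p.Modified.Year())
}
```

Keeping `History` as plain strings matches the output above; a structured entry (timestamp, user, comment) might suit the signature file better, but that is a design decision still to be made.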
I also tried the `wbgetentities` endpoint, specifically the `info` call, and that worked well but didn't give us as much information. The `query` endpoint is slightly different (I don't know how the different endpoints are categorized), but the docs for the `revisions` module(?) are here for reference: query revisions.
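For reference, a request to the revisions module might be assembled as in the sketch below. The parameters (`action=query`, `prop=revisions`, `titles`, `rvlimit`, `rvprop`, `format`) are standard MediaWiki Action API parameters, but the exact set wikiprov sends is an assumption here, and `revisionsURL` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// revisionsURL builds a MediaWiki Action API request for the most recent
// `limit` revisions of a Wikidata item. The function name is hypothetical.
func revisionsURL(qid string, limit int) string {
	q := url.Values{}
	q.Set("action", "query")
	q.Set("prop", "revisions")
	q.Set("titles", qid)
	q.Set("rvlimit", strconv.Itoa(limit))
	// Ask for revision IDs, timestamps, user names and edit summaries.
	q.Set("rvprop", "ids|timestamp|user|comment")
	q.Set("format", "json")
	// Encode sorts the keys and percent-escapes the "|" separators.
	return "https://www.wikidata.org/w/api.php?" + q.Encode()
}

func main() {
	fmt.Println(revisionsURL("Q27229608", 5))
}
```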
RE: values like `wbsetlabel-set` and `wbsetdescription-add`: it looks like the following file provides a mapping we can use to prettify the wikiprov output: https://github.com/wikimedia/Wikibase/blob/master/lib/i18n/en.json NB. We would need the set of all the languages to do this in the most i18n way.
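A minimal sketch of how that prettifying step could work, assuming the edit summaries always take the `/* action:args */ text` shape seen in the history output above. The `labels` map is a hand-written stand-in for the i18n file (its wording is illustrative and is not copied from `en.json`), and `prettify` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// summaryRe matches the "/* action:args */ text" shape seen in the History
// entries above.
var summaryRe = regexp.MustCompile(`^/\*\s*([a-z0-9-]+):([^*]*?)\s*\*/\s*(.*)$`)

// labels is a tiny hand-written stand-in for the i18n mapping; the wording
// here is illustrative and is not copied from Wikibase's en.json.
var labels = map[string]string{
	"wbsetlabel-set":       "Changed label",
	"wbsetlabel-add":       "Added label",
	"wbsetdescription-add": "Added description",
}

// prettify replaces the machine-readable prefix of an edit summary with a
// human-readable label, leaving unknown or unparseable summaries unchanged.
func prettify(comment string) string {
	m := summaryRe.FindStringSubmatch(comment)
	if m == nil {
		return comment
	}
	label, ok := labels[m[1]]
	if !ok {
		return comment
	}
	// Args look like "1|zh": a count, then (optionally) a language code.
	args := strings.Split(m[2], "|")
	if len(args) > 1 && args[1] != "" {
		label += " [" + args[1] + "]"
	}
	return label + ": " + m[3]
}

func main() {
	fmt.Println(prettify("/* wbsetlabel-set:1|zh */ 便攜式網絡圖形,版本1.1"))
}
```

Falling back to the raw summary for unmapped actions keeps the history readable even when the mapping is incomplete.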
This will be available in the next Siegfried, from commit: https://github.com/richardlehane/siegfried/commit/9d8dc3eb422847718a44d87e80278c6a5ab0c075
The change adds a permalink to every result identified through Wikidata. Inspect will also show a revision history: the five most recent changes to the item record associated with a format.
filename : 'ba53ba11.ff2'
filesize : 10
modified : 2021-11-20T23:21:48+01:00
errors :
matches :
- ns : 'wikidata'
id : 'Q7'
format : 'FFIIFF'
URI : 'http://wikidata.org/entity/Q7'
permalink : 'http://wikidata.org/w/index.php?oldid=45&title=Item%3AQ7'
mime : 'x-application/format-two'
basis : 'extension match ff2; byte match at [[0 2] [8 2]] (another format registry (source date: 1970-01-02))'
warning :
Format info: Name: 'FFIIFF'
MIMEType: 'x-application/format-two'
Sources: 'another format registry (source date: 1970-01-02)'
Revision History: {
"Title": "Item:Q7",
"Revision": 45,
"Modified": "2021-11-20T18:49:46Z",
"Permalink": "http://wikidata.org/w/index.php?oldid=45&title=Item%3AQ7",
"History": [
"2021-11-20T18:49:46Z (oldid: 45): 'Admin' edited: '/* wbsetclaim-create:2||1 */ [[Property:P8]]: BA11'",
"2021-11-20T18:49:02Z (oldid: 44): 'Admin' edited: '/* wbsetclaim-create:2||1 */ [[Property:P8]]: BA5E'",
"2021-11-20T18:47:29Z (oldid: 43): 'Admin' edited: '/* wbsetclaim-create:2||1 */ [[Property:P6]]: ff2'",
"2021-11-20T18:47:17Z (oldid: 42): 'Admin' edited: '/* wbsetclaim-create:2||1 */ [[Property:P7]]: x-application/format-two'",
"2021-11-20T18:46:59Z (oldid: 41): 'Admin' edited: '/* wbsetclaim-create:2||1 */ [[Property:P9]]: [[Item:Q1]]'"
]
}
---
QID: (Q7)
globs: *.ff2
sigs: (B:0 seq ba5e | E:0 seq ba11)
superiors: none
Description of problem
For discussion: the Wikidata website's back-end is MediaWiki/Wikibase, and there is also a REST API that can be used: see docs.
This can be used to generate permalinks. Permalinks might complement signature versioning and record versioning. Between the two we will always be able to arrive back at the exact version of a record used to create an identification within Siegfried's signature file.
(Is it true to say that the Wikidata page is the fourth value of a "quad" like structure that accurately helps us to reflect versions? Is there an equivalent in the SPARQL?)
Data about a page can be retrieved from the API in JSON:
https://www.wikidata.org/w/api.php?action=wbgetentities&ids={record-id}
E.g. for Portable network graphics (PNG):
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q27229608
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q27229608&format=json
Within this data structure we have access to the last revision ID, which is unique to every record on the platform, and it can be used to construct the permalink to that record:
https://www.wikidata.org/w/index.php?oldid={latestrevid}
So, again for PNG, we can link back to the record used "today" to download information as follows:
https://www.wikidata.org/w/index.php?oldid=784082439
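The two steps above (read the revision ID, then build the permalink) could be sketched as follows. This assumes the revision ID appears in the `wbgetentities` response under `entities.<QID>.lastrevid`; the `permalink` helper and the trimmed sample response are hypothetical and written by hand for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entitiesResponse covers only the part of a wbgetentities reply needed here.
// The "lastrevid" field name is an assumption about the response layout.
type entitiesResponse struct {
	Entities map[string]struct {
		LastRevID int `json:"lastrevid"`
	} `json:"entities"`
}

// permalink returns the oldid permalink for qid using the revision ID
// found in a wbgetentities JSON response.
func permalink(body []byte, qid string) (string, error) {
	var resp entitiesResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	entity, ok := resp.Entities[qid]
	if !ok {
		return "", fmt.Errorf("entity %s not found in response", qid)
	}
	return fmt.Sprintf("https://www.wikidata.org/w/index.php?oldid=%d", entity.LastRevID), nil
}

func main() {
	// Trimmed, hand-written stand-in for a real API response.
	body := []byte(`{"entities":{"Q27229608":{"id":"Q27229608","lastrevid":784082439}}}`)
	link, err := permalink(body, "Q27229608")
	if err != nil {
		panic(err)
	}
	fmt.Println(link)
}
```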