ffdev-info / wikidp-issues

An issues repository for resolving issues in Wikidata around the records relating to Digital Preservation
GNU General Public License v3.0

Siegfried -update doesn't include Wikidata #25

Closed: ross-spencer closed this 2 years ago

ross-spencer commented 3 years ago

Description of problem

Reviewing the installation instructions today, I was reminded that -update wikidata isn't available from the itforarchivists.com website. This should probably be logged on the SF repo, but I don't understand the problem space yet, and it's unlikely this will be possible before the release.

richardlehane commented 3 years ago

This should be pretty straightforward to set up. It is mostly handled by the itforarchivists website, but we would also need to make some edits to the sf command to support the new signature file.

For reference, the way this works is:

  1. the update command queries a path, e.g. https://www.itforarchivists.com/siegfried/update/deluxe
  2. that path returns a JSON payload with details of the current sf release (e.g. 1.9.1), as well as the creation date and a download path for the latest signature
  3. if the sf client calling update is up to date (i.e. on the same minor version as the current sf release, e.g. >= 1.9.0) and the signature's creation date is later than that of the client's current signature, then the signature is downloaded using that download path
  4. the download is verified by checking the hash & file size against details from the JSON payload

One thing to decide would be whether to add wikidata to the deluxe signature too.

richardlehane commented 2 years ago

This is pretty low-hanging fruit. You'd just need to edit cmd/roy/gen.go to include a makeWikidata() function that creates a default wikidata.sig file in the data directory (for each new release I run go generate to build fresh versions of all the signature files: https://github.com/richardlehane/siegfried/wiki/Release-process).

You could also update the deluxe function in gen.go to include a wikidata identifier in the deluxe.sig as well.

Once that's done, I would just need to make a couple of small config changes to the itforarchivists website, and then sf -update wikidata would be ready to go.

ross-spencer commented 2 years ago

Thanks @richardlehane I should be able to fit this in soon!

NB. For context, for those not in the email loop: an HTTP 429 (Too Many Requests) was reported this week, which is essentially Wikidata rate-limiting queries from a single user. This could happen to anyone. I don't think we'll see it often, and we follow the Wikidata guidance on user-agents, but the update function is a good workaround. It will also be quicker than running harvest/build. The only drawback is that the Wikidata signature will only be updated with SF/Roy releases, but for testing and letting folks see the range of Wikidata identifications this will be more than fine, I am sure. Super users will still be able to harvest/build themselves.

ross-spencer commented 2 years ago

@richardlehane I think I've made those changes here (eerily simple, so I am a little unsure!): https://github.com/richardlehane/siegfried/pull/178. Let me know what you think. I've tried to structure the PR sensibly based on what I can see in the Siegfried repo. Happy to make any necessary changes!

ross-spencer commented 2 years ago

Now featured in Siegfried 1.9.3!