digipres / digipres-practice-index

An experiment in gathering together sources of information about digital preservation practices
GNU Affero General Public License v3.0

DigiPres Publications Index v1.0 - the iPres Index #2

Closed. anjackson closed this 3 months ago

anjackson commented 5 months ago

Broadly following the Format Aggregator pattern: gather metadata and text from iPres proceedings, make it easier for Google to find, and make it easy to search across. Start to think about how to link formal identifiers into the system, so we can find, e.g., papers about GIFs.

This is a first iteration to demonstrate the idea, focussed on iPres proceedings.

Writing up covered by digipres/registries-of-practice-project#7

Further work covered by #5

anjackson commented 5 months ago

Updated to link records back, and to pull out institutions, at least up to a point. Works pretty well in the browser; see this Datasette Lite view.

anjackson commented 5 months ago

Looking at the other years. IDEALS turned out to be easier, as I can at least grab chunks of metadata via OAI-PMH and only have to futz with the configuration of that in order to have something useful. To my surprise, OSF is proving more difficult to work with, with two different API versions that don't seem to line up well, and with it being necessary to grab a 'tree' of different files.
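For reference, a minimal sketch of the OAI-PMH side, assuming the repository returns standard oai_dc (Dublin Core) records from a ListRecords request. The function name, the field selection and the sample XML are illustrative, not what the index code actually does; fetching the pages over HTTP is left out so the parsing step can be run on its own.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response.

    Hypothetical helper: the fields kept here are an assumption about
    what a publications index would want, not the project's schema.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        # Skip records the repository has flagged as deleted.
        if header is None or header.get("status") == "deleted":
            continue
        records.append({
            "identifier": header.findtext("oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "creators": [e.text or "" for e in rec.iterfind(".//dc:creator", NS)],
            "date": rec.findtext(".//dc:date", default="", namespaces=NS),
        })
    # A non-empty resumptionToken means there is another page to request.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)

# Made-up sample response, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:date>2019</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
```

In a real harvest the returned token would be sent back as the resumptionToken argument of the next ListRecords request, until no token comes back.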

anjackson commented 4 months ago

Now updated with some (slightly sketchy) 2022 and 2023 data in place, here.

anjackson commented 4 months ago

Okay, refactored a bit and used Ed's suggested citation_pdf_url trick to pull in the document URLs for 2023. Latest version now has separate landing page and direct document URLs. See here.

2022 data still a bit lacking, as OSF integration needs work.
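A rough sketch of the citation_pdf_url trick, assuming the landing pages carry the usual meta tags of the form name="citation_pdf_url" with the document URL in the content attribute. The class and function names and the sample HTML are made up for illustration; the actual code may well do this differently.

```python
from html.parser import HTMLParser

class CitationMetaParser(HTMLParser):
    """Collects the content of <meta name="citation_pdf_url"> tags."""

    def __init__(self):
        super().__init__()
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name == "citation_pdf_url" and attributes.get("content"):
            self.pdf_urls.append(attributes["content"])

def find_pdf_urls(html_text):
    """Return every citation_pdf_url value found in a landing page."""
    parser = CitationMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.pdf_urls

# Hypothetical landing page; the URL is a placeholder.
LANDING_PAGE = """<html><head>
<meta name="citation_title" content="An example paper">
<meta name="citation_pdf_url" content="https://example.org/papers/1.pdf" />
</head><body></body></html>"""

urls = find_pdf_urls(LANDING_PAGE)
```

Keeping the landing page URL and the extracted document URL as separate fields matches the split described above.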

anjackson commented 4 months ago

Probably need to spend a little time thinking about the tables/structures. e.g.

anjackson commented 4 months ago

Pretty clear, I think, that for now I'm going to have to patch up some of the metadata by hand, but I can at least get the basics in place from the repositories.

anjackson commented 3 months ago

Ah, if you add _searchmode=raw to the query string, you can do proper searches, like "this" OR "that". See here
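A small sketch of building such a search URL, assuming a Datasette table page that accepts the _search parameter. The base URL below is a placeholder, not the real index address, and the helper name is made up.

```python
from urllib.parse import urlencode

def raw_search_url(table_url, query):
    """Build a Datasette table URL that passes the query through unescaped."""
    params = {"_search": query, "_searchmode": "raw"}
    return f"{table_url}?{urlencode(params)}"

# Placeholder table URL, for illustration only.
url = raw_search_url("https://example.org/db/publications", '"this" OR "that"')
print(url)
# https://example.org/db/publications?_search=%22this%22+OR+%22that%22&_searchmode=raw
```

Without _searchmode=raw the quotes and the OR would be escaped and treated as ordinary search terms.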

anjackson commented 3 months ago

Okay, so now https://www.digipres.org/publications/ hosts a web-based version and points to the DB/Datasette version.

anjackson commented 3 months ago

Good enough for v1