Living-with-machines / lwmdb

A django-based library for managing the Living with Machines newspapers metadata database schema
https://living-with-machines.github.io/lwmdb/
MIT License

post-July suggestion: update Atlas of Digitised Newspapers with LwM corpora details #131

Open kmcdono2 opened 1 year ago

kmcdono2 commented 1 year ago

At the newspapers & AI LwM event, I talked with Emily Bell about updating https://www.digitisednewspapers.net/maps/ with LwM datasets - those that will be openly available as downloads from XXX or those that can be requested from BNA/BL in the future. This would be a great way to expose the documentation to a wider audience and let people know what LwM has done to streamline metadata across the digitization projects of BL newspapers available from different providers.

griff-rees commented 1 year ago

Cool, thanks, and yeah, if that's a target set of users that would be great. Is this something they would want in this format: https://www.digitisednewspapers.net/maps/metadata-type/ ? A tad ironic to go back to METS, but hopefully it's easier to export to that than it was to process initially.

kmcdono2 commented 1 year ago

The metadata type is about the original collection. I think what we should do is document the state of the collections that compose what we had access to on LwM, and what other people will have access to (e.g. JISC, FMP, HMD, LwM). Our versions were different, and if we want people to understand them in relation to other providers (e.g. Gale), this is really important to do. I'm happy to help with this, but also not the most well informed!

Also: doing this would be good for describing what is contained (newspaper metadata-wise) in the db. This is something that seems to be lacking in the documentation site right now.

griff-rees commented 1 year ago

Ah right so if I'm reading this correctly:

"document the state of the collections that compose what we had access to on LwM, and what other people will have access to (e.g. JISC, FMP, HMD, LwM)"

other people have had access to this data (including what's now labeled LwM?) but not to the level we have. Is that level specifically elements like:

A canonical list of all that would help. And yeah, what I hoped to schedule was a group documentation session, as I'm not an expert on where most of this data came from or how people have used it previously.

kmcdono2 commented 1 year ago

@griff-rees - you can see here the list of newspaper corpora that people have access to and that are documented in the Atlas. Researchers have access to what we call JISC via Gale (in the Atlas it goes by its Gale product name, "British Library 19th Century Newspapers"). But now people will also have access to HMD and LwM newspapers through the BL research repository datasets. Gale has done some work on the former JISC collection (I am not the best person to describe that, but there is some description here in the Atlas). I don't know exactly what metadata Gale delivers with the collections, but the Atlas describes that, so we can check: you can see the different metadata categories here and how each collection delivers this metadata, and also the "Metadata Schema" section in the Database Histories page for the BL collection.