Living-with-machines / lwmdb

A django-based library for managing the Living with Machines newspapers metadata database schema
https://living-with-machines.github.io/lwmdb/
MIT License

add edition field to lwmdb #120

Open kmcdono2 opened 1 year ago

kmcdono2 commented 1 year ago

Summary

Problem: existing combination of publication_code-issue_code-item_code is NOT unique.

Why? issue_code is based on date, e.g. 18881204 (Dec 4, 1888).

But, there can (sometimes) be multiple editions on the same day.

Currently there is no edition field in the newspaper db, which would solve this problem.

Solution: Add edition_code to lwmdb at issue level.

Then, adding this to publication_code, issue_code, and item_code would ensure that we have human-understandable unique ids for all items.

Ordering edition_code is not important at this stage, as multiple editions on one day are infrequent and limited in number (1-3 max?).
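To make the collision concrete, here is a minimal sketch of how a composite id behaves with and without an edition component. The `ItemKey` class, the position of `edition_code` inside the id, and the sample codes are all assumptions made for illustration; none of them come from the lwmdb schema.

```python
from typing import NamedTuple


class ItemKey(NamedTuple):
    """Hypothetical container for the codes that make up an item id."""

    publication_code: str
    issue_code: str  # date-based, e.g. "18881204" (Dec 4, 1888)
    item_code: str
    edition_code: str = "1"  # hypothetical new field, stored at issue level

    def without_edition(self) -> str:
        # Current scheme: publication_code-issue_code-item_code
        return "-".join((self.publication_code, self.issue_code, self.item_code))

    def with_edition(self) -> str:
        # Proposed scheme; placing edition_code after issue_code is an assumption
        return "-".join(
            (self.publication_code, self.issue_code, self.edition_code, self.item_code)
        )


# Two made-up items from different editions published on the same day.
items = [
    ItemKey("0002194", "18881204", "art0001", edition_code="1"),
    ItemKey("0002194", "18881204", "art0001", edition_code="2"),
]

old_ids = [item.without_edition() for item in items]
new_ids = [item.with_edition() for item in items]

print(len(set(old_ids)))  # the two items share one id under the current scheme
print(len(set(new_ids)))  # each item gets its own id once edition_code is included
```

The ids stay human-readable either way; only the extra component separates same-day editions.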

Actions

Related Issues and Pull Requests

-

Updates

DavidBeavan commented 1 year ago

Tricky, and nice find... I think a deep-dive into the source METS/ALTO is a good starting point. Can you find an occurrence from HMD or LwM papers (i.e. public) and point us to the files that came from our partners, to see how it's been handled there?

kmcdono2 commented 1 year ago

@mcollardanuy I think you had an example of this from one of the collections? Could you share here?

mcollardanuy commented 1 year ago

Hi @kmcdono2, no, I don't have an example: it was just an observation that we thought was worth investigating at some point.

So I think we need to understand whether this is really a problem (or could it be that morning and evening editions had different newspaper codes, for example?), and, if it is, whether this comes from the original data or from us, and how this is handled in the DB (i.e. are there duplicate item codes in the DB or were they removed?).

DavidBeavan commented 1 year ago

Right then, @griff-rees has some ideas on how to test that hypothesis

griff-rees commented 1 year ago

My approach is twofold:

kmcdono2 commented 1 year ago

Quick comments:

DavidBeavan commented 1 year ago

which we don't currently have an example of

@griff-rees - do we have an example of this? You give us a potential solution, but it's unclear if it's actually a problem we are seeing

griff-rees commented 1 year ago

kmcdono2 commented 1 year ago

@griff-rees can we look for any publication_code-issue_code-item_codes that are not unique? Is there a query to do that easily? We just need to know if this exists at all.
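One way such a check could look, as a rough sketch: group by the three codes and keep any combination that occurs more than once. The flat `item` table and its column names are hypothetical and only stand in for the real data; in lwmdb the codes probably live on separate related models, so an actual query would likely need joins or the Django ORM equivalent.

```python
import sqlite3


def find_duplicate_item_ids(conn: sqlite3.Connection) -> list:
    """Return (publication_code, issue_code, item_code, count) rows seen more than once."""
    query = """
        SELECT publication_code, issue_code, item_code, COUNT(*) AS n
        FROM item
        GROUP BY publication_code, issue_code, item_code
        HAVING COUNT(*) > 1
        ORDER BY publication_code, issue_code, item_code
    """
    return conn.execute(query).fetchall()


# In-memory demo with made-up rows; the table layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (publication_code TEXT, issue_code TEXT, item_code TEXT)"
)
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [
        ("0002194", "18881204", "art0001"),  # same-day collision (made up)
        ("0002194", "18881204", "art0001"),
        ("0002194", "18881204", "art0002"),
        ("0002194", "18881205", "art0001"),
    ],
)

duplicates = find_duplicate_item_ids(conn)
print(duplicates)
```

An empty result on the real database would suggest the combination is already unique in practice, which is the question being asked here.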

mialondon commented 11 months ago

I noticed this on Slack and thought I'd chime in - FMP don't digitise more than one edition per day, as it's just not worth it for them. Newspaper scholars would prefer that they did, of course, but I can see why they don't.

mialondon commented 11 months ago

If you do have any examples of multiple digitised editions for the same day, I can ask about how they're distinguished in the BNA / BL catalogue (also how they came to exist).