Open kmcdono2 opened 1 year ago
Tricky, and nice find... I think a deep-dive from the source METS/ALTO is a good starting point. Can you find an occurrence from HMD or LwM papers (i.e. public) and point us to the files that came from our partners, to see how it's been handled there?
@mcollardanuy I think you had an example of this from one of the collections? Could you share here?
Hi @kmcdono2, no, I don't have an example: it was just an observation that we thought it was worth investigating at some point.
So I think we need to understand whether this is really a problem (or could it be that morning and evening editions had different newspaper codes, for example?), and, if it is, whether this comes from the original data or from us, and how this is handled in the DB (i.e. are there duplicate item codes in the DB or were they removed?).
Right then, @griff-rees has some ideas on how to test that hypothesis
My approach is twofold:
- `edition_name`: an optional string (`morning`, `evening`, etc.)
- `edition_order`: a required `PositiveSmallIntegerField`, likely validated via `django_int`, defaulting to 0.

Quick comments:
- which we don't currently have an example of
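As a rough illustration of the two proposed fields, here is a minimal sketch in plain Python rather than Django, so it runs without a configured project. The class name `EditionInfo` and its validation rules are assumptions made for illustration and are not lwmdb code; the 0 to 32767 bound mirrors the range Django documents as safe for `PositiveSmallIntegerField`.

```python
from dataclasses import dataclass
from typing import Optional

# Upper bound Django documents as safe for PositiveSmallIntegerField.
MAX_SMALL_INT = 32767


@dataclass(frozen=True)
class EditionInfo:
    """Hypothetical stand-in for the two proposed fields on an issue record."""

    edition_order: int = 0              # required in the proposal, defaulting to 0
    edition_name: Optional[str] = None  # optional label, e.g. "morning" or "evening"

    def __post_init__(self) -> None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.edition_order, bool) or not isinstance(self.edition_order, int):
            raise TypeError("edition_order must be an integer")
        if not 0 <= self.edition_order <= MAX_SMALL_INT:
            raise ValueError("edition_order must be between 0 and 32767")
        if self.edition_name is not None and not self.edition_name.strip():
            raise ValueError("edition_name must be None or a non-empty string")
```

With this shape, an issue with no known edition is simply `EditionInfo()`, while a second edition of the day might be `EditionInfo(1, "evening")`.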
@griff-rees - do we have an example of this? You give us a potential solution, but it's unclear if it's actually a problem we are seeing
@griff-rees can we look for any `publication_code`-`issue_code`-`item_code` combinations that are not unique? Is there a query to do that easily? We just need to know if this exists at all.
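One possible way to check, sketched under assumptions: group on the three codes and keep only the groups with more than one row. The example below uses an in-memory SQLite table named `item` with all three codes as columns. That flat layout, the table name, and the sample rows are made up for illustration; in the real database the codes may live in separate tables and need joining first.

```python
import sqlite3

# Groups rows by the three codes and keeps only combinations seen more than once.
DUPLICATE_KEYS_SQL = """
    SELECT publication_code, issue_code, item_code, COUNT(*) AS n
    FROM item
    GROUP BY publication_code, issue_code, item_code
    HAVING COUNT(*) > 1
    ORDER BY publication_code, issue_code, item_code
"""


def find_duplicate_keys(conn: sqlite3.Connection) -> list:
    """Return (publication_code, issue_code, item_code, count) for repeated keys."""
    return conn.execute(DUPLICATE_KEYS_SQL).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway database with invented rows, one key repeated."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item (publication_code TEXT, issue_code TEXT, item_code TEXT)"
    )
    rows = [
        ("0000001", "18881204", "art0001"),
        ("0000001", "18881204", "art0001"),  # same key again, e.g. a second edition
        ("0000001", "18881204", "art0002"),
        ("0000002", "18881204", "art0001"),
    ]
    conn.executemany("INSERT INTO item VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    for row in find_duplicate_keys(demo_connection()):
        print(row)
```

An empty result on the real data would suggest the combination is already unique and the problem does not occur in practice.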
I noticed this on Slack and thought I'd chime in - FMP don't digitise more than one edition per day, as it's just not worth it for them. Newspaper scholars would prefer that they did, of course, but I can see why they don't.
If you do have any examples of multiple digitised editions for the same day, I can ask about how they're distinguished in the BNA / BL catalogue (also how they came to exist).
Summary

Problem: the existing combination of `publication_code`-`issue_code`-`item_code` is NOT unique.

Why? `issue_code` is based on date, e.g. 18881204 (Dec 4, 1888). But there can (sometimes) be multiple editions on the same day.
Currently there is no edition field in the newspaper db; adding one would solve this problem.
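To make the collision concrete, here is a small sketch of how two same-day editions would map to one id, and how an edition component would separate them. The function name `item_key`, the `-` separator, the position of the edition component, and the sample codes (`e1`, `e2`, `art0001`) are all assumptions for illustration, not the project's actual id format.

```python
from typing import Optional


def item_key(
    publication_code: str,
    issue_code: str,
    item_code: str,
    edition_code: Optional[str] = None,
) -> str:
    """Join the codes into one id; the edition component is included only if given."""
    parts = [publication_code, issue_code]
    if edition_code is not None:
        parts.append(edition_code)  # placement after issue_code is an assumption
    parts.append(item_code)
    return "-".join(parts)


# Invented rows: the same item code in two editions published on the same day.
SAMPLE_ROWS = [
    ("0000001", "18881204", "art0001", "e1"),
    ("0000001", "18881204", "art0001", "e2"),
]

keys_without_edition = [item_key(p, i, a) for p, i, a, _ in SAMPLE_ROWS]
keys_with_edition = [item_key(p, i, a, e) for p, i, a, e in SAMPLE_ROWS]

if __name__ == "__main__":
    print(len(set(keys_without_edition)))  # both rows collapse to one id
    print(len(set(keys_with_edition)))     # each row keeps its own id
```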

Solution: Add `edition_code` to lwmdb at issue level. Then, adding this to `publication_code`, `issue_code`, and `item_code` would ensure that we have human-understandable unique ids for all items.
Not important to order `edition_code` at this stage, as multiple editions are infrequent and there are a limited number of editions (1-3 max?).

Actions
- `edition_code` in issue table
- `edition_code` to create unique ids for items in samples going forward

Related Issues and Pull Requests
-
Updates