FreeUKGen / FreeBMD2

For everything related to FreeBMD2. An updated version of the original FreeBMD genealogy website.
Apache License 2.0

Use multimodal base to use quasi-Permanent url for FreeBMD citations #279

Open PatReynolds opened 4 years ago

PatReynolds commented 4 years ago

- [ ] create citations

PatReynolds commented 3 years ago

Model citations made: https://docs.google.com/document/d/1Ab_Kws579BqkLW_GJZCSZZZC6VqiY0BNy-qGN3SgwdM/edit?usp=sharing

Because the database includes different information at different points, as with FreeCEN, more can be provided for citations with detail. Detailed model citations (Wikitree): https://docs.google.com/document/d/1AGNbwUR8TfSCek-1PtmYpEtgnVYjftKGcNsXxmsgn4c/edit?usp=sharing

richpomfret commented 3 years ago

Let's move this back into the backlog (or sprint) and see if this is something Ian Taylor might be able to assist with. @PatReynolds to talk to him.

PatReynolds commented 3 years ago

@PatReynolds to ask Ian to take a look

PatReynolds commented 3 years ago

@Pat to draft text for MVP launch (i.e. no guarantee that FreeBMD2 URLs will work).

richardofsussex commented 2 years ago

My suggestion is that the URL could contain human-friendly information, as has been done previously (experimentally, I guess - I don't see it in the URLs currently produced by the citation generator in CEN or REG). This additional information would follow '?' in the URL, i.e. it would be parameter data. Obvious data to include would be event type (B, M or D), Registration Office, volume, page, forenames and surname. If the base URL fails to resolve (because the record has changed, causing the generated UUID to change), a 404 handler would run a search for the specified register page. It would examine the list of entries returned (up to around 10?) and attempt to match the names to just one of them. If it succeeds, it will issue a 301 response, redirecting the request silently to the new URL for that record. If it fails, it will issue a 404 response.
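A minimal sketch of the fallback described above, in Python for illustration only. The `Entry` fields, the parameter names, the `example.org` base URL and the in-memory `index` are all assumptions standing in for the real database and search; none of them come from the FreeBMD2 codebase.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base URL; the real site layout is not specified in this thread.
BASE = "https://example.org/entries/"


@dataclass(frozen=True)
class Entry:
    record_id: str   # current hash/UUID, which changes if the record changes
    event: str       # "B", "M" or "D"
    office: str      # Registration Office
    volume: str
    page: str
    forenames: str
    surname: str


def citation_url(e: Entry) -> str:
    """Base URL plus human-friendly parameter data after '?'."""
    params = {"event": e.event, "office": e.office, "volume": e.volume,
              "page": e.page, "forenames": e.forenames, "surname": e.surname}
    return BASE + e.record_id + "?" + urlencode(params)


def resolve(url: str, index: Dict[str, Entry]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target) for a requested citation URL."""
    parts = urlsplit(url)
    record_id = parts.path.rsplit("/", 1)[-1]
    if record_id in index:
        return 200, None  # the hash is still valid
    # 404 handler: search for the register page named in the parameter data.
    q = {k: v[0] for k, v in parse_qs(parts.query).items()}
    wanted_page = (q.get("event"), q.get("office"), q.get("volume"), q.get("page"))
    page_hits = [e for e in index.values()
                 if (e.event, e.office, e.volume, e.page) == wanted_page]
    # Try to match the names to exactly one entry on that page.
    matches = [e for e in page_hits
               if e.forenames.casefold() == q.get("forenames", "").casefold()
               and e.surname.casefold() == q.get("surname", "").casefold()]
    if len(matches) == 1:
        return 301, citation_url(matches[0])  # silent redirect to the new URL
    return 404, None
```

Exact, case-insensitive name matching is the simplest possible rule; a correction that changes the spelling of a name would defeat it, so a real handler might need a looser comparison.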

DeniseColbert commented 2 years ago

Pat says: we have what we call the "prettified" url which is shown in the address bar, but the citation shows the unique identifier url to avoid cluttering the publication of the citation/reference.

PatReynoldsFUG commented 1 year ago

Decided to use unprettified quasi-permanent URLs in citations until permanent URLs are possible.

richardofsussex commented 1 year ago

I think we should use 'prettified' URLs for all reference/citation purposes until such time as we have genuinely persistent URLs for entry records - and that won't happen in the initial BMD2 release, nor for a while afterwards. My reason for this stance is simple: the 'pretty' bit of the URL includes sufficient information to reconstruct the search which found that record. The hash which is the key part of the core ('unprettified') URL works as an efficient and direct lookup while it is valid, but becomes neither use nor ornament if the underlying data changes and it becomes invalid.

Therefore the 'pretty' data gives us the possibility of a strategy to cope with the impermanence of our hashes. Without it, all we can do is shrug our shoulders and say "sorry: can't help!".

PatReynoldsFUG commented 1 year ago

Ignoring the permanent url, prettified urls without the permanent id still contain less information than the citation:

https://www.freereg.org.uk/search_records/5c3615e7f493fd70faf67f5c/grace-oldreth-richard-allcocke-marriage-yorkshire-north-riding-richmond-1657-05-19?citation_type=wikitree&locale=en

Yorkshire, North Riding : Richmond : Civil Marriages : Register of unspecified type : "Parish Register" database, FreeREG (https://www.freereg.org.uk/search_records/5c3615e7f493fd70faf67f5c : viewed 27 Oct 2022) marriage Richard Allcocke to Grace Oldreth 19 May 1657

The prettified url and the citation are independent: both can be served with an impermanent url (the hash) as an interim solution.

Other advantages of the citations:

richardofsussex commented 1 year ago

I think there are two quite distinct issues here. My concern is about defining a url which can act as a persistent identifier for a GRO index entry. This means that it can (in future) be used as a Linked Data identifier, resolving to JSON, RDF or another machine-processible format in response to a suitably framed HTTP request. I have already demonstrated how we can generate simple JSON and XML from our database records, reacting to the requested Content Type in the HTTP header.
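For illustration, a rough Python sketch of serving the same record as JSON or XML depending on the request. It assumes the format is chosen from the `Accept` request header, ignores wildcards such as `*/*`, and leaves out RDF; the function names and the record shape are hypothetical and are not the demonstration mentioned above.

```python
import json
from typing import Callable, Dict, Optional, Tuple
from xml.etree import ElementTree as ET


def render_json(record: Dict[str, str]) -> str:
    return json.dumps(record, sort_keys=True)


def render_xml(record: Dict[str, str]) -> str:
    # Assumes every key is a valid XML element name.
    root = ET.Element("entry")
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


RENDERERS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "application/json": render_json,
    "application/xml": render_xml,
}


def preferred_type(accept: str) -> Optional[str]:
    """Pick the supported media type with the highest q-value."""
    best, best_q = None, 0.0
    for item in accept.split(","):
        fields = [f.strip() for f in item.split(";")]
        media, q = fields[0].lower(), 1.0
        for f in fields[1:]:
            if f.startswith("q="):
                try:
                    q = float(f[2:])
                except ValueError:
                    q = 0.0
        if media in RENDERERS and q > best_q:
            best, best_q = media, q
    return best


def respond(record: Dict[str, str], accept: str) -> Tuple[int, str, str]:
    """Return (status, content type, body) for one record."""
    media = preferred_type(accept)
    if media is None:
        return 406, "", ""  # nothing the client accepts is available
    return 200, media, RENDERERS[media](record)
```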

A second aspect of my concern relates to the ability to recover from the situation where the hash value for a record changes. It is in that context that I am suggesting including the 'pretty' data in the url. It may not be as detailed as the information in a citation, but my hope is that it will be sufficient to populate the search boxes and so allow the user to re-run their search and hopefully track down the 'lost' record.

A citation cannot support either of these requirements.

DeniseColbert commented 1 year ago

Decision: it seems sensible to use prettified URLs in citations, with some text in help about why some citations may no longer work.

@richardofsussex to review Allan's newsletters on Groups.io (Syndicates list) for the rate of corrections, i.e. the cause of broken URLs.

richardofsussex commented 1 year ago

There are currently (well, June 2022) 11,884 outstanding corrections. Allan's figures show the change in the number of outstanding corrections from the last report, but that doesn't give a sense of the rate at which they are processed, just whether the backlog is getting bigger or smaller. The oldest outstanding corrections date from 2015.

DeniseColbert commented 1 year ago

Allan did a newsletter after the successful update in November, and one yesterday, I'll get links...

DeniseColbert commented 1 year ago

Strike that, perhaps a snip of the corresponding section in the most recent one will suffice?

[image: snip from the most recent newsletter]

richardofsussex commented 1 year ago

Yes, I saw that this morning, thanks. They have obviously been working hard on their corrections: but unfortunately this doesn't tell us the rate at which corrections occur.

DeniseColbert commented 1 year ago

OK, thinking this out from scratch as I've gone off piste, I think. Please correct any errors in my thought process: the url changes when a correction is made. We want to know the typical rate of corrections (e.g. number of corrections per month). We cannot rely on the data from June onward as typical, so we need to look at the rate in months previous to June. However, (and I think this is where I went off) we can't use the difference month-on-month because as corrections are made, new ones are being reported? I.e. it's not indicative of rate?
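The reasoning above can be checked with a toy calculation. The figures are invented purely for illustration and are not taken from Allan's newsletters: two months with very different numbers of processed corrections can show the same change in the backlog.

```python
def backlog_change(reported: int, processed: int) -> int:
    """Net month-on-month change in outstanding corrections."""
    return reported - processed


# Hypothetical months: ten times as many corrections are processed in the
# busy month, yet the backlog shrinks by the same amount in both.
quiet = backlog_change(reported=50, processed=100)
busy = backlog_change(reported=950, processed=1000)
print(quiet, busy)  # -50 -50
```

Since a URL changes when a correction is made, it is the processed count (100 versus 1000 here) that would indicate how often URLs break, and the backlog difference alone cannot recover it.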

richardofsussex commented 1 year ago

Yes, your thought process is spot on. However, in a sense, knowing the correction rate isn't critical to our design decision, since we are stuck with the situation as you describe it. It would give us a sense of how often researchers are likely to hit this problem. The other unknown here is the extent to which researchers are actually picking up our URLs and including them in their records. Another variable factor is the length of time since the URL was picked up: older URLs are more likely to be 'stale'.

All of this confirms me in my view that (a) we should adopt URLs for FreeBMD2 which include enough information to recreate the search which found that record and (b) we should have as a long-term goal the creation of a stable FreeBMD database where records can have persistent URLs, and where corrections to records can simply be made and forgotten about.