FreeUKGen / FreeBMD2

For everything related to FreeBMD2. An updated version of the original FreeBMD genealogy website.
Apache License 2.0

Investigate flagging of unsure transcriptions #519

Open DeniseColbert opened 1 year ago

DeniseColbert commented 1 year ago

Compare with how FreeBMD1 handles this.

richardofsussex commented 1 year ago

BMD1 uses bold face to indicate that there are at least two identical transcriptions of an entry. Non-bold text indicates a single transcription. Italic text for the District indicates either that the District name is an alternative or mis-spelled version of the name, or that the volume and/or page number does not match the expected values for that District at that time. Examples:

[image]

Here the two transcriptions have different page numbers, and one of those numbers is out of range for that District in Sep 1837. This presentation makes it pretty clear that the first entry is more likely to be the correct one.

[image]

Here the District name isn't an approved or alternate form of name. The page which is the target of the link makes this plain:

[image]

Our current display makes none of these distinctions, and in fact hides the volume/page information so that you have to look at the detail to even begin to guess which might be correct:

[image]

AlOneill commented 1 year ago

Please remember that typographical differences such as bold or italic text are inaccessible to a screen reader user. We could use them as a starting point but we will need something else PDQ.

richardofsussex commented 1 year ago

Fair comment. Is it possible to use the title attribute for this purpose? It would in my view be an improvement on BMD1 if users didn't have to follow a link to find out what the issue is with the District. The title could begin "Warning: " where there is an issue with the District or volume/page.

AlOneill commented 1 year ago

No, 'fraid not! Title attributes are inaccessible to those who cannot or do not use a pointing device. Forget the misinformation that hails the title attribute as a boon to accessibility — it is a pernicious myth.

richardofsussex commented 1 year ago

OK: you'll have to educate me as to how we deal with such situations.

Another point about the BMD1 design is that the District link goes to an intermediate page, specific to that entry, from which there are onwards links both to the UKBMD site and to the BMD1 information about that District. The BMD2 link goes straight to UKBMD and offers no way of accessing the BMD information about the District.

richardofsussex commented 1 year ago

There is room in the Entry Information page to include the summary information about the District which currently sits on the separate intermediate page:

[image]

This would allow us to add warnings as text, and would mean that we can keep the current links to UKBMD as they are.

richardofsussex commented 1 year ago

However, see #520 where I come to a different conclusion (!)

richardofsussex commented 1 year ago

The Districts table tells us the start and end quarters of each District, so we can check that the District existed in the quarter specified in the entry. It also records the volume number for each major span of years (1837-51; 1852-1945; etc.), so the volume can be checked from the same table. We can also check whether the authorized name for the District was used, or a synonym. (This may require use of the DistrictSynonyms and/or DistrictPseudonyms tables: the core Districts table also includes 'invented' names.)
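To make the three checks above concrete, here is a rough sketch in Python rather than in the application's own code. Everything in it is an assumption for illustration: the `District` class, its field names, the `(year, quarter)` representation and the sample values are invented, and do not reflect the real Districts, DistrictSynonyms or DistrictPseudonyms schemas.

```python
from dataclasses import dataclass, field


@dataclass
class District:
    """Hypothetical stand-in for one row of the Districts table."""
    name: str      # authorized name
    start: tuple   # (year, quarter) the District began, quarter 1-4
    end: tuple     # (year, quarter) the District ended
    volumes: list  # [(first_year, last_year, volume), ...]
    synonyms: set = field(default_factory=set)  # known alternative names


def check_district(district, name_used, year, quarter, volume):
    """Return a list of plain-text warnings for one entry (empty if none)."""
    warnings = []

    # 1. Did the District exist in the quarter given in the entry?
    if not (district.start <= (year, quarter) <= district.end):
        warnings.append("District did not exist in this quarter")

    # 2. Does the volume match the one recorded for this span of years?
    expected = [v for first, last, v in district.volumes if first <= year <= last]
    if expected and volume not in expected:
        warnings.append("Volume does not match the District for this year")

    # 3. Was the authorized name used, a known synonym, or something else?
    if name_used != district.name:
        if name_used in district.synonyms:
            warnings.append("District name is an alternative form")
        else:
            warnings.append("District name is not a recognised form")

    return warnings


# Made-up sample data, not taken from the real tables.
example = District(
    name="Exampleton",
    start=(1837, 3),
    end=(1930, 4),
    volumes=[(1837, 1851, "22"), (1852, 1945, "9d")],
    synonyms={"Exampleton Town"},
)
print(check_district(example, "Exampleton Town", 1840, 1, "9d"))
```

Returning a separate message per check means each doubt can be reported against the field it actually concerns.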

In order to check whether the page number is likely to be correct (the actual issue in this example), we need a new source of information. As far as I can tell from the BMD1 source code, the expected page ranges for a particular district, year quarter and event type are deduced from a suitably-named CSV file.
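As a sketch of what a page-range lookup might look like, assuming we had such a CSV: the column names and sample rows below are invented for illustration and are not the layout of the BMD1 files, which I have not confirmed.

```python
import csv
import io

# Made-up sample in a hypothetical layout; the real BMD1 CSV may differ entirely.
SAMPLE = """district,year,quarter,event,first_page,last_page
Exampleton,1837,3,births,101,140
Exampleton,1837,3,deaths,61,90
"""


def load_page_ranges(csv_text):
    """Build {(district, year, quarter, event): (first_page, last_page)}."""
    ranges = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["district"], int(row["year"]), int(row["quarter"]), row["event"])
        ranges[key] = (int(row["first_page"]), int(row["last_page"]))
    return ranges


def page_in_range(ranges, district, year, quarter, event, page):
    """Return True or False, or None when no expected range is known."""
    expected = ranges.get((district, year, quarter, event))
    if expected is None:
        return None
    first, last = expected
    return first <= page <= last


ranges = load_page_ranges(SAMPLE)
print(page_in_range(ranges, "Exampleton", 1837, 3, "births", 120))
print(page_in_range(ranges, "Exampleton", 1837, 3, "births", 511))
```

Returning `None` when there is no range keeps "unknown" distinct from "out of range", so we would not flag an entry merely because the reference data is missing.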

richardofsussex commented 1 year ago

The bestguess table has a "Volume" index, on the combination of Volume, Page and QuarterNumber:

[image]

We could check how many entries there are for the combination of Volume, Page and QuarterNumber in an entry, and see if it corresponds to the number of entries you might expect in a valid page. If there is only one entry (i.e. the current one), it is almost certain to be an outlier, and can be flagged as such.
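A minimal sketch of that count, using an in-memory SQLite table as a stand-in for bestguess. The table here has only the three indexed columns, the column types and index name are assumptions, the rows are made up, and the threshold of two entries is just the "only one entry" rule above expressed as a parameter.

```python
import sqlite3


def entries_on_page(conn, volume, page, quarter_number):
    """Count entries sharing this Volume, Page and QuarterNumber."""
    row = conn.execute(
        "SELECT COUNT(*) FROM bestguess "
        "WHERE Volume = ? AND Page = ? AND QuarterNumber = ?",
        (volume, page, quarter_number),
    ).fetchone()
    return row[0]


def is_probable_outlier(conn, volume, page, quarter_number, min_expected=2):
    """Flag the entry when fewer than min_expected entries share its page."""
    return entries_on_page(conn, volume, page, quarter_number) < min_expected


# Stand-in table with made-up rows; the real bestguess table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bestguess (Volume TEXT, Page TEXT, QuarterNumber INTEGER)")
conn.execute("CREATE INDEX bestguess_volume ON bestguess (Volume, Page, QuarterNumber)")
conn.executemany(
    "INSERT INTO bestguess VALUES (?, ?, ?)",
    [("22", "115", 3), ("22", "115", 3), ("22", "115", 3), ("22", "511", 3)],
)

print(is_probable_outlier(conn, "22", "115", 3))
print(is_probable_outlier(conn, "22", "511", 3))
```

Because the lookup uses exactly the columns in the index, it should be cheap enough to run when the Entry Information page is displayed, though that would need measuring against the real table.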

richardofsussex commented 1 year ago

In general, I think that we should be more specific in our comments on doubtful data. For example, the doubt about the second Eliza Greenwood entry above relates to the page number, yet it is the District name which is italicized. There is room in the Entry Information page to state which field is in doubt, and doing so will make the page more accessible. Arguably the uncertainty information on the BMD1 pages isn't accessible at all, from what @AlOneill says, in which case whatever we achieve will be a major improvement.
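One way this could look, sketched in Python purely for illustration (the function, labels and markup are hypothetical, not the application's actual templates): each warning is written out as visible text next to the field it concerns, so it does not depend on bold, italics or a title attribute.

```python
from html import escape

# Hypothetical display labels for the fields that can carry a warning.
FIELD_LABELS = {"district": "District", "volume": "Volume", "page": "Page"}


def render_field(field, value, doubt=None):
    """Render one field as a definition-list item, with any doubt as visible text."""
    html = f"<dt>{FIELD_LABELS[field]}</dt><dd>{escape(value)}"
    if doubt:
        # The word "Warning:" carries the meaning; the markup adds emphasis only.
        html += f' <strong class="warning">Warning: {escape(doubt)}</strong>'
    return html + "</dd>"


print(render_field("district", "Exampleton"))
print(render_field("page", "511", "page number is outside the expected range"))
```

In the Eliza Greenwood case this would put the warning against the page number and leave the District name unmarked.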

DeniseColbert commented 1 year ago

Another for the steering group