bodleian / charters


short summaries include (duplicate) seal information #103

Closed: holfordm closed this 6 months ago

holfordm commented 1 year ago

e.g. https://charters-qa.bodleian.ox.ac.uk/?f%5Bms_collection_s%5D%5B%5D=D.+D.+Radcliffe&f%5Btype%5D%5B%5D=charter (nos. 486, 228, etc.). This only seems to happen with the summary entries (i.e. where msItem contains only place and date).

holfordm commented 1 year ago

This only seems to be happening in some collections, though. At https://charters-qa.bodleian.ox.ac.uk/?f%5Bms_collection_s%5D%5B%5D=Norfolk&f%5Btype%5D%5B%5D=charter&page=9&per_page=200 there is no duplication.

This could be because the markup is different in the D. D. Radcliffe file, with authDesc used as a child of msDesc rather than appearing inside physDesc. If that's so, the easiest fix will be for me to add an enclosing physDesc in these cases with a global find and replace.
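A global find and replace works on the raw text. As an alternative, the same data fix can be sketched with an XML parser, so that an authDesc is moved only when it really is a direct child of msDesc. This is a minimal sketch under assumptions, not the edit that was actually applied to the data: it assumes the files use the TEI namespace and the element names msDesc, authDesc and physDesc exactly as described above, and the helper names `tei` and `wrap_auth_desc` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the charter files are TEI documents in the standard TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Return the Clark-notation name ({namespace}tag) of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def wrap_auth_desc(root: ET.Element) -> int:
    """Move every authDesc that is a direct child of msDesc into a physDesc.

    Reuses an existing physDesc if there is one, otherwise creates one at the
    position of the first stray authDesc. Returns the number of elements moved.
    """
    moved = 0
    # Materialise the list first so the tree is not modified mid-iteration.
    for ms_desc in list(root.iter(tei("msDesc"))):
        stray = [child for child in ms_desc if child.tag == tei("authDesc")]
        if not stray:
            continue
        phys_desc = ms_desc.find(tei("physDesc"))
        if phys_desc is None:
            phys_desc = ET.Element(tei("physDesc"))
            ms_desc.insert(list(ms_desc).index(stray[0]), phys_desc)
        for auth_desc in stray:
            ms_desc.remove(auth_desc)
            phys_desc.append(auth_desc)
            moved += 1
    return moved
```

Running it a second time moves nothing, so it is safe to re-apply. Note that ElementTree's default parser discards comments and processing instructions, so for real files a tool that preserves them on round-trip would be the safer choice.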

Slange-Mhath commented 1 year ago

I think the problem here is that the regular expression I used to build the preview takes everything between the h3 heading "Contents" and the h3 heading "Physical Description". So if the D. D. Radcliffe records do not have this <h3>Physical Description</h3> part, that might be what is causing it. I'll double-check.
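To illustrate the suspected failure mode, here is a minimal Python sketch of such a preview regex. The project's actual pattern is not shown in this thread, so the pattern itself, the `build_preview` name and the fall-back to end-of-string are assumptions: when no "Physical Description" heading follows, the capture runs on to the end of the record and pulls in the seal markup.

```python
import re

# Hypothetical preview pattern: capture everything after the "Contents" h3
# up to the "Physical Description" h3. The optional <?ni?> processing
# instructions mirror the markup quoted in this thread. If no
# "Physical Description" heading exists, the lookahead falls back to the
# end of the string (\Z), so the capture includes everything that follows.
PREVIEW_RE = re.compile(
    r"<h3>(?:<\?ni\?>)?Contents(?:<\?ni\?>)?</h3>"
    r"(.*?)"
    r"(?=<h3>(?:<\?ni\?>)?Physical Description|\Z)",
    re.DOTALL,
)


def build_preview(html: str) -> str:
    """Return the HTML between the Contents and Physical Description headings."""
    match = PREVIEW_RE.search(html)
    return match.group(1).strip() if match else ""


with_heading = (
    '<h3>Contents</h3><div class="msItem">Place, date</div>'
    '<h3><?ni?>Physical Description<?ni?></h3>'
    '<div class="physDesc"><div class="seal">Seal</div></div>'
)
without_heading = (
    '<h3>Contents</h3><div class="msItem">Place, date</div>'
    '<div class="seal"><span class="seal"><b>Seal:</b>Armorial</span></div>'
)

print(build_preview(with_heading))     # only the msItem div
print(build_preview(without_heading))  # msItem div plus the seal div
```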

Slange-Mhath commented 1 year ago

Just to confirm: the seal data is structured a little differently here. The record where the issue occurs looks like this:

<div class=\"seal\"><span class=\"seal\"><b>Seal:</b>Ass's head on helmet above shield with chevron.</span></div>

while the one that works as expected is structured like this:

<h3><?ni?>Physical Description<?ni?></h3> <div class=\"physDesc\"> <div class=\"seal\"><span class=\"seal\"><b>Seal:</b>(detached, kept separately) Armorial (Bellingham) (MS. Ch. Norfolk 771*) </span></div> </div>

As expected, the heading that marks the start of the physical description is missing, as is the enclosing div that is responsible for any styling of the physDesc (and may also matter for accessibility). I could try adding another regex (again, this whole preview thing is a bit of a hacky solution) to exclude these duplicates, but that would be quite tricky if the data is not consistent. From my point of view, the better solution would be to give the data a consistent structure, which would also help other use cases (show page, screen readers, etc.).
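The extra-regex workaround mentioned here could look like the following minimal sketch; the pattern and the `strip_seal_divs` name are hypothetical. It shows why this approach depends on consistent markup: the pattern handles a flat seal div but leaves debris behind as soon as the seal div contains a nested div.

```python
import re

# Hypothetical clean-up pass applied to an already-built preview: remove a
# <div class="seal">...</div> block. The lazy ".*?</div>" stops at the FIRST
# closing </div>, so this is only correct when the seal div has no nested divs.
SEAL_DIV_RE = re.compile(r'<div class="seal">.*?</div>', re.DOTALL)


def strip_seal_divs(preview: str) -> str:
    """Remove flat seal divs from a preview string."""
    return SEAL_DIV_RE.sub("", preview)


flat = (
    '<div class="msItem">Place, date</div>'
    '<div class="seal"><span class="seal"><b>Seal:</b>Armorial</span></div>'
)
nested = '<div class="seal"><div class="note">detached</div>kept separately</div>'

print(strip_seal_divs(flat))    # <div class="msItem">Place, date</div>
print(strip_seal_divs(nested))  # kept separately</div>  (debris left behind)
```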

holfordm commented 1 year ago

Should be fixed in the data now.