britishlibrary / Incunabula-Catalogue-Entry-Detection

Ongoing BL development of code produced in 2022/23 by Isaac Dunford as part of a Digital Humanities Internship funded by the School of Humanities at the University of Southampton.

First line of entries is not always capitalised #5

Open harrylloyd-bl opened 8 months ago

harrylloyd-bl commented 8 months ago

Current behaviour: Some entries start with the final lines of the previous entry e.g. "Bought in April, 1866."

Expected behaviour: Each catalogue entry begins with its title, and the first part of the title is always capitalised, so every detected entry should start with a capitalised section. This should be enforced by applying a regex that specifically matches capital letters at the start of the entry.
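
For illustration only, a minimal sketch of such a check. It assumes "capitalised" means the title opens with a run of at least two upper-case ASCII letters (e.g. "AUGUSTINUS"); the names `TITLE_START`, `starts_with_capitalised_title` and `misplaced_leading_lines` are hypothetical and are not taken from the repository.

```python
import re

# Assumption: a title opens with two or more upper-case ASCII letters.
# Accented capitals and ligatures such as "Æ" are not handled here.
TITLE_START = re.compile(r"^\s*[A-Z]{2,}")


def starts_with_capitalised_title(line: str) -> bool:
    """Return True if the line opens with a capitalised (upper-case) section."""
    return TITLE_START.match(line) is not None


def misplaced_leading_lines(entry_lines: list[str]) -> list[str]:
    """Return the lines that precede the first capitalised line of an entry.

    A non-empty result suggests the entry starts with the tail of the
    previous entry. If no line is capitalised, every line is returned.
    """
    for i, line in enumerate(entry_lines):
        if starts_with_capitalised_title(line):
            return entry_lines[:i]
    return list(entry_lines)


# Hypothetical entry text, used only to exercise the functions.
entry = ["Bought in April, 1866.", "AUGUSTINUS. De civitate Dei.", "Folio."]
leading = misplaced_leading_lines(entry)
```

Here `leading` holds the single "Bought in April, 1866." line, which is the kind of stray opening line this issue describes.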

harrylloyd-bl commented 5 months ago

Partially fixed by improving shelfmark detection in dc8831665df7e03a6b6bba33cafcb95761cedbd7 and 7973758d47fcbe8a55de8b66ef5743ad0789f4df. Also added logic to recognise "Bought in" lines specifically and reorder lines accordingly, in 8d9e856f538daaf83f1474516f334a921f5f8457.
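
As a rough sketch of what reordering "Bought in" lines could look like (this is not the code from the commits above): it assumes entries are held as lists of lines and that a leading "Bought in" line belongs at the end of the previous entry. `BOUGHT_IN` and `reattach_bought_in` are hypothetical names.

```python
import re

# Assumption: an acquisition note starts with the literal words "Bought in".
BOUGHT_IN = re.compile(r"^\s*Bought in\b")


def reattach_bought_in(entries: list[list[str]]) -> list[list[str]]:
    """Move leading "Bought in" lines to the end of the previous entry.

    The input is not modified. The first entry has no predecessor,
    so any "Bought in" line at its start is left where it is.
    """
    fixed = [list(entry) for entry in entries]
    for prev, cur in zip(fixed, fixed[1:]):
        while cur and BOUGHT_IN.match(cur[0]):
            prev.append(cur.pop(0))
    return fixed


# Hypothetical entries, used only to exercise the function.
entries = [
    ["AUGUSTINUS. De civitate Dei."],
    ["Bought in April, 1866.", "BIBLIA LATINA."],
]
fixed = reattach_bought_in(entries)
```

After the call, the "Bought in April, 1866." line sits at the end of the first entry and the second entry starts with its capitalised title. Entries whose stray opening lines are not "Bought in" notes are untouched by this sketch, which is consistent with the fix being only partial.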