FreeUKGen / FreeCENMigration

Issue tracking for the project migrating FreeCEN to the FreeCEN2 genealogy record database and search engine architecture. Code developed here is based on that developed in MyopicVicar.
https://www.freecen.org.uk
Apache License 2.0

Hyphens in Scottish POBs #1805

Open geoffj-FUG opened 1 month ago

geoffj-FUG commented 1 month ago

I notice that a lot of Scottish POBs use hyphens (which is correct) but have ' added. Can we please run a cleanup through the whole VLD collection to change the following entries to a straight hyphen: '-' '- -' (-). The fields affected are verbatim_birth_place and birth_place.

Almost 20% of the incorrect POBs in ABD are caused by this problem. No doubt it also exists in the other counties.

Geoff

AnneV-Learn commented 3 weeks ago

@geoffj-FUG The VLD POB validation Rake task currently updates VLD entry records (and linked Search records etc) with missing or UNK Birth place to a Hyphen, so I could update that code so that it handles '-', '-, -', (-) (and updates the Verbatim in those cases) too. We could then just re-run the VLD validation over Scottish counties that we have already done a First Pass run for and the others will happen automatically in the nightly runs. Are you happy with that approach?
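To make the proposed rule concrete, here is a minimal sketch in Python rather than in the project's own Rake code. It assumes records are plain dicts keyed by the two field names from this issue (`birth_place`, `verbatim_birth_place`), and that the offending values are stored literally, including the quote and bracket characters. Both spellings mentioned in the thread ('- -' and '-, -') are included because it is not certain which one is actually stored. `normalise_pob` is a hypothetical name and not a function from the codebase.

```python
# Hypothetical sketch of the POB clean-up rule, not the project's Rake task.
# ASSUMPTIONS: records are dicts keyed by birth_place / verbatim_birth_place,
# and the offending values are stored literally (quotes and brackets included).

HYPHEN = "-"

# Wrapped-hyphen variants named in this issue ('- -' and '-, -' both listed).
QUOTED_HYPHEN_VARIANTS = {"'-'", "'- -'", "'-, -'", "(-)"}

# Existing behaviour described in the thread: missing or UNK birth place -> hyphen.
MISSING_OR_UNKNOWN = {"", "UNK"}


def is_quoted_hyphen(value):
    """True if value is one of the wrapped-hyphen variants (outer spaces ignored)."""
    return value is not None and value.strip() in QUOTED_HYPHEN_VARIANTS


def normalise_pob(record):
    """Return (updated copy of record, True if any POB field changed)."""
    updated = dict(record)
    birth_place = updated.get("birth_place")
    if (birth_place is None
            or birth_place.strip() in MISSING_OR_UNKNOWN
            or is_quoted_hyphen(birth_place)):
        updated["birth_place"] = HYPHEN
    # The verbatim field is only rewritten for the wrapped-hyphen variants,
    # not for missing/UNK values.
    if is_quoted_hyphen(updated.get("verbatim_birth_place")):
        updated["verbatim_birth_place"] = HYPHEN
    return updated, updated != record


print(normalise_pob({"birth_place": "'-'", "verbatim_birth_place": "'-'"}))
# ({'birth_place': '-', 'verbatim_birth_place': '-'}, True)
```

The "changed" flag reflects the idea that only records which actually change would need their linked Search records updated. Re-running the rule on an already-clean record reports no change, so repeated nightly runs do no extra work.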

geoffj-FUG commented 3 weeks ago

Anne

Yes, that sounds good. It is simple and will resolve the problem at the moment.

I wonder whether it will crop up again in other Scottish fields, but it should not affect searching as we develop it.

Geoff

From: Anne Vandervord  Sent: Thursday, 5 September 2024 3:47 AM  To: FreeUKGen/FreeCENMigration  Cc: Geoff J  Subject: Re: [FreeUKGen/FreeCENMigration] Hyphens in Scottish POBs (Issue #1805)



AnneV-Learn commented 1 week ago

@geoffj-FUG ready for testing in Test3

AnneV-Learn commented 1 week ago

@Geoffj-FUG FYI, the same code is used for VLD POB Pre-validation and for the Rake task that runs overnight across the assigned county. I suggest you test the changes by downloading, from the Production environment, one or more VLD files that you know contain the 'offending' values, and then running Pre-validation on those files. You can check the records with the 'List VLD Entries' option in POB Validation, and also run Searches to check that the VLD record updates have been carried through to the Search records correctly.
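The two checks suggested above can be sketched as a small Python script. This is hypothetical, because the real checks are done through 'List VLD Entries' and Search in the UI. The sketch assumes entries and Search records have been exported as dicts carrying an `entry` number and the POB field names from this issue, and that the offending values are stored literally. `leftover_offenders` and `out_of_sync` are illustrative names only.

```python
# Hypothetical test helpers. ASSUMPTION: VLD entries and Search records are
# exported as dicts with an "entry" key plus the POB field names from the issue.

QUOTED_HYPHEN_VARIANTS = {"'-'", "'- -'", "'-, -'", "(-)"}
POB_FIELDS = ("birth_place", "verbatim_birth_place")


def leftover_offenders(entries):
    """Return (entry, field, value) for every POB field still holding a variant."""
    found = []
    for entry in entries:
        for field in POB_FIELDS:
            value = entry.get(field)
            if value is not None and value.strip() in QUOTED_HYPHEN_VARIANTS:
                found.append((entry.get("entry"), field, value))
    return found


def out_of_sync(vld_entries, search_records):
    """Entry numbers whose Search record birth_place differs from the VLD entry."""
    search_by_entry = {record["entry"]: record for record in search_records}
    return [
        entry["entry"]
        for entry in vld_entries
        if entry["entry"] in search_by_entry
        and search_by_entry[entry["entry"]].get("birth_place") != entry.get("birth_place")
    ]
```

After a successful Pre-validation run, both functions should return empty lists for the downloaded files.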

geoffj-FUG commented 1 week ago

Anne

I have uploaded two Scottish 1851 pieces and pre-validated them. Both report no invalid POBs, despite having hundreds of them in the live database. HS515168 shows these errors (in part):

ABD/HS515168.VLD entry 105: Invalid birth county OVB (ED:1F Schedule:25 Record:0002, ADAM, Helen)
ABD/HS515168.VLD entry 151: Invalid birth county OVB (ED:1F Schedule:37 Record:0004, TROUP, Alexander)
ABD/HS515168.VLD entry 152: Invalid birth county OVB (ED:1F Schedule:38 Record:0001, TROUP, Margaret)
ABD/HS515168.VLD entry 221: Invalid birth county OVB (ED:1F Schedule:58 Record:0001, PRIMROSE, Archibald)
ABD/HS515168.VLD entry 222: Invalid birth county OVB (ED:1F Schedule:58 Record:0002, PRIMROSE, Grace)
ABD/HS515168.VLD entry 571: Invalid birth county OVB (ED:1F Schedule:149 Record:0002, CARSON, Jean)
ABD/HS515168.VLD entry 572: Invalid birth county OVB (ED:1F Schedule:149 Record:0003, CARSON, William)
ABD/HS515168.VLD entry 640: Invalid birth county OVB (ED:1F Schedule:168 Record:0004, PROCTOR, John)
ABD/HS515168.VLD entry 641: Invalid birth county OVB (ED:1F Schedule:168 Record:0005, PROCTOR, William)
ABD/HS515168.VLD entry 642:

So there are invalid POBs in the piece but they are not showing in the count.
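One way to cross-check a reported count against the raw messages is to parse and tally them. The following is a hypothetical Python sketch. The regular expression is written only against the message format quoted in this thread (file, entry number, message, then ED/Schedule/Record and name in brackets). `ERROR_RE` and `summarise` are illustrative names, not project code. Lines that do not match, such as a truncated paste, are counted separately rather than guessed at.

```python
import re
from collections import Counter

# Hypothetical parser for messages shaped like:
# ABD/HS515168.VLD entry 105: Invalid birth county OVB (ED:1F Schedule:25 Record:0002, ADAM, Helen)
ERROR_RE = re.compile(
    r"^(?P<file>\S+\.VLD) entry (?P<entry>\d+): (?P<message>[^(]+?) "
    r"\(ED:(?P<ed>\S+) Schedule:(?P<schedule>\d+) Record:(?P<record>\d+), "
    r"(?P<surname>[^,]+), (?P<forenames>[^)]*)\)$"
)


def summarise(lines):
    """Tally parsed messages by (file, message); count unparsable lines separately."""
    counts = Counter()
    unparsed = 0
    for line in lines:
        match = ERROR_RE.match(line.strip())
        if match:
            counts[(match["file"], match["message"])] += 1
        else:
            unparsed += 1
    return counts, unparsed
```

Comparing these tallies with the invalid-POB count shown on the Manage VLD files page would show whether the messages are being produced but not counted.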

Geoff

AnneV-Learn commented 1 week ago

@geoffj-FUG Hmmm, no idea what happened there. I just downloaded the VLDs, then deleted them both and reloaded them. (Load errors were reported because some of the rows include OVB as the birth county, but these can be ignored, as OVB is also reported as an invalid POB after Pre-validation is run.) I ran Pre-validation and hundreds of POB errors were reported, as you expected; see the Manage VLD files info as it stands now.

geoffj-FUG commented 1 week ago

Anne, I have loaded the pre-validated Scottish file as a CSV OK. It seems to be passing all of the tests.

This can now be deployed.

Geoff