FreeUKGen / FreeCENMigration

Issue tracking for the project migrating FreeCEN to the FreeCEN2 genealogy record database and search engine architecture. Code developed here is based on that developed in MyopicVicar.
https://www.freecen.org.uk
Apache License 2.0

Resolve structure of Search Record Birth fields #1314

Open Captainkirkdawson opened 3 years ago

Captainkirkdawson commented 3 years ago

The Search Record collection, which supports the Search, currently contains 2 fields related to place of birth:

    field :birth_chapman_code, type: String
    field :birth_place, type: String

They are populated from fields in the vld entry or the csv entry. Each entry has 4 fields: 2 verbatim entries made by the transcriber and, sometimes, 2 alternate (suggested) entries made by the validator. The search record has only 2 fields and is filled from either the vld entry or the csv entry.

The logic used to populate the search record from vld files has changed over time and differs from that used for csv files. The logic used for vld files from 2017 until 2021 was simple: it always used the verbatim Chapman code if there was one, and ONLY used the suggested Chapman code if there was NO verbatim birth code. In late 2020 this was changed to match CSVProc, where the Suggested POB fields are used if present. However, the search record collection was not rebuilt, so records created before the change still reflect the old rule.

The basic question is: should we include both the Verbatim and Suggested Birth places in the Search Record and search against either of them, rather than just the inconsistent Birth Place that is currently present?
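A minimal Ruby sketch of the two selection rules described above may make the inconsistency easier to see. The `BirthEntry` struct, its field names and both method names are hypothetical stand-ins for the four birth fields on a vld or csv entry, not the actual FreeCEN2 model. The sketch also assumes that the place always travels with its Chapman code, and that a suggestion counts as "present" when its suggested Chapman code is filled in.

```ruby
# Hypothetical stand-in for the four birth fields on a vld or csv entry
# (not the real FreeCEN2 model).
BirthEntry = Struct.new(:verbatim_chapman_code, :verbatim_place,
                        :suggested_chapman_code, :suggested_place,
                        keyword_init: true)

def present_value?(value)
  !value.nil? && !value.strip.empty?
end

# vld rule from 2017: keep the verbatim pair, and fall back to the
# suggested pair only when there is no verbatim Chapman code.
def birth_fields_verbatim_first(entry)
  if present_value?(entry.verbatim_chapman_code)
    [entry.verbatim_chapman_code, entry.verbatim_place]
  else
    [entry.suggested_chapman_code, entry.suggested_place]
  end
end

# CSVProc rule (also used for vld files since late 2020): prefer the
# suggested pair whenever the validator supplied one (assumed here to
# mean a non-blank suggested Chapman code).
def birth_fields_suggested_first(entry)
  if present_value?(entry.suggested_chapman_code)
    [entry.suggested_chapman_code, entry.suggested_place]
  else
    [entry.verbatim_chapman_code, entry.verbatim_place]
  end
end

entry = BirthEntry.new(verbatim_chapman_code: "LIN", verbatim_place: "Liverpool",
                       suggested_chapman_code: "LAN", suggested_place: "Liverpool")
p birth_fields_verbatim_first(entry)  # ["LIN", "Liverpool"]
p birth_fields_suggested_first(entry) # ["LAN", "Liverpool"]
```

The same entry produces a different search record depending on which rule was in force when it was loaded, which is why a collection that was never rebuilt ends up mixing the two.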

PatReynolds commented 3 years ago

For clarification: are we talking here about 'place of birth', or 'place of birth and county of birth'?

geoffj-FUG commented 3 years ago

Pat

Both. They are always a set, and alternatives can be caused by an incorrect County or an incorrect Place.

PatReynolds commented 3 years ago

Thanks Geoff.
I think the current situation is:
a) County of Birth is transcribed in Chapman Code format (one field), from which a county name is automatically assigned to a second field. There is a third field where a 'suggested county name' is entered, or a 'suggested Chapman code' is entered, from which a suggested county name is automatically assigned to a fourth field.
b) Place of Birth is transcribed as TWYS, and a Gazetteer-verified entry for the place-name in that county is sometimes added for clarification (FC2), or an unverified entry was sometimes added for clarification (FC1).
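The automatic assignment described in a) is essentially a lookup from Chapman code to county name, applied separately to the transcribed code and to any suggested code. A minimal Ruby sketch, assuming a plain hash: the table is only a small excerpt of the Chapman codes, and the constant and method names are hypothetical.

```ruby
# Small excerpt of the Chapman county codes; a real table covers every county.
CHAPMAN_COUNTY_NAMES = {
  "LAN" => "Lancashire",
  "LIN" => "Lincolnshire",
  "MDX" => "Middlesex",
  "YKS" => "Yorkshire"
}.freeze

# Returns the county name for a Chapman code, or nil if the code is
# missing, blank or not in the table.
def county_name_for(chapman_code)
  return nil if chapman_code.nil?

  CHAPMAN_COUNTY_NAMES[chapman_code.strip.upcase]
end

# The transcribed and suggested codes each fill their own derived name field.
transcribed_county = county_name_for("LIN")  # "Lincolnshire"
suggested_county   = county_name_for("lan ") # "Lancashire"
```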

Are the assignment of a 'suggested county' and a 'G-v POB' (Gazetteer-verified POB) linked, and does one have to be done first?

rhodamackenzie commented 3 years ago

I'd say either "suggested PoB" with a roll-over explanation of what that is, or a notes column with the suggested PoB.

Captainkirkdawson commented 3 years ago

@AlOneill @geoffj-FUG @PatReynolds @rhodamackenzie @AnneV-Learn @DeniseColbert @FreecenBren Please note that the description of this story has been updated to explain the issue, and what is being asked, more clearly. It was poorly written and incomplete; my apologies. Please review and answer the following. The basic question is: should we include both the Verbatim and Suggested Birth Places in the Search Record and search against either of them, or just populate the search record with one or the other Birth Place, as at present, but following a clear decision on which to use? In either case, we should update the search record collection so that it is consistent and supports our decision.

geoffj-FUG commented 3 years ago

Kirk

If there is an alternative POB, it means that at the time of validation there was no entry in the Gazetteer for that place, but there was an entry that probably reflects the true place of birth. However, we do not know whether a POB has been added to the Gazetteer since the piece was validated (yes, that can happen).

So, given the above, my preferred answer is both. What additional load would this put on the search? The extra field will only capture places that have been added to the Gazetteer since validation.

My second answer, if including both adds significant server load: if the Alternative POB is populated, search on that; otherwise search on the POB.

Geoff

Captainkirkdawson commented 3 years ago

@geoffj-FUG The vast majority of our records never had the benefit of the Gazetteer. Before implementation, I intend to research the contents of the POB fields in the Individual collection to see their relative usage. In terms of searches, the extra load would be incremental on top of the load created by introducing a search on the POB place field.

AlOneill commented 3 years ago

Using the example that started this discussion, if you believed that your target person was born in Liverpool, you would not search in Lincolnshire — unless you were a great lateral thinker and had an idea that the abbreviations Lincs and Lancs were easily confused! So, searching the Suggested County seems essential.

rhodamackenzie commented 3 years ago

I would keep it simple: when a search is made, both the verbatim and the suggested places are searched, so either can appear in the search results, with the verbatim place shown by default in the results list and "or suggested ..." underneath it (as per @AlOneill's suggestion in another thread).
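A minimal Ruby sketch of that approach, assuming the Search Record gains two extra fields for the suggested pair. The field names `suggested_birth_chapman_code` and `suggested_birth_place`, and both method names, are hypothetical. In MongoDB the match itself would normally be an `$or` query over the two pairs, with each pair indexed; this in-memory version only illustrates the intended matching and display behaviour.

```ruby
# Hypothetical Search Record shape holding both versions of the birth place.
SearchRecord = Struct.new(:birth_chapman_code, :birth_place,
                          :suggested_birth_chapman_code, :suggested_birth_place,
                          keyword_init: true)

# Case-insensitive comparison that treats a missing value as a non-match.
def same_value?(a, b)
  !a.nil? && !b.nil? && a.strip.casecmp?(b.strip)
end

# A record matches if either the verbatim or the suggested pair matches.
def birth_place_matches?(record, chapman_code, place)
  (same_value?(record.birth_chapman_code, chapman_code) &&
    same_value?(record.birth_place, place)) ||
    (same_value?(record.suggested_birth_chapman_code, chapman_code) &&
      same_value?(record.suggested_birth_place, place))
end

# The results list shows the verbatim place by default, with
# "or suggested ..." underneath when there is a suggestion.
def birth_place_display(record)
  line = "#{record.birth_place} (#{record.birth_chapman_code})"
  return line if record.suggested_birth_place.nil? && record.suggested_birth_chapman_code.nil?

  "#{line}\nor suggested #{record.suggested_birth_place} (#{record.suggested_birth_chapman_code})"
end

record = SearchRecord.new(birth_chapman_code: "LIN", birth_place: "Liverpool",
                          suggested_birth_chapman_code: "LAN", suggested_birth_place: "Liverpool")
birth_place_matches?(record, "LAN", "Liverpool") # true, via the suggested pair
birth_place_matches?(record, "LIN", "Liverpool") # true, via the verbatim pair
puts birth_place_display(record)
```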

rhodamackenzie commented 3 years ago

Also, very few users will know why a suggested PoB might show up, as they have no idea that transcribers/validators exist or what they do. I'm not sure whether that needs to be explained anywhere.

Captainkirkdawson commented 3 years ago

@PatReynolds has confirmed that she also supports indexing and searching on both versions of the POB. She has also suggested that the Gazetteer variations of a place name should also be searched.

geoffj-FUG commented 2 years ago

Both fields should be searched. An entry may have been added to the Gazetteer since the piece was validated. If neither entry is in the Gazetteer, then the County should be searched (consider OVF to be a County).

Our system looks for the POB in the Gazetteer, whether it is a primary entry or a secondary entry, so the variations are identified. The latitude and longitude are then collected, and entries for places around that latitude and longitude are identified in the search results. There may be more than two sets of lat/long to look for: some place names recur within a County, and each of them needs to be searched, because which of those places is meant is not necessarily identified.

Geoff
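The procedure Geoff describes can be sketched roughly in Ruby as follows. This is not the existing FreeCEN2 search code: the `GazetteerPlace` struct, the method names and the 5 km default radius are hypothetical, and the distance check uses the standard haversine great-circle formula. Both the verbatim and the suggested pairs are looked up against primary and alternate (secondary) names. Every match contributes a lat/long centre, so a name that recurs within a county yields several centres. Only when nothing matches does the plan fall back to searching the counties, with OVF treated like any other code.

```ruby
# Hypothetical gazetteer entry: a primary name plus its variant (secondary) names.
GazetteerPlace = Struct.new(:chapman_code, :name, :alternates, :latitude, :longitude,
                            keyword_init: true)

EARTH_RADIUS_KM = 6371.0

# Great-circle distance in kilometres between two lat/long points (haversine).
def distance_km(lat1, lon1, lat2, lon2)
  rad = Math::PI / 180
  dlat = (lat2 - lat1) * rad
  dlon = (lon2 - lon1) * rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dlon / 2)**2
  # Clamp guards against rounding pushing a fraction above 1.0.
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt([a, 1.0].min))
end

# All gazetteer places in the county whose primary or alternate name matches.
def gazetteer_matches(gazetteer, chapman_code, place_name)
  return [] if chapman_code.nil? || place_name.nil?

  gazetteer.select do |place|
    place.chapman_code == chapman_code &&
      ([place.name] + place.alternates).any? { |n| n.casecmp?(place_name.strip) }
  end
end

# Build a plan from the verbatim and suggested [chapman_code, place] pairs:
# a radius around every matched place, or the whole county if nothing matched.
def search_plan(gazetteer, pairs, radius_km: 5.0)
  matches = pairs.flat_map { |code, place| gazetteer_matches(gazetteer, code, place) }.uniq
  if matches.empty?
    { counties: pairs.map(&:first).compact.uniq }
  else
    { centres: matches.map { |p| [p.latitude, p.longitude] }, radius_km: radius_km }
  end
end

# True if a candidate place lies within the radius of any centre in the plan.
def within_plan?(plan, lat, lon)
  plan.fetch(:centres, []).any? do |clat, clon|
    distance_km(lat, lon, clat, clon) <= plan[:radius_km]
  end
end
```

For a county-only plan, `within_plan?` returns false because there are no centres, and the county-level search would apply instead.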

PatReynoldsFUG commented 2 years ago

Structure now resolved, so moving to backlog