FreeUKGen / FreeCENMigration

Issue tracking for project migrating FreeCEN to FreeCEN2 genealogy record database and search engine architecture. Code developed here is based on that developed in MyopicVicar
https://www.freecen.org.uk
Apache License 2.0

192611424 Duplicate Alternate Place Names (Geoff) #1136

Closed ghost closed 2 months ago

ghost commented 3 years ago

Issue reported by somt.cen at 2021-02-07 07:10:24 UTC
Time: 2021-02-07T07:07:54+00:00
Session ID: 56a5cb1e7850b8b21f5e180ab907638f
Problem Page URL: /freecen2_places/search_names_results?locale=en
Previous Page URL: https://www.freecen.org.uk/freecen2_places/search_names?locale=en
Reported Issue:

Kirk

I think that when I tried to add Antony to Cornwall (there are three) I was blocked because it already existed. However, I have just added Anstey to both East and West Anstey in Devon. It was done in the Other Place Name field. Is there no test on this field? Or has my memory on the Antony addition failed me?

Geoff

Captainkirkdawson commented 3 years ago

There is a test for name uniqueness on the main field but not on the additional names. Should there be?

Captainkirkdawson commented 3 years ago

Kirk

It seems strange to me that there is not. The ‘Other Place Name’ is supposed to be a pseudonym. It means that a Place Name can be entered more than once in the Database.

Where a place genuinely appears more than once then the bracket solution should be used to differentiate between them. I did this when linking Cornwall to sort out a few places and make sure that I could identify the correct link.

I know I complained that FreeREG had put locations in brackets e.g. SOM Wraxall (Nailsea) but it made linking so much easier. FreeCEN still accepts Wraxall at validation but when looking at the results of a search the result is so much clearer. We now have a solution that works well both ways.

Geoff

Captainkirkdawson commented 3 years ago

Do I interpret this as something that should be done, i.e. the alternates need to be unique?

(It likely never crossed my mind because it was such a rarely used feature within FreeREG.)

Captainkirkdawson commented 3 years ago

Kirk

I really need to think about the consequences for the system, because Counties do have repeated names in them. So we can't enforce data integrity. That creates complications for Validation.

Geoff

Captainkirkdawson commented 3 years ago

But I thought you said: "Where a place genuinely appears more than once then the bracket solution should be used to differentiate between them."

So the uniqueness is on the full text "Wraxall (Nailsea)", not the parts.

With that we can have data integrity i.e. only 1 Wraxall (Nailsea)

DeniseColbert commented 3 years ago

Any further comments, @geoffj-FUG?

AnneV-Learn commented 2 years ago

@geoffj-FUG can I just double-check the requirements here. So an alternate name should not be the same as any place name in the same county or other alternate name in the same county - is that correct?
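
The requirement as Anne states it can be sketched in a few lines. This is only an illustration, not the project's code: the `Place` class, its field names, and the case-insensitive comparison are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Place:
    """Hypothetical stand-in for a Gazetteer place record."""
    county: str                      # county code such as "DEV" or "SOM"
    name: str                        # primary place name
    alternates: list[str] = field(default_factory=list)


def names_in_county(places, county):
    """Collect every primary and alternate name already used in one county."""
    taken = set()
    for place in places:
        if place.county == county:
            taken.add(place.name.casefold())
            taken.update(alt.casefold() for alt in place.alternates)
    return taken


def alternate_is_allowed(places, county, candidate):
    """An alternate passes only if no primary or alternate in the county matches it."""
    # Assumption: comparison ignores case and surrounding whitespace.
    return candidate.strip().casefold() not in names_in_county(places, county)


gazetteer = [
    Place("DEV", "East Anstey"),
    Place("DEV", "West Anstey", alternates=["Anstey"]),
]

# "Anstey" is already an alternate in DEV, so adding it again is refused.
print(alternate_is_allowed(gazetteer, "DEV", "Anstey"))
# The same string in a different county is not affected.
print(alternate_is_allowed(gazetteer, "SOM", "Anstey"))
```

Scoping the set of taken names to one county is what lets the same name exist in different counties while still blocking a repeat within a county.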

geoffj-FUG commented 2 years ago

Anne

Yes that is correct.

We need to restrict this to new additions only. If we try to enforce data integrity on existing data then we will have problems.

We can differentiate between places in the same county by adding their nearest significant town in brackets. You will find lots of these instances in the Gazetteer. So, the test should simply be one of matching the string of the place name within the county. If the string is unique it passes, otherwise it does not.

For example, Wraxall (Nailsea) in SOM is differentiated from Wraxall (Shepton Mallet) in this way. The strings do not match even though there are two Wraxalls.

FYI - When the piece is validated it only matches Wraxall, the brackets are ignored. The lack of brackets will need to be addressed when and if POB searching is developed. There will be multiple coordinates in some cases.

Geoff
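
A small sketch of the two comparisons Geoff describes: the Gazetteer duplicate test on the full string, and Validation matching with the bracketed town ignored. The function names and the regular expression are assumptions for illustration; the thread does not show how FreeCEN actually strips the brackets.

```python
import re

# Assumption: the qualifier is a single trailing "(...)" group.
_TRAILING_BRACKETS = re.compile(r"\s*\([^)]*\)\s*$")


def base_name(place_name):
    """Return the name with any trailing bracketed qualifier removed."""
    return _TRAILING_BRACKETS.sub("", place_name).strip()


def is_duplicate(existing_names, candidate):
    """Gazetteer-style test: compare the whole string, brackets included."""
    return candidate in existing_names


som_places = ["Wraxall (Nailsea)", "Wraxall (Shepton Mallet)"]

# As full strings the two entries differ, so both can exist in SOM.
print(is_duplicate(["Wraxall (Nailsea)"], "Wraxall (Shepton Mallet)"))

# With brackets ignored, both collapse to the same base name.
print([base_name(name) for name in som_places])
```

This also shows why the lack of brackets matters for any later place-of-birth search: one base name can map to more than one Gazetteer entry, each with its own coordinates.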

AnneV-Learn commented 2 years ago

@geoffj-FUG Updated code that should solve this has been deployed to Test3 - are you able to test it?

geoffj-FUG commented 2 years ago

Anne

I have just added Backwell Hill as an alternative to Backwell in SOM. (Correct because it did not exist)

Then I created a new Place Name of Backwell Hill as the primary Place Name. It was accepted, even though it already exists as an alternative place name. (Incorrect)

I then created a new place name of Backwell Hill as a Primary Place Name and it was rejected. (Correct)

Geoff

AnneV-Learn commented 2 years ago

Was this on Test3 Geoff? A

AnneV-Learn commented 2 years ago

G, I actually didn’t change the code relating to checking the Primary place name when created, as I hadn’t realised there was a problem with that! I was just making amendments to the alternative name processing when adding a new alternative or creating a new place with alternatives. I’ll look at correcting the issue you found too. Were you able to test the way it now validates alternative names (on Test3)? I’ll let you know when you can retest; it will be in a few days. Thanks for the testing you have done so far, Geoff. A

geoffj-FUG commented 2 years ago

Anne

I will look at Validation. I was just checking the Gazetteer.

Just sanity checking – we are only using strings to test duplicate entries. It occurred to me that if you are using Coordinates then that would cause my results.

That raises the thought that perhaps we should be using County + String + Coordinates to test for duplicates. That would remove the issue where 2 places are named the same in the same County. There are a lot of them. We have to keep County in the test as some parishes crossed County lines and would therefore have the same Coordinates.

Just a thought. (You will find that I will bounce my thoughts off of you. Kirk got used to it.)

Geoff

AnneV-Learn commented 2 years ago

Hi Geoff,

Sorry, I used misleading terminology there. I was only referring to the Gazetteer. I meant the ‘validation’ that the system does when an alternative name is added to an existing Gazetteer place, or when a new Gazetteer place is created with alternative names specified at creation (checking for duplicates). Those are the areas of code that I had attempted to update. Currently it uses County and String; coordinates are not included in the matching. My initial thoughts are that that should be sufficient. If coordinates were used it could maybe get over-picky if they were slightly inaccurate, but I am still a novice really as to the thoughts that have gone into the design of this system.

Anne
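
To make the trade-off concrete, here is a rough sketch comparing a County + String key with a County + String + Coordinates key. The function names are hypothetical and the coordinates are made-up illustrative numbers, not values from the Gazetteer.

```python
def key_by_name(county, name):
    """Duplicate key using county and place string only."""
    return (county, name.casefold())


def key_with_coords(county, name, latitude, longitude):
    """Duplicate key that also includes the coordinates."""
    return (county, name.casefold(), latitude, longitude)


# Illustrative values only: the same place entered twice, the second time
# with a latitude that differs in the fourth decimal place.
first = ("SOM", "Backwell Hill", 51.4100, -2.7400)
second = ("SOM", "Backwell Hill", 51.4101, -2.7400)

# County + String treats the two entries as the same place.
print(key_by_name(first[0], first[1]) == key_by_name(second[0], second[1]))

# Adding coordinates makes them look different, so the duplicate slips through.
print(key_with_coords(*first) == key_with_coords(*second))
```

With coordinates in the key, a small data-entry difference is enough to defeat the duplicate test, which is the "picky" behaviour Anne is wary of.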

geoffj-FUG commented 2 years ago

Anne

I am happy with the County / Place string. No problem.

I understand. Unfortunately FreeCEN uses Validation as a particular process. All good.

Geoff

AnneV-Learn commented 2 years ago

Hi Geoff,

Program code has been updated which I believe will resolve this issue. The updated code has been deployed to Test3. So could you please re-test when you get time (I know you have loads on at the moment)? And complete any further testing of the duplicate alternative names issue (#1136).

Bye for now, Anne

DeniseColbert commented 2 years ago

@geoffj-FUG can you test?

PatReynolds commented 2 years ago

@geoffj-FUG - have you been able to test?

geoffj-FUG commented 2 years ago

Anne

Not quite there yet

If I add an alternative Place name that already exists, the addition fails. (Correct)

I added Backwell Churchtown as an alternative in SOM. OK as it did not exist.

Then I created it as a new place name. It was created OK even though the alternative existed. It should have failed.

Geoff

AnneV-Learn commented 2 years ago

Ok thanks @geoffj-FUG I’ll take another look when I’m back next week

AnneV-Learn commented 2 years ago

Ok @geoffj-FUG - I found that the most recent code changes had not been fully deployed to Test3. That has been resolved now, so please re-test. FYI, I just did a similar test on Test3 and got the correct result (screenshot: 6DF1A75D-1323-4CB5-A5AC-22869AD9ABF3.png).

AnneV-Learn commented 2 years ago

@geoffj-FUG - have you had a chance to re-test this on Test3?

geoffj-FUG commented 2 years ago

Anne

Looks good and all works well now. This can be migrated to production.

Geoff

geoffj-FUG commented 2 years ago

Anne

We still have an issue. It appears that when we look for duplicates it is not testing the whole string and nothing but the string.

In Norfolk there are two parishes called 'Forncett St Mary' and 'Forncett St Peter'. I tried to add two alternative place names, 'Forncett End' and 'Forncett', against Forncett St Mary. The entry of just 'Forncett' was rejected even though it does not exist as a distinct string (only as a sub-string). It appears that 'Forncett End' would have been accepted if I had continued. The Gazetteer should have allowed just 'Forncett' to be entered, as that distinct string does not exist.

I had suspected that there was a problem when I had issues with entries with brackets after them. However, up to now I had not put my finger on it.

Geoff
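
The distinction Geoff is drawing, whole-string matching versus sub-string matching, can be sketched as follows. This only illustrates the two behaviours; it is not the deployed check, and the function names are hypothetical.

```python
def clashes_as_substring(existing_names, candidate):
    """Over-eager test: rejects a candidate found inside any existing name."""
    wanted = candidate.casefold()
    return any(wanted in name.casefold() for name in existing_names)


def clashes_as_whole_string(existing_names, candidate):
    """Intended test: rejects a candidate only on an exact match."""
    wanted = candidate.casefold()
    return any(wanted == name.casefold() for name in existing_names)


norfolk = ["Forncett St Mary", "Forncett St Peter"]

# A sub-string test would reject 'Forncett' because it sits inside both names.
print(clashes_as_substring(norfolk, "Forncett"))

# A whole-string test accepts it, since no entry is exactly 'Forncett'.
print(clashes_as_whole_string(norfolk, "Forncett"))

# 'Forncett End' passes either way, matching what Geoff observed.
print(clashes_as_substring(norfolk, "Forncett End"))
```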

PatReynolds commented 2 years ago

@AnneV-Learn could you take a look at this? Moving back into in progress. Many thanks, Pat

AnneV-Learn commented 2 years ago

@geoffj-FUG the reason that Forncett was rejected is because there is a disabled place named Forncett. Not sure how you want to proceed - I could remove the disabled place (behind the scenes in the Mongo database). But wondering why was it created and then disabled in the first place? It appears to have been disabled on 01-03-2022.
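
A minimal sketch of how a disabled record can still block a name, assuming the duplicate check looks at every place in the county regardless of status. The `Place` class, the `disabled` flag, and the `include_disabled` switch are hypothetical names for illustration; "NFK" is used as the county code for Norfolk.

```python
from dataclasses import dataclass


@dataclass
class Place:
    """Hypothetical stand-in for a Gazetteer place record."""
    county: str
    name: str
    disabled: bool = False   # assumed flag marking a place as disabled


def name_taken(places, county, candidate, include_disabled=True):
    """Exact-match test within a county, optionally skipping disabled places."""
    wanted = candidate.strip().casefold()
    for place in places:
        if place.county != county:
            continue
        if place.disabled and not include_disabled:
            continue
        if place.name.casefold() == wanted:
            return True
    return False


places = [
    Place("NFK", "Forncett St Mary"),
    Place("NFK", "Forncett St Peter"),
    Place("NFK", "Forncett", disabled=True),
]

# Counting disabled places, 'Forncett' is an exact match and is rejected.
print(name_taken(places, "NFK", "Forncett"))

# Skipping disabled places, the same name would be accepted.
print(name_taken(places, "NFK", "Forncett", include_disabled=False))
```

Whether disabled places should count is a design decision for the project; the sketch only shows that the rejection is consistent with whole-string matching once the disabled record is taken into account.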

geoffj-FUG commented 2 years ago

It was disabled because I was moving it to the correct place as an alternative name. (Housekeeping). Geoff

geoffj-FUG commented 2 months ago

This now appears to be working correctly.

Geoff