FreeUKGen / FreeCENMigration

Issue tracking for project migrating FreeCEN to FreeCEN2 genealogy record database and search engine architecture. Code developed here is based on that developed in MyopicVicar
https://www.freecen.org.uk
Apache License 2.0

Run rake task to check for VLD deletions #1269

Open PatReynolds opened 2 years ago

PatReynolds commented 2 years ago

There was a story, #1260, that addressed the issue of vld file deletions. It identified that coordinators were equating a freecen1 piece parm with a vld file. This was resolved. However, it is possible that some freecen1 parms were deleted but their associated vld file, its entries and its search records were left behind on the system as orphans. The first step is to determine whether there are orphan vld files in production. The rake task should walk through the vld files in the freecen1_vld_files collection and ensure that each file's associated links to freecen_pieces and freecen2_pieces actually exist. This is likely best done by chapman_code. Provide a list of those where the link is not actually there.
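As a rough illustration of the check described above, here is a minimal Ruby sketch. It is not the project's rake task: the `VldFile` struct, its field names (`file_name`, `freecen_piece_id`, `freecen2_piece_id`), the file names and the in-memory id lists are all hypothetical stand-ins for the freecen1_vld_files, freecen_pieces and freecen2_pieces collections.

```ruby
# Hypothetical stand-in for a document in the freecen1_vld_files collection.
# The field names are assumptions made for this sketch only.
VldFile = Struct.new(:file_name, :chapman_code, :freecen_piece_id, :freecen2_piece_id)

# Returns a hash of chapman_code => names of vld files whose linked
# freecen_piece or freecen2_piece cannot be found.
def orphan_vld_files(vld_files, freecen_piece_ids, freecen2_piece_ids)
  orphans = Hash.new { |hash, key| hash[key] = [] }
  vld_files.each do |vld|
    piece_ok  = freecen_piece_ids.include?(vld.freecen_piece_id)
    piece2_ok = freecen2_piece_ids.include?(vld.freecen2_piece_id)
    # A file counts as an orphan when either link is missing.
    orphans[vld.chapman_code] << vld.file_name unless piece_ok && piece2_ok
  end
  orphans
end

# Made-up sample data: the freecen_piece with id 2 no longer exists.
vld_files = [
  VldFile.new('EXAMPLE1.VLD', 'SOM', 1, 101),
  VldFile.new('EXAMPLE2.VLD', 'SOM', 2, 102),
  VldFile.new('EXAMPLE3.VLD', 'DEV', 3, 103)
]
report = orphan_vld_files(vld_files, [1, 3], [101, 102, 103])
report.each { |county, names| puts "#{county}: #{names.join(', ')}" }
```

Grouping the output by chapman_code follows the suggestion above that the check is likely best done county by county; an empty report would mean no orphans were found.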

geoffj-FUG commented 2 years ago

What is this story about please? Geoff

AnneV-Learn commented 2 years ago

I’m not sure @geoffj-FUG - I’ll have to discuss/check at the next scrum meeting

PatReynolds commented 2 years ago

It seems to be to pick up any issues arising from when the Delete did not function as intended (story #1260, which is complete).

Captainkirkdawson commented 2 years ago

description written

AnneV-Learn commented 2 years ago

Thanks @Captainkirkdawson

AnneV-Learn commented 2 years ago

Rake report has been written and tested on Test3 (but there were no VLD deletions there it would seem). Have asked @Vino-S to deploy to Production, run and send output to me to review.

geoffj-FUG commented 2 years ago

Anne

A live one? Row 200 and a couple more in HO107_1924 SOM.

I can't upload it to test3 as I don’t have the original any more. It was on an old PC (Vista!). But you can probably access the vld file from the FC1 upload area.

I have downloaded the piece from FC2 and am working on a correction for it so it will eventually reappear on FC2 as a csv file. But my download is converted to a new format so it is not any use as an upload. But I do have warnings on rows 200, 282, 292 and 505. I have not checked for others.

Started on the file HO107_1924.csv for somt.cen at 2022-03-31 07:38:10 +0100.
Working on Stowey for 1851, in SOM.
Warning: line 200 Notes contains information Deleted flag set on VLD; POB not locatedPOB not located.
Warning: line 203 New Folio number increment larger than 1 14.
Warning: line 282 Notes contains information Deleted flag set on VLD;.
Warning: line 292 Notes contains information Deleted flag set on VLD; POB - Madehurst SSX or Midhurst SSX?POB - Madehurst SSX or Midhurst SSX?.
Warning: line 505 Notes contains information Deleted flag set on VLD;.

I will re-validate this one and give you feedback on the Deleted flag comments.

Geoff
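To check a whole piece for further rows of this kind without reading the log by eye, something like the following Ruby sketch could pull out the line numbers of every "Deleted flag set on VLD" warning. It assumes only the warning wording shown in the log above; the method name `deleted_flag_lines` is made up for illustration.

```ruby
# The warning text quoted above, pasted in as a plain string.
LOG_TEXT = <<~LOG
  Started on the file HO107_1924.csv for somt.cen at 2022-03-31 07:38:10 +0100. Working on Stowey for 1851, in SOM.
  Warning: line 200 Notes contains information Deleted flag set on VLD; POB not locatedPOB not located.
  Warning: line 203 New Folio number increment larger than 1 14.
  Warning: line 282 Notes contains information Deleted flag set on VLD;.
  Warning: line 292 Notes contains information Deleted flag set on VLD; POB - Madehurst SSX or Midhurst SSX?POB - Madehurst SSX or Midhurst SSX?.
  Warning: line 505 Notes contains information Deleted flag set on VLD;.
LOG

# Returns the line numbers of warnings that mention the Deleted flag.
# Splitting on "Warning: " works whether the log is one long line or many.
def deleted_flag_lines(log_text)
  log_text.split('Warning: ').map do |entry|
    match = entry.match(/\Aline (\d+) /)
    match[1].to_i if match && entry.include?('Deleted flag set on VLD')
  end.compact
end

flagged = deleted_flag_lines(LOG_TEXT)
puts "Rows with Deleted flag warnings: #{flagged.join(', ')}"
```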

AnneV-Learn commented 2 years ago

Thanks for the info @geoffj-FUG. Isn't your comment related to 'Research how one might respect the soft delete in vld files' #1311 rather than this story - or am I misunderstanding?

geoffj-FUG commented 2 years ago

Anne

There are a lot of deleted flags in this piece. I have validated about 20 of them and none are deleted.

I have managed to find a spreadsheet containing the original transcription. There is no reason that some of the entries would even need looking at during validation.

So is this some sort of historical leftover from FC1 where these flags existed? The original hypothesis was that the Ds were the control flags from valdrev. But there is no obvious trigger for these flags to occur.

I am afraid that the design of FC1 was well before my involvement. It may be that they are quite harmless. I can find no obvious cause for it in the entries that I am looking at.

So back to my query, what other flags are in this field please?

Geoff

Captainkirkdawson commented 2 years ago

@AnneV-Learn @geoffj-FUG Anne, you are correct that this story has nothing to do with the soft delete in story #1311. This story checks whether there are orphan vld files left behind because people deleted the freecen1 parm on freecen2.

@geoffj-FUG we are waiting for you to test story #1324, "the latest version of the vld download as CSVProc works as expected. Deployed to test3 for you to test". It addresses the soft delete by including the soft-deleted entries in the vld download as CSVProc.

AnneV-Learn commented 2 years ago

Deployed to production and rake task run. Output is attached - there are no orphan vld files in production. Can be closed I think.

https://app.zenhub.com/files/28748917/67a7401d-f463-4108-bf51-b63c3f17afba/download

Captainkirkdawson commented 2 years ago

That is great news

geoffj-FUG commented 1 year ago

Anne

I have just had a look at all of these entries with deleted flags. None of them should be deleted. They are all valid entries.

I have a feeling that we are going to have our hypothesis about the D flag disproven. What other flags are in that column of the data please? That may tell us whether the data is related to the validation trail in valdrev.

Geoff


AnneV-Learn commented 1 year ago

Hi Geoff,

As far as I can see, position 0 (i.e. the first character in the record), which holds the Deleted flag/marker in a VLD file, is only used for that purpose. (I have checked through the code and can see no other use of the flag/marker at that position.)

FYI comments documented in the VLD file parser are as follows:

# A       0  1 Deletion marker (D or blank)
# B       1  4 Not used. Was Registration district - this usage is discontinued.
# C       5  6 A six digit number (leading zeros) which counts the households
# D      11  4 A four digit number (leading zeros) which counts members in each household
# E   A  15 20 Parish name (I don't check this, it is up to you to get it right!)
# F   B  35  4 Enumeration district (3n+1a, the remaining numeric fields have trailing spaces)
# G   C  39  5 Folio number (4n+1a)
# H   D  44  4 Page number (4n)
# I   E  48  4 Schedule number (3n+1a)
# J   F  52  5 House number (4n+1a)
# K   G  57 30 House/Street name (default -)
# L   H  87  1 Uninhabited flag (b, u, v, n, x or -)
# M   I  88 24 Surname (capitals, default -)
# N   J 112 24 Forenames (default -)
# O   K 136  1 Flag for name fields (x or -)
# P   L 137  6 Relationship (default -)
# Q   M 143  1 Condition (M, S, U, W or -)
# R   N 144  1 Sex (M, F or -)
# S   O 145  3 Age (no default but 999=unknown/unreadable)
#          148  1 Age unit(y, m, w, d or -)
# T   P 149  1 Flag for detail fields i.e. rel/cond/sex/age (x or -)
# U   Q 150 30 Occupation
#       R        Employed Status (extracted from occupation field)
# V   S 180  1 Flag for occupation (x or -)
# W   T 181  3 Transcriber County code (3a capitals, no default but UNK if not known)
# X   U 184 20 Transcriber Birth place (default -)
# Y   V 204  1 Flag for birth place (x or -)
# Z   W 205  6 Disability (default blank)
# AA  X 211  1 Language (W, E, B, G or blank)
# AB  Y 212 44 Notes (default blank, no case
#     Z 276  3 Alternate birth county
#     AA 279 20 Alternate birth place

KR Anne
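For anyone wanting to inspect the flag directly, the layout above can be read as a table of offsets and widths. The Ruby sketch below is not the project's VLD parser: it assumes one record is already available as a single string, copies a subset of the offsets from the comments above, and uses made-up names (`VLD_FIELDS`, `parse_vld_record`, `flagged_deleted?`) and made-up sample data.

```ruby
# Offset and width pairs copied from the parser comments quoted above.
# Only a subset of the fields is listed.
VLD_FIELDS = {
  deleted_flag: [0, 1],
  household_number: [5, 6],
  parish: [15, 20],
  surname: [88, 24],
  forenames: [112, 24],
  sex: [144, 1],
  age: [145, 3],
  age_unit: [148, 1]
}.freeze

# Slices each field out of a fixed-width record and trims the padding.
def parse_vld_record(record)
  VLD_FIELDS.each_with_object({}) do |(name, (offset, width)), fields|
    fields[name] = record[offset, width].to_s.strip
  end
end

# True when the first character of the record is the deletion marker.
def flagged_deleted?(record)
  record[0] == 'D'
end

# Build an illustrative record of 299 characters (made-up data).
record = ' ' * 299
record[0, 1] = 'D'
record[5, 6] = '000012'
record[15, 20] = 'Stowey'.ljust(20)
record[88, 24] = 'SMITH'.ljust(24)
record[112, 24] = 'John'.ljust(24)
record[144, 1] = 'M'
record[145, 3] = '034'
record[148, 1] = 'y'

parsed = parse_vld_record(record)
p parsed
puts "Flagged as deleted: #{flagged_deleted?(record)}"
```

Keeping the offsets in one hash means the sketch can be extended to the remaining fields by adding entries rather than changing the parsing logic.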


geoffj-FUG commented 1 year ago

Anne

Kirk set up a script that added an entry that a record was flagged as deleted into the notes column. I examined several hundred of these entries and compared them with the original image during validation. Not a single one of these entries was a deleted record. One piece had about 150 D flags which is an extraordinary number given that post validation there may only have been one or two in a piece.

In the old valdrev system when a record was deleted the D entry appeared there. There were other codes as well and they were used to flag the status of a record to valdrev. D in valdrev was definitely deleted. But not on the FC1 database.

I spent hours chasing this flag to no avail. It seems that the data was cleaned at some point but the D still popped up. We could not work out why it was there but we are sure that it is no longer related to a deleted record.

Geoff
