FreeUKGen / FreeCENMigration

Issue tracking for project migrating FreeCEN to FreeCEN2 genealogy record database and search engine architecture. Code developed here is based on that developed in MyopicVicar
https://www.freecen.org.uk
Apache License 2.0

Investigate the quality of the birth chapman code field. #803

Closed Captainkirkdawson closed 3 years ago

Captainkirkdawson commented 4 years ago

Work on story 165 has revealed serious questions about the quality of the Birth Chapman Code field.

Captainkirkdawson commented 4 years ago

@richpomfret @FreecenBren This was initial reported under story 165 Place for test3 an older database. So I have repeated it on brazza for the production system. The results are frightening. Producing report of the unique field names Birth County 685 values ["\u0000\u0000\u0000", "ABD", "AGY", "ALD", "AMT", "ANH", "ANL", "ANS", "ANT", "ARL", "ARM", "ASk", "ASw", "AYR", "BAN", "BCa", "BCh", "BDF", "BDu", "BEW", "BEa", "BGi", "BGr", "BHo", "BIr", "BJa", "BKM", "BMa", "BN", "BNe", "BNo", "BPr", "BRE", "BRK", "BSi", "BSt", "BSy", "BUT", "BWe", "BYA", "BYB", "BYC", "BYD", "BYI", "BYS", "CAE", "CAI", "CAM", "CAR", "CAV", "CGN", "CHI", "CHS", "CLA", "CLK", "CMN", "CON", "COR", "CTS", "CUL", "D-", "DBY", "DDo", "DEN", "DEV", "DFS", "DLo", "DNB", "DON", "DOR", "DOW", "DPo", "DSt", "DUB", "DUR", "DXL", "ECa", "ECr", "EIB", "EIG", "EIH", "EIK", "EIM", "EIP", "EIS", "EIW", "ELN", "ENG", "ERY", "ESS", "FBe", "FBi", "FBl", "FER", "FFr", "FHa", "FHo", "FIF", "FIt", "FLN", "FLe", "FLu", "FPh", "FPr", "FSa", "FWo", "GAL", "GLA", "GLS", "GLi", "GSY", "HAM", "HBi", "HEF", "HEl", "HFo", "HHe", "HKe", "HKi", "HNo", "HPi", "HRT", "HSK", "HST", "HUN", "HWe", "IAb", "IAs", "IBa", "IBo", "ICo", "ICr", "IDa", "IEa", "IEv", "IFi", "IFr", "IGa", "IGl", "IGr", "IHa", "IHo", "IIr", "IKi", "ILe", "ILo", "IMu", "INA", "INB", "ING", "INL", "INM", "INN", "INO", "INS", "INV", "INe", "IOM", "IOW", "IOl", "IOw", "IPl", "IQu", "IRL", "IRe", "ISa", "ISe", "ISi", "ISt", "IWa", "IWh", "IWi", "IWy", "JSY", "K-", "KCD", "KCo", "KEN", "KER", "KEa", "KHe", "KHi", "KID", "KIK", "KKD", "KMW", "KMe", "KMi", "KNe", "KNo", "KPa", "KPu", "KRS", "KRo", "KSD", "KSL", "KSM", "KSS", "KSW", "KSY", "KSh", "KSo", "KSt", "KWo", "L-", "LAN", "LAl", "LAt", "LBa", "LBe", "LBi", "LBo", "LBr", "LCa", "LCl", "LCo", "LCr", "LDN", "LDY", "LDe", "LDi", "LDu", "LEI", "LEN", "LET", "LEX", "LEt", "LGr", "LHe", "LIM", "LIN", "LIr", "LKS", "LKe", "LKi", "LLa", "LMa", "LND", "LNe", "LNo", "LOG", "LOU", "LPe", "LPo", "LQu", "LRu", "LSh", 
"LSt", "LTi", "LTy", "LWa", "LWe", "LWh", "LWi", "LWo", "LWr", "M-", "MAY", "MAb", "MBa", "MBe", "MBr", "MCa", "MCh", "MDX", "MDu", "MEA", "MER", "MFr", "MGY", "MLN", "MMa", "MOG", "MON", "MOR", "MPo", "MWi", "N-", "NAI", "NAb", "NAc", "NAd", "NAi", "NAl", "NAn", "NAs", "NAt", "NAz", "NBL", "NBa", "NBe", "NBi", "NBl", "NBo", "NBr", "NBu", "NCa", "NCh", "NCl", "NCo", "NCr", "NCu", "NDL", "NDa", "NDe", "NDi", "NDo", "NDu", "NEc", "NEd", "NEu", "NEv", "NFK", "NFa", "NFe", "NFl", "NFo", "NGa", "NGl", "NGo", "NGr", "NHa", "NHe", "NHi", "NHo", "NHu", "NIn", "NKe", "NKi", "NKn", "NLa", "NLe", "NLi", "NLl", "NLo", "NLy", "NMa", "NMi", "NMo", "NNe", "NNo", "NOl", "NOr", "NOv", "NPa", "NPe", "NPi", "NPl", "NPo", "NPr", "NQu", "NRY", "NRa", "NRi", "NRo", "NRu", "NSa", "NSe", "NSh", "NSi", "NSk", "NSo", "NSp", "NSt", "NSu", "NTH", "NTT", "NTh", "NTo", "NTu", "NTw", "NTy", "NUl", "NWa", "NWe", "NWh", "NWi", "NWo", "NWr", "OFF", "OKI", "OUC", "OVB", "OVF", "OXF", "PEE", "PEM", "PER", "R-", "RAD", "RAs", "RBi", "RBl", "RBr", "RCo", "RCr", "RCy", "RDa", "RDu", "RFW", "RHo", "RLI", "RNu", "ROC", "ROS", "ROX", "RRe", "RRu", "RSt", "RSu", "RUT", "RWo", "RYB", "S-", "SAL", "SAc", "SAd", "SAl", "SAm", "SAr", "SAs", "SBa", "SBe", "SBi", "SBo", "SBr", "SCT", "SCa", "SCh", "SCl", "SCo", "SCu", "SDa", "SDe", "SDo", "SDu", "SEL", "SEa", "SEd", "SEg", "SEl", "SEt", "SFK", "SFl", "SFr", "SGe", "SGl", "SGo", "SGr", "SHI", "SHa", "SHe", "SHi", "SHo", "SHu", "SHy", "SId", "SJo", "SKe", "SKi", "SKn", "SKu", "SLI", "SLe", "SLi", "SLo", "SLu", "SMa", "SMe", "SMi", "SMo", "SNa", "SNe", "SNo", "SOM", "SPa", "SPe", "SPu", "SRK", "SRY", "SRa", "SRi", "SRo", "SRu", "SSX", "SSa", "SSc", "SSe", "SSh", "SSm", "SSt", "SSw", "STI", "STS", "STa", "STh", "STr", "SUT", "SVi", "SWa", "SWe", "SWh", "SWi", "SWo", "SWr", "SWy", "SYo", "T-", "TAr", "TAs", "TAt", "TAv", "TBa", "TBe", "TBi", "TBl", "TBo", "TBr", "TBu", "TCa", "TCl", "TCo", "TCr", "TDa", "TDu", "TEa", "TEd", "TEl", "TEp", "TEv", "TFa", "TFi", "TFl", 
"TFo", "TGe", "TGi", "TGl", "TGo", "TGr", "TGu", "THa", "THe", "THi", "THo", "THu", "TIP", "TKe", "TKi", "TKn", "TLa", "TLe", "TLi", "TLo", "TMa", "TMo", "TMu", "TNe", "TNo", "TOl", "TOr", "TOs", "TOw", "TOx", "TPa", "TPl", "TRa", "TRe", "TRo", "TRu", "TSa", "TSc", "TSe", "TSh", "TSi", "TSk", "TSn", "TSo", "TSt", "TSu", "TSy", "TT-", "TTA", "TTB", "TTC", "TTE", "TTF", "TTG", "TTH", "TTK", "TTL", "TTM", "TTN", "TTO", "TTR", "TTS", "TTT", "TTW", "TTh", "TTi", "TTo", "TTu", "TTy", "TUp", "TWa", "TWe", "TWh", "TWi", "TWo", "TYR", "U-", "UNK", "UNS", "UTA", "UTC", "V-", "VBr", "VCo", "VDe", "VPl", "VTa", "VTo", "W-", "WAL", "WAR", "WAT", "WEM", "WES", "WEX", "WIC", "WIG", "WIL", "WLN", "WNe", "WOR", "WPa", "WRY", "X-", "XBo", "XBr", "XCh", "XCu", "XGr", "XHa", "XIs", "XKi", "XKn", "XLe", "XLo", "XMa", "XMi", "XPi", "XPo", "XWe", "Y-", "YBa", "YBe", "YBi", "YBo", "YCa", "YCh", "YCl", "YCo", "YCr", "YDa", "YDe", "YDu", "YGr", "YHe", "YHo", "YIl", "YKS", "YLl", "YLo", "YLu", "YMa", "YNe", "YOc", "YPe", "YRi", "YSt", "YTi", "YWe", "YWh", "YWi", "YWo"] Verbatim Birth County 684 values ["\u0000\u0000\u0000", "ABD", "AGY", "ALD", "AMT", "ANH", "ANL", "ANS", "ANT", "ARL", "ARM", "ASk", "ASw", "AYR", "BAN", "BCa", "BCh", "BDF", "BDu", "BEW", "BEa", "BGi", "BGr", "BHo", "BIr", "BJa", "BKM", "BMa", "BN", "BNe", "BNo", "BPr", "BRE", "BRK", "BSi", "BSt", "BSy", "BUT", "BWe", "BYA", "BYB", "BYC", "BYD", "BYI", "BYS", "CAE", "CAI", "CAM", "CAR", "CAV", "CGN", "CHI", "CHS", "CLA", "CLK", "CMN", "CON", "COR", "CTS", "CUL", "D-", "DBY", "DDo", "DEN", "DEV", "DFS", "DLo", "DNB", "DON", "DOR", "DOW", "DPo", "DSt", "DUB", "DUR", "DXL", "ECa", "EIB", "EIG", "EIH", "EIK", "EIM", "EIP", "EIS", "EIW", "ELN", "ENG", "ERY", "ESS", "FBe", "FBi", "FBl", "FER", "FFr", "FHa", "FHo", "FIF", "FIt", "FLN", "FLe", "FLu", "FPh", "FPr", "FSa", "FWo", "GAL", "GLA", "GLS", "GLi", "GSY", "HAM", "HBi", "HEF", "HEl", "HFo", "HHe", "HKe", "HKi", "HNo", "HPi", "HRT", "HSK", "HST", "HUN", "HWe", "IAb", "IAs", 
"IBa", "IBo", "ICo", "ICr", "IDa", "IEa", "IEv", "IFi", "IFr", "IGa", "IGl", "IGr", "IHa", "IHo", "IIr", "IKi", "ILe", "ILo", "IMu", "INA", "INB", "ING", "INL", "INM", "INN", "INO", "INS", "INV", "INe", "IOM", "IOW", "IOl", "IOw", "IPl", "IQu", "IRL", "IRe", "ISa", "ISe", "ISi", "ISt", "IWa", "IWh", "IWi", "IWy", "JSY", "K-", "KCD", "KCo", "KEN", "KER", "KEa", "KHe", "KHi", "KID", "KIK", "KKD", "KMW", "KMe", "KMi", "KNe", "KNo", "KPa", "KPu", "KRS", "KRo", "KSD", "KSL", "KSM", "KSS", "KSW", "KSY", "KSh", "KSo", "KSt", "KWo", "L-", "LAN", "LAl", "LAt", "LBa", "LBe", "LBi", "LBo", "LBr", "LCa", "LCl", "LCo", "LCr", "LDN", "LDY", "LDe", "LDi", "LDu", "LEI", "LEN", "LET", "LEX", "LEt", "LGr", "LHe", "LIM", "LIN", "LIr", "LKS", "LKe", "LKi", "LLa", "LMa", "LND", "LNe", "LNo", "LOG", "LOU", "LPe", "LPo", "LQu", "LRu", "LSh", "LSt", "LTi", "LTy", "LWa", "LWe", "LWh", "LWi", "LWo", "LWr", "M-", "MAY", "MAb", "MBa", "MBe", "MBr", "MCa", "MCh", "MDX", "MDu", "MEA", "MER", "MFr", "MGY", "MLN", "MMa", "MOG", "MON", "MOR", "MPo", "MWi", "N-", "NAI", "NAb", "NAc", "NAd", "NAi", "NAl", "NAn", "NAs", "NAt", "NAz", "NBL", "NBa", "NBe", "NBi", "NBl", "NBo", "NBr", "NBu", "NCa", "NCh", "NCl", "NCo", "NCr", "NCu", "NDL", "NDa", "NDe", "NDi", "NDo", "NDu", "NEc", "NEd", "NEu", "NEv", "NFK", "NFa", "NFe", "NFl", "NFo", "NGa", "NGl", "NGo", "NGr", "NHa", "NHe", "NHi", "NHo", "NHu", "NIn", "NKe", "NKi", "NKn", "NLa", "NLe", "NLi", "NLl", "NLo", "NLy", "NMa", "NMi", "NMo", "NNe", "NNo", "NOl", "NOr", "NOv", "NPa", "NPe", "NPi", "NPl", "NPo", "NPr", "NQu", "NRY", "NRa", "NRi", "NRo", "NRu", "NSa", "NSe", "NSh", "NSi", "NSk", "NSo", "NSp", "NSt", "NSu", "NTH", "NTT", "NTh", "NTo", "NTu", "NTw", "NTy", "NUl", "NWa", "NWe", "NWh", "NWi", "NWo", "NWr", "OFF", "OKI", "OUC", "OVB", "OVF", "OXF", "PEE", "PEM", "PER", "R-", "RAD", "RAs", "RBi", "RBl", "RBr", "RCo", "RCr", "RCy", "RDa", "RDu", "RFW", "RHo", "RLI", "RNu", "ROC", "ROS", "ROX", "RRe", "RRu", "RSt", "RSu", "RUT", "RWo", "RYB", "S-", 
"SAL", "SAc", "SAd", "SAl", "SAr", "SAs", "SBa", "SBe", "SBi", "SBo", "SBr", "SBu", "SCT", "SCa", "SCh", "SCl", "SCo", "SCu", "SDa", "SDe", "SDo", "SDu", "SEL", "SEa", "SEd", "SEg", "SEl", "SEt", "SFK", "SFl", "SFr", "SGe", "SGl", "SGo", "SGr", "SHI", "SHa", "SHe", "SHi", "SHo", "SHu", "SHy", "SId", "SJo", "SKe", "SKi", "SKn", "SKu", "SLI", "SLe", "SLi", "SLo", "SLu", "SMa", "SMe", "SMi", "SMo", "SNa", "SNe", "SNo", "SOM", "SPa", "SPe", "SPu", "SRK", "SRY", "SRa", "SRi", "SRo", "SRu", "SSX", "SSa", "SSc", "SSe", "SSh", "SSm", "SSt", "SSw", "STI", "STS", "STa", "STh", "STr", "SUT", "SVi", "SWa", "SWe", "SWh", "SWi", "SWo", "SWr", "SWy", "SYo", "T-", "TAr", "TAs", "TAt", "TAv", "TBa", "TBe", "TBi", "TBl", "TBo", "TBr", "TBu", "TCa", "TCl", "TCo", "TCr", "TDa", "TDu", "TEa", "TEd", "TEl", "TEp", "TEv", "TFa", "TFi", "TFl", "TFo", "TGe", "TGi", "TGl", "TGo", "TGr", "TGu", "THa", "THe", "THi", "THo", "THu", "TIP", "TKe", "TKi", "TKn", "TLa", "TLe", "TLi", "TLo", "TMa", "TMo", "TMu", "TNe", "TNo", "TOl", "TOr", "TOs", "TOw", "TOx", "TPa", "TPl", "TRa", "TRe", "TRo", "TRu", "TSa", "TSc", "TSe", "TSh", "TSi", "TSk", "TSn", "TSo", "TSt", "TSu", "TSy", "TT-", "TTA", "TTB", "TTC", "TTE", "TTF", "TTG", "TTH", "TTK", "TTL", "TTM", "TTN", "TTO", "TTR", "TTS", "TTT", "TTW", "TTh", "TTi", "TTo", "TTu", "TTy", "TUp", "TWa", "TWe", "TWh", "TWi", "TWo", "TYR", "U-", "UNK", "UNS", "UTA", "UTC", "V-", "VBr", "VCo", "VDe", "VPl", "VTa", "VTo", "W-", "WAL", "WAR", "WAT", "WEM", "WES", "WEX", "WIC", "WIG", "WIL", "WLN", "WNe", "WOR", "WPa", "WRY", "X-", "XBo", "XBr", "XCh", "XCu", "XGr", "XHa", "XIs", "XKi", "XKn", "XLe", "XLo", "XMa", "XMi", "XPi", "XPo", "XWe", "Y-", "YBa", "YBe", "YBi", "YBo", "YCa", "YCh", "YCl", "YCo", "YCr", "YDa", "YDe", "YDu", "YGr", "YHe", "YHo", "YIl", "YKS", "YLl", "YLo", "YLu", "YMa", "YNe", "YOc", "YPe", "YRi", "YSt", "YTi", "YWe", "YWh", "YWi", "YWo"] Birth County NOT in Verbatim Birth County 2 values ["ECr", "SAm"] Verbatim Birth County NOT in Birth County 
1 values ["SBu"]
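
The comparison reported above amounts to two set differences. A minimal Python sketch, assuming the distinct values of each field have already been extracted into lists; the short lists below are a hypothetical excerpt for illustration, not the full output, and `compare_distinct` is an illustrative name rather than anything in the project code:

```python
def compare_distinct(birth, verbatim):
    """Return (values only in birth, values only in verbatim), each sorted."""
    birth_set, verbatim_set = set(birth), set(verbatim)
    return sorted(birth_set - verbatim_set), sorted(verbatim_set - birth_set)


# Hypothetical excerpt of the two distinct-value lists.
birth_county = ["ECa", "ECr", "SAl", "SAm", "SAr"]
verbatim_birth_county = ["ECa", "SAl", "SAr", "SBu"]

only_birth, only_verbatim = compare_distinct(birth_county, verbatim_birth_county)
print(only_birth)     # ['ECr', 'SAm']
print(only_verbatim)  # ['SBu']
```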

Captainkirkdawson commented 4 years ago

Clearly we need a systematic correction. There are only 155 valid Chapman codes for CEN. It would appear that the process noted by @FreecenBren last June in the places story lets errors through:

_The Chapman Code is added at the Transcription CSV stage by the transcriber from the Census images. It therefore goes through all the stages and into the final VLD.

If any CHP is not from the accepted list (for example, typed incorrectly) and is not picked up by any of the FreeCEN stages of

  1. CSV
  2. CSVCHECK
  3. DAT
  4. CHE
  5. FCTOOLS
  6. finally to a VLD, then the upload will throw it out as an error. We then have to do an amended VLD. In FC1 there must be a check for it to throw out the error._

FreecenBren commented 4 years ago

A good example of this is in the current FC1 Errors page for the last FC1 update

https://freecen1.freecen.org.uk/errors.html The amendment has been done and will be added to Lemons amended folder for the next update. The 1851 census Piece number is GLS Piece Number HO511972

It can also be seen at the FC2 errors page at (Last 2 on list) https://www.freecen.org.uk/freecen_errors


I now wish, though, that I had not looked. Coincidentally, there is another failure on FC2, for the 1891 census piece GLS RG121972, which ends in the same 4 digits.

The FC1 Piece for this one though is Online and has been since 2011 with no issue. I have just had a quick look at the VLD and it is OK.

It now says the number 1972 is not unique. That must be because the 1851 VLD, which is earlier than the 1891 VLD, has already taken that number, and both are in the same County of GLS. I will have to leave that with you to raise a query to fix the issue.

https://www.freecen.org.uk/freecen_errors

FC2 is trying to link GLS 1851 HO1071972 with GLS 1891 RG121972.

So that explains the Error message on FC2: "1851 1972 Error (not unique)".

I have therefore checked the National Archives and this is what I have found. Each Census year will have a number 1972 except 1841, as they only have 3 digits: 1851 is GLS, 1861 is STS, 1871 is DOR, 1881 is NFK and 1891 is GLS.

Sorry about that but I would have found it when I did the Errors for the next upload. Just a coincidence that they are the same county.
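
The clash described here can be reproduced in a few lines. This is only a sketch: it assumes the FC2 uniqueness check is keyed on county plus piece number, which is a guess from the error message and not confirmed anywhere in this thread. `find_clashes` is a hypothetical helper; the year and county pairs are the ones listed above.

```python
from collections import defaultdict

# (census year, county, piece number) as listed in the comment above.
pieces = [
    (1851, "GLS", 1972),
    (1861, "STS", 1972),
    (1871, "DOR", 1972),
    (1881, "NFK", 1972),
    (1891, "GLS", 1972),
]


def find_clashes(pieces, key):
    """Group pieces by key(piece); return only groups with more than one member."""
    groups = defaultdict(list)
    for piece in pieces:
        groups[key(piece)].append(piece)
    return {k: v for k, v in groups.items() if len(v) > 1}


# Assumed current behaviour: county + piece number (two GLS pieces collide).
by_county_and_number = find_clashes(pieces, key=lambda p: (p[1], p[2]))
# Alternative: census year + piece number (no collisions in this data).
by_year_and_number = find_clashes(pieces, key=lambda p: (p[0], p[2]))

print(by_county_and_number)
print(by_year_and_number)
```

If the assumption holds, including the census year in the key would be enough to tell the 1851 and 1891 GLS pieces apart.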

You will also see on the FC2 errors page one for BRK that FC1 passed but FC2 failed. I still have this one to look at, but it could be the #. I hope so anyway!

PatReynolds commented 4 years ago

@Captainkirkdawson when you say "In FC1 there must be a check for it to throw out the error" above, I think you mean on FC2. I would place the check at the level of Transcriber, as it will never be a matter of interpretation / difficulty reading the original.
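
A check at the transcriber level could be as small as a lookup against the accepted list. The sketch below is illustrative only: `ACCEPTED_CODES` holds a handful of real Chapman codes as a stand-in for the full list of 155 mentioned earlier in the thread, and `check_birth_code` is a hypothetical function name, not part of FreeCEN.

```python
# Illustrative subset only; the thread puts the full accepted list at 155 codes.
ACCEPTED_CODES = frozenset({"DBY", "GLS", "LAN", "STS", "YKS"})


def check_birth_code(code, accepted=ACCEPTED_CODES):
    """Return None if the code is accepted, otherwise a short error message."""
    # Treat missing values, blanks and NUL-only strings as empty.
    if code is None or not code.strip("\x00 "):
        return "birth Chapman code is empty"
    # The comparison is case-sensitive, so "YPe" or "Lan" are rejected.
    if code not in accepted:
        return f"birth Chapman code {code!r} is not in the accepted list"
    return None


for value in ["LAN", "YPe", "\x00\x00\x00"]:
    print(repr(value), "->", check_birth_code(value))
```

Rejecting the value at entry time means it never reaches the later stages, which is the point being made above.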

Captainkirkdawson commented 4 years ago

I have looked at where one of the problem counties comes from: "birth_chapman_code": "YPe". It is generated by 1 search record, which has the following content (attachment: problem.JPG). Clearly something is badly wrong with this file. How it got into the system I do not know. It was created on ISODate("2018-07-04T11:37:15.815Z").

What is clear is that this one piece is responsible for a large number of the errors (perhaps all).

It belongs to file "rg103546.vld". That file should be reloaded.

FreecenBren commented 4 years ago

I will look at it today and do an amendment. It will then be replaced at the next update. Will let you know when done.

Captainkirkdawson commented 4 years ago

Thank you @FreecenBren I will take no further action on this story until we have a replacement file

richpomfret commented 4 years ago

@FreecenBren to let you know that @Vino-S will be doing the next FC2 update tomorrow (Thurs).

FreecenBren commented 4 years ago

How?

I have not shared the Dropbox folder with anyone else yet. Also, I am still adding some in. I send Lemon an email when they are ready and I have not done that yet. Confused. Brenda

Captainkirkdawson commented 4 years ago

With the completion of the latest update to FC. the number of incorrect chapman codes has been reduced from 685 to 442. A significant decrease but there remain 250+ errors. Verbatim Birth County 442 values ["\u0000\u0000\u0000", "ABD", "AGY", "ALD", "ANS", "ANT", "ARL", "ARM", "ASk", "ASw", "AYR", "BAN", "BCa", "BCh", "BDF", "BEW", "BEa", "BGi", "BHo", "BJa", "BKM", "BMa", "BN", "BNe", "BNo", "BPr", "BRE", "BRK", "BSi", "BSt", "BSy", "BUT", "BWe", "CAE", "CAI", "CAM", "CAR", "CAV", "CGN", "CHI", "CHS", "CLA", "CLK", "CMN", "CON", "COR", "CUL", "D-", "DBY", "DEN", "DEV", "DFS", "DNB", "DON", "DOR", "DOW", "DPo", "DSt", "DUB", "DUR", "ELN", "ENG", "ERY", "ESS", "FBe", "FER", "FFr", "FHa", "FHo", "FIF", "FIt", "FLN", "FPr", "GAL", "GLA", "GLS", "GLi", "GSY", "HAM", "HEF", "HRT", "HUN", "HWe", "IAs", "IGl", "ILe", "INV", "IOM", "IOW", "IRL", "JSY", "K-", "KCD", "KEN", "KER", "KHi", "KID", "KIK", "KKD", "KMe", "KMi", "KPa", "KRS", "KSo", "KWo", "L-", "LAN", "LAl", "LAt", "LBa", "LBe", "LBi", "LBo", "LBr", "LCa", "LCl", "LCo", "LCr", "LDN", "LDY", "LDe", "LDi", "LDu", "LEI", "LEN", "LET", "LEX", "LEt", "LGr", "LHe", "LIM", "LIN", "LKS", "LKe", "LKi", "LLa", "LMa", "LND", "LNe", "LNo", "LOG", "LOU", "LPe", "LPo", "LQu", "LRu", "LSh", "LSt", "LTi", "LTy", "LWa", "LWe", "LWh", "LWi", "LWo", "M-", "MAY", "MBa", "MBe", "MBr", "MCa", "MDX", "MEA", "MER", "MGY", "MLN", "MOG", "MON", "MOR", "MPo", "MWi", "N-", "NAI", "NAb", "NAc", "NAd", "NAi", "NAl", "NAs", "NAt", "NBL", "NBa", "NBe", "NBi", "NBl", "NBo", "NBr", "NBu", "NCa", "NCh", "NCl", "NCo", "NCr", "NCu", "NDa", "NDe", "NDi", "NDu", "NEc", "NEd", "NEu", "NEv", "NFK", "NFa", "NFe", "NFl", "NGa", "NGl", "NGo", "NGr", "NHa", "NHe", "NHi", "NHo", "NHu", "NIn", "NKe", "NKi", "NKn", "NLa", "NLe", "NLi", "NLl", "NLo", "NLy", "NMa", "NMi", "NMo", "NNe", "NNo", "NOl", "NOr", "NOv", "NPa", "NPe", "NPi", "NPl", "NPo", "NPr", "NRY", "NRa", "NRi", "NRo", "NRu", "NSa", "NSe", "NSh", "NSi", "NSk", "NSo", "NSt", "NSu", "NTH", "NTT", 
"NTh", "NTo", "NTu", "NTw", "NTy", "NUl", "NWa", "NWe", "NWh", "NWi", "NWo", "NWr", "OFF", "OKI", "OUC", "OVB", "OVF", "OXF", "PEE", "PEM", "PER", "R-", "RAD", "RBi", "RBr", "RCr", "RCy", "RDa", "RDu", "RFW", "RHo", "RNu", "ROC", "ROS", "ROX", "RRu", "RSt", "RSu", "RUT", "RWo", "S-", "SAL", "SAl", "SAr", "SAs", "SBa", "SBe", "SBi", "SBo", "SBr", "SBu", "SCT", "SCa", "SCh", "SCl", "SCo", "SCu", "SDa", "SDe", "SDo", "SDu", "SEL", "SEa", "SEd", "SEg", "SEl", "SEt", "SFK", "SFl", "SFr", "SGe", "SGl", "SGo", "SGr", "SHI", "SHa", "SHe", "SHi", "SHo", "SHu", "SHy", "SJo", "SKe", "SKi", "SKn", "SKu", "SLI", "SLe", "SLi", "SLu", "SMa", "SMe", "SMi", "SMo", "SNa", "SNe", "SNo", "SOM", "SPa", "SPe", "SPu", "SRK", "SRY", "SRa", "SRi", "SRo", "SRu", "SSX", "SSa", "SSc", "SSe", "SSh", "SSm", "SSt", "STI", "STS", "STa", "STr", "SUT", "SVi", "SWa", "SWi", "SWo", "SWr", "SWy", "SYo", "T-", "TDu", "TGl", "TGr", "THo", "TIP", "TKi", "TMa", "TPa", "TRe", "TSu", "TUp", "TWi", "TYR", "U-", "UNK", "V-", "VBr", "VCo", "VPl", "VTo", "W-", "WAL", "WAR", "WAT", "WEM", "WES", "WEX", "WIC", "WIG", "WIL", "WLN", "WOR", "WPa", "WRY", "X-", "XBr", "XCh", "XCu", "XGr", "XHa", "XLe", "XLo", "XMa", "XMi", "XPo", "Y-", "YBa", "YBo", "YCh", "YCl", "YCo", "YDe", "YDu", "YKD", "YKS", "YLl", "YLu", "YMa", "YNe", "YSt", "YWe"] We need locate and correct the files that have (are) creating these incorrect values.

Captainkirkdawson commented 4 years ago

Investigated further Number of individuals with invalid Chapman birth codes {"\u0000\u0000\u0000"=>14173, "ASk"=>1, "ASw"=>1, "BCa"=>1, "BCh"=>1, "BEa"=>1, "BGi"=>1, "BHo"=>1, "BJa"=>2, "BMa"=>1, "BN"=>1, "BNe"=>4, "BNo"=>1, "BPr"=>1, "BSi"=>1, "BSt"=>3, "BSy"=>2, "BWe"=>3, "D-"=>34, "DPo"=>1, "DSt"=>1, "ECr"=>1, "FBe"=>2, "FFr"=>1, "FHa"=>1, "FHo"=>1, "FIt"=>1, "FPr"=>1, "GLi"=>1, "HWe"=>2, "IAs"=>1, "IGl"=>1, "ILe"=>1, "K-"=>18, "KHi"=>2, "KMe"=>1, "KMi"=>1, "KPa"=>1, "KSo"=>2, "KWo"=>1, "L-"=>1651, "LAl"=>1, "LAt"=>1, "LBa"=>1, "LBe"=>2, "LBi"=>1, "LBo"=>1, "LBr"=>3, "LCa"=>17, "LCl"=>3, "LCo"=>6, "LCr"=>1, "LDN"=>2, "LDe"=>1, "LDi"=>1, "LDu"=>5, "LEN"=>1, "LEt"=>1, "LGr"=>1, "LHe"=>1, "LKe"=>3, "LKi"=>3, "LLa"=>1, "LMa"=>4, "LNe"=>10, "LNo"=>5, "LPe"=>4, "LPo"=>1, "LQu"=>2, "LRu"=>1, "LSh"=>3, "LSt"=>9, "LTi"=>1, "LTy"=>2, "LWa"=>5, "LWe"=>5, "LWh"=>7, "LWi"=>1, "LWo"=>3, "M-"=>5, "MBa"=>1, "MBe"=>1, "MBr"=>1, "MCa"=>3, "MPo"=>2, "MWi"=>1, "N-"=>14, "NAb"=>1, "NAc"=>2, "NAd"=>1, "NAi"=>7, "NAl"=>1, "NAs"=>9, "NAt"=>10, "NBa"=>1, "NBe"=>9, "NBi"=>4, "NBl"=>31, "NBo"=>1193, "NBr"=>16, "NBu"=>19, "NCa"=>5, "NCh"=>48, "NCl"=>4, "NCo"=>5, "NCr"=>5, "NCu"=>5, "NDa"=>13, "NDe"=>6, "NDi"=>1, "NDu"=>2, "NEc"=>2, "NEd"=>3, "NEu"=>1, "NEv"=>32, "NFa"=>30, "NFe"=>1, "NFl"=>1, "NGa"=>8, "NGl"=>1, "NGo"=>1, "NGr"=>476, "NHa"=>54, "NHe"=>15, "NHi"=>11, "NHo"=>22, "NHu"=>1, "NIn"=>1, "NKe"=>3, "NKi"=>127, "NKn"=>1, "NLa"=>14, "NLe"=>12, "NLi"=>1524, "NLl"=>2, "NLo"=>6, "NLy"=>2, "NMa"=>95, "NMi"=>2, "NMo"=>4, "NNe"=>2, "NNo"=>2, "NOl"=>3, "NOr"=>4, "NOv"=>7, "NPa"=>7, "NPe"=>15, "NPi"=>1, "NPl"=>1, "NPo"=>2, "NPr"=>43, "NRa"=>3, "NRi"=>6, "NRo"=>16, "NRu"=>24, "NSa"=>11, "NSe"=>9, "NSh"=>9, "NSi"=>2, "NSk"=>4, "NSo"=>3, "NSt"=>8, "NSu"=>1, "NTh"=>1, "NTo"=>22, "NTu"=>6, "NTw"=>1, "NTy"=>15, "NUl"=>3, "NWa"=>94, "NWe"=>19, "NWh"=>9, "NWi"=>54, "NWo"=>16, "NWr"=>3, "R-"=>4, "RBi"=>5, "RBr"=>2, "RCr"=>1, "RCy"=>1, "RDa"=>1, "RDu"=>4, "RHo"=>1, "RNu"=>1, "RRu"=>1, "RSt"=>2, 
"RSu"=>1, "RWo"=>1, "S-"=>38, "SAl"=>3, "SAm"=>1, "SAr"=>3, "SAs"=>1, "SBa"=>3, "SBe"=>1, "SBi"=>24, "SBo"=>2, "SBr"=>15, "SCa"=>1, "SCh"=>31, "SCl"=>1, "SCo"=>3, "SCu"=>1, "SDa"=>3, "SDe"=>5, "SDo"=>3, "SDu"=>2, "SEa"=>2, "SEd"=>1, "SEg"=>1, "SEl"=>2, "SEt"=>1, "SFl"=>5, "SFr"=>2, "SGe"=>1, "SGl"=>1, "SGo"=>1, "SGr"=>1, "SHa"=>8, "SHe"=>1, "SHi"=>1, "SHo"=>6, "SHu"=>12, "SHy"=>4, "SJo"=>1, "SKe"=>11, "SKi"=>10, "SKn"=>3, "SKu"=>1, "SLe"=>7, "SLi"=>3, "SLu"=>1, "SMa"=>5, "SMe"=>1, "SMi"=>6, "SMo"=>4, "SNa"=>2, "SNe"=>4, "SNo"=>1, "SPa"=>1, "SPe"=>2, "SPu"=>1, "SRa"=>1, "SRi"=>1, "SRo"=>3, "SRu"=>5, "SSa"=>4, "SSc"=>1, "SSe"=>6, "SSh"=>5, "SSm"=>1, "SSt"=>17, "STa"=>1, "STr"=>3, "SVi"=>1, "SWa"=>6, "SWi"=>3, "SWo"=>4, "SWr"=>1, "SWy"=>1, "SYo"=>4, "T-"=>60, "TDu"=>3, "TGl"=>2, "TGr"=>4, "THo"=>1, "TKi"=>1, "TMa"=>1, "TPa"=>2, "TRe"=>1, "TSu"=>1, "TUp"=>1, "TWi"=>1, "U-"=>1, "V-"=>1, "VBr"=>1, "VCo"=>1, "VPl"=>3, "VTo"=>1, "W-"=>1, "WPa"=>1, "X-"=>1, "XBr"=>1, "XCh"=>1, "XCu"=>1, "XGr"=>1, "XHa"=>1, "XLe"=>1, "XLo"=>16, "XMa"=>1, "XMi"=>3, "XPo"=>1, "Y-"=>14, "YBa"=>1, "YBo"=>1, "YCh"=>1, "YCl"=>1, "YCo"=>1, "YDe"=>2, "YDu"=>1, "YKD"=>1, "YLl"=>1, "YLu"=>1, "YMa"=>2, "YNe"=>1, "YSt"=>1, "YWi"=>1}

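A tally like the one above can be produced with a counter over the records. This is a sketch under assumptions: the records are shown as plain dictionaries (the real ones live in the database), `VALID_CODES` is a tiny illustrative subset of the accepted list, and `count_invalid` is a hypothetical name. Only the field name `birth_chapman_code` comes from this thread.

```python
from collections import Counter

# Illustrative subset only; the thread puts the full accepted list at 155 codes.
VALID_CODES = frozenset({"GLS", "LAN", "YKS"})


def count_invalid(records, valid=VALID_CODES):
    """Count individuals per birth_chapman_code value that is not in `valid`."""
    return Counter(
        rec["birth_chapman_code"]
        for rec in records
        if rec["birth_chapman_code"] not in valid
    )


# Hypothetical records for illustration.
sample = [
    {"birth_chapman_code": "LAN"},
    {"birth_chapman_code": "L-"},
    {"birth_chapman_code": "L-"},
    {"birth_chapman_code": "NBo"},
    {"birth_chapman_code": "\u0000\u0000\u0000"},
]

invalid_counts = count_invalid(sample)
for code, n in invalid_counts.most_common():
    print(repr(code), n)
```

Sorting by count, as `most_common()` does, puts the largest problems first, which may help decide what to fix first.
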
Captainkirkdawson commented 4 years ago

These files have invalid Chapman codes in the birth county field: rg122720.vld, HS410879.VLD, RG093137.VLD, RG103297.VLD, RG122590.VLD, RG122641.VLD, rg092721.vld, rg092827.vld, rg092666.vld, rg103710.vld, HS410707.vld, RG091946.VLD

Captainkirkdawson commented 4 years ago

Many of the preceding files have multiple errors. The cross relationship between the incorrect chapman code and the specific file follows {"\u0000\u0000\u0000"=>["rg122720.vld", "HS410879.VLD", "RG093137.VLD", "RG103297.VLD", "RG122590.VLD", "RG122641.VLD"], "ASk"=>["rg092721.vld"], "ASw"=>["rg092721.vld"], "BCa"=>["rg092721.vld"], "BCh"=>["rg092721.vld"], "BEa"=>["rg092721.vld"], "BGi"=>["rg092721.vld"], "BHo"=>["rg092827.vld"], "BJa"=>["rg092666.vld"], "BMa"=>["rg092666.vld"], "BN"=>["rg092721.vld"], "BNe"=>["rg092721.vld"], "BNo"=>["rg092666.vld"], "BPr"=>["rg092721.vld"], "BSi"=>["rg092721.vld"], "BSt"=>["rg092666.vld", "rg092721.vld"], "BSy"=>["rg092721.vld"], "BWe"=>["rg092721.vld", "rg092827.vld"], "D-"=>["rg092827.vld", "rg092666.vld", "rg092721.vld"], "DPo"=>["rg092721.vld"], "DSt"=>["rg092721.vld"], "ECr"=>["rg092721.vld"], "FBe"=>["rg092827.vld", "rg092721.vld"], "FFr"=>["rg092721.vld"], "FHa"=>["rg092827.vld"], "FHo"=>["rg092721.vld"], "FIt"=>["rg092721.vld"], "FPr"=>["rg092721.vld"], "GLi"=>["rg092721.vld"], "HWe"=>["rg092666.vld", "rg092721.vld"], "IAs"=>["rg092721.vld"], "IGl"=>["rg092721.vld"], "ILe"=>["rg092827.vld"], "K-"=>["rg092666.vld", "rg092721.vld"], "KHi"=>["rg092721.vld", "rg092827.vld"], "KMe"=>["rg092721.vld"], "KMi"=>["rg092721.vld"], "KPa"=>["rg092721.vld"], "KSo"=>["rg092721.vld"], "KWo"=>["rg092721.vld"], "L-"=>["rg092827.vld", "rg092666.vld", "rg092721.vld"], "LAl"=>["rg092721.vld"], "LAt"=>["rg092721.vld"], "LBa"=>["rg092721.vld"], "LBe"=>["rg092827.vld"], "LBi"=>["rg092721.vld"], "LBo"=>["rg092827.vld"], "LBr"=>["rg092721.vld"], "LCa"=>["rg092721.vld", "rg092827.vld", "rg092666.vld"], "LCl"=>["rg092721.vld"], "LCo"=>["rg092666.vld", "rg092827.vld", "rg092721.vld"], "LCr"=>["rg092721.vld"], "LDN"=>["rg103710.vld"], "LDe"=>["rg092721.vld"], "LDi"=>["rg092721.vld"], "LDu"=>["rg092827.vld", "rg092721.vld"], "LEN"=>["HS410707.vld"], "LEt"=>["rg092721.vld"], "LGr"=>["rg092721.vld"], "LHe"=>["rg092721.vld"], "LKe"=>["rg092827.vld", 
"rg092721.vld"], "LKi"=>["rg092827.vld", "rg092721.vld"], "LLa"=>["rg092721.vld"], "LMa"=>["rg092666.vld", "rg092721.vld", "rg092827.vld"], "LNe"=>["rg092721.vld", "rg092666.vld"], "LNo"=>["rg092721.vld", "rg092827.vld", "rg092666.vld"], "LPe"=>["rg092827.vld", "rg092721.vld"], "LPo"=>["rg092721.vld"], "LQu"=>["rg092721.vld"], "LRu"=>["rg092666.vld"], "LSh"=>["rg092666.vld", "rg092721.vld"], "LSt"=>["rg092721.vld", "rg092827.vld"], "LTi"=>["rg092721.vld"], "LTy"=>["rg092721.vld"], "LWa"=>["rg092721.vld"], "LWe"=>["rg092827.vld", "rg092721.vld"], "LWh"=>["rg092721.vld", "rg092827.vld", "rg092666.vld"], "LWi"=>["rg092721.vld"], "LWo"=>["rg092666.vld", "rg092721.vld"], "M-"=>["rg092827.vld", "rg092666.vld", "rg092721.vld"], "MBa"=>["rg092827.vld"], "MBe"=>["rg092666.vld"], "MBr"=>["rg092721.vld"], "MCa"=>["rg092666.vld", "rg092721.vld"], "MPo"=>["rg092827.vld", "rg092721.vld"], "MWi"=>["rg092721.vld"], "N-"=>["rg092827.vld", "rg092721.vld"], "NAb"=>["rg092721.vld"], "NAc"=>["rg092827.vld"], "NAd"=>["rg092827.vld"], "NAi"=>["rg092721.vld", "rg092827.vld", "rg092666.vld"], "NAl"=>["rg092721.vld"], "NAs"=>["rg092827.vld"], "NAt"=>["rg092827.vld"], "NBa"=>["rg092827.vld"], "NBe"=>["rg092827.vld"], "NBi"=>["rg092721.vld", "rg092827.vld"], "NBl"=>["rg092827.vld", "rg092721.vld"], "NBo"=>["rg092721.vld", "rg092827.vld", "rg092666.vld"], "NBr"=>["rg092827.vld", "rg092721.vld"], "NBu"=>["rg092827.vld", "rg092721.vld"], "NCa"=>["rg092721.vld", "rg092827.vld"], "NCh"=>["rg092827.vld", "rg092721.vld"], "NCl"=>["rg092827.vld", "rg092721.vld"], "NCo"=>["rg092827.vld"], "NCr"=>["rg092827.vld"], "NCu"=>["rg092827.vld"], "NDa"=>["rg092721.vld", "rg092827.vld"], "NDe"=>["rg092827.vld"], "NDi"=>["rg092827.vld"], "NDu"=>["rg092721.vld", "rg092827.vld"], "NEc"=>["rg092827.vld", "rg092721.vld"], "NEd"=>["rg092721.vld", "rg092827.vld"], "NEu"=>["rg092827.vld"], "NEv"=>["rg092666.vld", "rg092721.vld"], "NFa"=>["rg092721.vld", "rg092827.vld"], "NFe"=>["rg092827.vld"], "NFl"=>["rg092721.vld"], 
"NGa"=>["rg092721.vld", "rg092827.vld"], "NGl"=>["rg092827.vld"], "NGo"=>["rg092827.vld"], "NGr"=>["rg092827.vld", "rg092721.vld"], "NHa"=>["rg092827.vld", "rg092721.vld"], "NHe"=>["rg092827.vld"], "NHi"=>["rg092827.vld", "rg092721.vld"], "NHo"=>["rg092827.vld", "rg092721.vld"], "NHu"=>["rg092721.vld"], "NIn"=>["rg092827.vld"], "NKe"=>["rg092827.vld"], "NKi"=>["rg092721.vld", "rg092827.vld"], "NKn"=>["rg092666.vld"], "NLa"=>["rg092827.vld", "rg092666.vld", "rg092721.vld"], "NLe"=>["rg092827.vld"], "NLi"=>["rg092721.vld", "rg092827.vld", "rg092666.vld"], "NLl"=>["rg092721.vld"], "NLo"=>["rg092721.vld", "rg092827.vld"], "NLy"=>["rg092721.vld"], "NMa"=>["rg092827.vld", "rg092666.vld", "rg092721.vld"], "NMi"=>["rg092827.vld"], "NMo"=>["rg092721.vld", "rg092827.vld"], "NNe"=>["rg092827.vld", "rg092721.vld"], "NNo"=>["rg092721.vld"], "NOl"=>["rg092721.vld"], "NOr"=>["rg092666.vld", "rg092721.vld"], "NOv"=>["rg092827.vld"], "NPa"=>["rg092827.vld", "rg092721.vld"], "NPe"=>["rg092827.vld", "rg092721.vld"], "NPi"=>["rg092827.vld"], "NPl"=>["rg092827.vld"], "NPo"=>["rg092827.vld", "rg092721.vld"], "NPr"=>["rg092666.vld", "rg092721.vld", "rg092827.vld"], "NRa"=>["rg092827.vld", "rg092721.vld"], "NRi"=>["rg092827.vld"], "NRo"=>["rg092721.vld", "rg092827.vld"], "NRu"=>["rg092721.vld", "rg092827.vld", "rg092666.vld"], "NSa"=>["rg092827.vld", "rg092721.vld"], "NSe"=>["rg092721.vld", "rg092666.vld"], "NSh"=>["rg092827.vld", "rg092721.vld"], "NSi"=>["rg092666.vld"], "NSk"=>["rg092827.vld", "rg092721.vld"], "NSo"=>["rg092721.vld", "rg092827.vld"], "NSt"=>["rg092827.vld", "rg092721.vld", "rg092666.vld"], "NSu"=>["rg092827.vld"], "NTh"=>["rg092721.vld"], "NTo"=>["rg092827.vld", "rg092721.vld"], "NTu"=>["rg092827.vld"], "NTw"=>["rg092827.vld"], "NTy"=>["rg092827.vld"], "NUl"=>["rg092827.vld", "rg092721.vld"], "NWa"=>["rg092827.vld", "rg092721.vld", "rg092666.vld"], "NWe"=>["rg092827.vld", "rg092721.vld"], "NWh"=>["rg092827.vld", "rg092721.vld"], "NWi"=>["rg092721.vld", "rg092827.vld"], 
"NWo"=>["rg092827.vld", "rg092721.vld"], "NWr"=>["rg092827.vld", "rg092721.vld"], "R-"=>["rg092721.vld"], "RBi"=>["rg092827.vld", "rg092721.vld"], "RBr"=>["rg092721.vld"], "RCr"=>["rg092827.vld"], "RCy"=>["rg092827.vld"], "RDa"=>["rg092721.vld"], "RDu"=>["rg092721.vld"], "RHo"=>["rg092721.vld"], "RNu"=>["rg092827.vld"], "RRu"=>["rg092827.vld"], "RSt"=>["rg092666.vld", "rg092721.vld"], "RSu"=>["rg092721.vld"], "RWo"=>["rg092721.vld"], "S-"=>["rg092827.vld", "rg092666.vld", "rg092721.vld"], "SAl"=>["rg092666.vld", "rg092721.vld"], "SAm"=>["rg092721.vld"], "SAr"=>["rg092721.vld"], "SAs"=>["rg092721.vld"], "SBa"=>["rg092827.vld", "rg092721.vld"], "SBe"=>["rg092721.vld"], "SBi"=>["rg092721.vld", "rg092827.vld", "rg092666.vld"], "SBo"=>["rg092721.vld"], "SBr"=>["rg092827.vld", "rg092666.vld", "rg092721.vld"], "SCa"=>["rg092721.vld"], "SCh"=>["rg092666.vld", "rg092721.vld"], "SCl"=>["rg092827.vld"], "SCo"=>["rg092827.vld", "rg092721.vld"], "SCu"=>["rg092721.vld"], "SDa"=>["rg092827.vld", "rg092721.vld"], "SDe"=>["rg092721.vld", "rg092827.vld"], "SDo"=>["rg092721.vld"], "SDu"=>["rg092827.vld", "rg092721.vld"], "SEa"=>["rg092721.vld", "rg092666.vld"], "SEd"=>["rg092827.vld"], "SEg"=>["rg092827.vld"], "SEl"=>["rg092721.vld"], "SEt"=>["rg092827.vld"], "SFl"=>["rg092827.vld"], "SFr"=>["rg092721.vld"], "SGe"=>["rg092721.vld"], "SGl"=>["rg092721.vld"], "SGo"=>["rg092721.vld"], "SGr"=>["rg092827.vld"], "SHa"=>["rg092721.vld", "rg092827.vld"], "SHe"=>["rg092666.vld"], "SHi"=>["rg092827.vld"], "SHo"=>["rg092721.vld", "rg092827.vld"], "SHu"=>["rg092827.vld", "rg092721.vld", "rg092666.vld"], "SHy"=>["rg092827.vld", "rg092721.vld"], "SJo"=>["rg092827.vld"], "SKe"=>["rg092827.vld", "rg092721.vld", "rg092666.vld"], "SKi"=>["rg092827.vld", "rg092721.vld"], "SKn"=>["rg092666.vld", "rg092721.vld"], "SKu"=>["rg092721.vld"], "SLe"=>["rg092827.vld", "rg092666.vld", "rg092721.vld"], "SLi"=>["rg092721.vld"], "SLu"=>["rg092721.vld"], "SMa"=>["rg092827.vld", "rg092721.vld"], 
"SMe"=>["rg092827.vld"], "SMi"=>["rg092721.vld", "rg092827.vld", "rg092666.vld"], "SMo"=>["rg092721.vld"], "SNa"=>["rg092827.vld", "rg092721.vld"], "SNe"=>["rg092721.vld"], "SNo"=>["rg092721.vld"], "SPa"=>["rg092721.vld"], "SPe"=>["rg092827.vld", "rg092721.vld"], "SPu"=>["rg092721.vld"], "SRa"=>["rg092827.vld"], "SRi"=>["rg092827.vld"], "SRo"=>["rg092721.vld"], "SRu"=>["rg092721.vld"], "SSa"=>["rg092827.vld", "rg092666.vld", "rg092721.vld"], "SSc"=>["rg092827.vld"], "SSe"=>["rg092721.vld", "rg092666.vld", "rg092827.vld"], "SSh"=>["rg092721.vld"], "SSm"=>["rg092721.vld"], "SSt"=>["rg092721.vld", "rg092827.vld", "rg092666.vld"], "STa"=>["rg092721.vld"], "STr"=>["rg092721.vld"], "SVi"=>["rg092721.vld"], "SWa"=>["rg092827.vld", "rg092721.vld"], "SWi"=>["rg092721.vld", "rg092827.vld"], "SWo"=>["rg092721.vld", "rg092666.vld"], "SWr"=>["rg092721.vld"], "SWy"=>["rg092666.vld"], "SYo"=>["rg092827.vld", "rg092666.vld"], "T-"=>["rg092827.vld", "rg092666.vld", "rg092721.vld"], "TDu"=>["rg092827.vld", "rg092721.vld"], "TGl"=>["rg092721.vld"], "TGr"=>["rg092721.vld"], "THo"=>["rg092721.vld"], "TKi"=>["rg092721.vld"], "TMa"=>["rg092721.vld"], "TPa"=>["rg092721.vld"], "TRe"=>["rg092721.vld"], "TSu"=>["rg092827.vld"], "TUp"=>["rg092827.vld"], "TWi"=>["rg092721.vld"], "U-"=>["rg092721.vld"], "V-"=>["rg092827.vld"], "VBr"=>["rg092666.vld"], "VCo"=>["rg092721.vld"], "VPl"=>["rg092721.vld"], "VTo"=>["rg092721.vld"], "W-"=>["rg092721.vld"], "WPa"=>["rg092721.vld"], "X-"=>["rg092721.vld"], "XBr"=>["rg092721.vld"], "XCh"=>["rg092721.vld"], "XCu"=>["rg092666.vld"], "XGr"=>["rg092721.vld"], "XHa"=>["rg092721.vld"], "XLe"=>["rg092721.vld"], "XLo"=>["rg092827.vld", "rg092666.vld", "rg092721.vld"], "XMa"=>["rg092721.vld"], "XMi"=>["rg092721.vld"], "XPo"=>["rg092721.vld"], "Y-"=>["rg092666.vld", "rg092721.vld"], "YBa"=>["rg092721.vld"], "YBo"=>["rg092721.vld"], "YCh"=>["rg092721.vld"], "YCl"=>["rg092721.vld"], "YCo"=>["rg092721.vld"], "YDe"=>["rg092827.vld", "rg092721.vld"], 
"YDu"=>["rg092721.vld"], "YKD"=>["RG091946.VLD"], "YLl"=>["rg092721.vld"], "YLu"=>["rg092827.vld"], "YMa"=>["rg092827.vld", "rg092721.vld"], "YNe"=>["rg092721.vld"], "YSt"=>["rg092721.vld"], "YWi"=>["rg092666.vld"]}

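The cross-reference above maps each invalid code to the files it occurs in. A minimal sketch of how such a mapping could be built, assuming each record carries its source file in a field called `file_name` (a hypothetical name) alongside `birth_chapman_code`; the valid set passed in is again only an illustrative subset:

```python
from collections import defaultdict


def files_by_code(records, valid):
    """Map each invalid birth_chapman_code to the sorted list of files containing it."""
    found = defaultdict(set)
    for rec in records:
        code = rec["birth_chapman_code"]
        if code not in valid:
            found[code].add(rec["file_name"])
    return {code: sorted(files) for code, files in sorted(found.items())}


# Hypothetical records; the code/file pairings mirror entries in the list above.
sample = [
    {"birth_chapman_code": "YKD", "file_name": "RG091946.VLD"},
    {"birth_chapman_code": "BSt", "file_name": "rg092666.vld"},
    {"birth_chapman_code": "BSt", "file_name": "rg092721.vld"},
    {"birth_chapman_code": "BSt", "file_name": "rg092721.vld"},
    {"birth_chapman_code": "LAN", "file_name": "rg092721.vld"},
]

cross_ref = files_by_code(sample, valid={"LAN", "YKS"})
print(cross_ref)
```

Using a set per code keeps each file listed once even when it contains many records with the same bad value.
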
FreecenBren commented 4 years ago

Kirk,

Before I can get any of them adjusted I have to determine the Census County. The COORDS do their own amended VLDs, then send the amended copy to me for uploading at the 4 week slot. The ones that show up in the latest upload I would normally deal with myself, but with this many I have to reallocate them to the COORD responsible for each County.

FC1 picks up some each month so some of these must go back a few years from the old upload system.

Is it possible for the 3-digit County code to show on the error report please, as that will save me having to look up every one first? It is also hard as it is, since the list is not in an HTML format that shows the errors in columns. Anything to make the job easier, but also to ensure that I get them all back so I can track them all more easily. Sorry to be a pain.

Some of the COORDS as well might not have the VLD if they are old anyway, so I also have to send them that as well. I do have them but again they are filed by Census County.

Hope you understand that and can help me. Searching for the CHP would take extra time, as I would have to go on to the National Archives site to look them all up by census year. If that is not possible then a copy of the list in ‘table format’ would at least help to sort them by County. Looking at the list again, it appears this list only relates to the 1861 Census, so I will presume we might have some more for the other Census years?

Captainkirkdawson commented 4 years ago

@FreecenBren @PatReynolds @richpomfret @DeniseColbert

I can provide lists in various formats and add additional info if it helps.

However, rather than doing it all yourself, would it not be useful to involve the new volunteers interested in working as data managers? This is the type of issue they should get involved with.

FreecenBren commented 4 years ago

They could be with training, but as I have no idea who they are, what experience they have, that would be a good starting point first.

Captainkirkdawson commented 4 years ago

@FreecenBren in answer to your question "so will presume we might have some more for the other Census years?"

The answer is no; this is the total list from birth county checking. But there might be others for verbatim birth counties.

Captainkirkdawson commented 4 years ago

Specific information on the files in question

rg122720.vld DBY 1891 Shardlow 2720
HS410879.VLD KKD 1841 RERRICK 879
RG093137.VLD LAN 1861 Preston 3137
RG103297.VLD LEI 1871 Waltham 3297
RG122590.VLD LIN 1891 Lincoln 2590
RG122641.VLD NTT 1891 East Retford 2641
rg092721.vld LAN 1861 Everton 2721
rg092827.vld LAN 1861 Bolton Eastern 2827
rg092666.vld LAN 1861 Dale Street 2666
rg103710.vld CHS 1871 Wybunbury 1C 3710
HS410707.vld ELN 1841 Garvald 707
RG091946.VLD STS 1861 Leek 1946

Captainkirkdawson commented 4 years ago

Identical results for Verbatim Birth Counties

FreecenBren commented 4 years ago

Hi Kirk, This is the format that FreeCEN One uses and it contains all the info we need that is failing as errors. You will see that it matches yours for 4 of them.

This was last week's FC1 errors after Vino did the upload. So these are being amended now ready for the next upload in 3 weeks. So I will add the other 8 to this list.

https://freecen1.freecen.org.uk/errors.html

County  File  Message
GLS  HO511972.VLD  Invalid County LDN (ED: 4b Schedule: 119 Record: 7, GIFFORD, Josephine)
STS  RG091946.VLD  Invalid County YKD (ED: 9 Schedule: 71 Record: 2, WATERS, Keziah)
CHS  rg103710.vld  Invalid County LDN (ED: 24 Schedule: 36 Record: 3, AUSTIN, Alice J)
CHS  rg103710.vld  Invalid County LDN (ED: 24 Schedule: 36 Record: 4, AUSTIN, Caroline)
NFK  RG091228.VLD  ignored - duplicate
NFK  RG091251.VLD  ignored - duplicate

Cheers Brenda

DeniseColbert commented 4 years ago

@FreecenBren I'm going to make a mailing list/group early next week and add you in to introduce you etc. I can say right now that the new volunteers are not existing transcribers and have varying levels of familiarity with FreeCEN1/2

Captainkirkdawson commented 4 years ago

@FreecenBren

I suspect that the errors with the files I have identified are far more serious than the 1 or 2 records identified by CEN1. They need to be reviewed in detail.

PatReynolds commented 4 years ago

@PatReynolds to ask @Brenda about progress / what needs to be done

Captainkirkdawson commented 4 years ago

I have rerun my analysis and the following files still contain errors.

rg122720.vld DBY 1891 Shardlow 2720
HS410879.VLD KKD 1841 RERRICK 879
RG093137.VLD LAN 1861 Preston 3137
RG103297.VLD LEI 1871 Waltham 3297
RG122590.VLD LIN 1891 Lincoln 2590
RG122641.VLD NTT 1891 East Retford 2641
rg092721.vld LAN 1861 Everton 2721
rg092827.vld LAN 1861 Bolton Eastern 2827
rg092666.vld LAN 1861 Dale Street 2666
rg103710.vld CHS 1871 Wybunbury 1C 3710
HS410707.vld ELN 1841 Garvald 707
RG091946.VLD STS 1861 Leek 1946

This is exactly the same list I posted on March 20. There is no improvement.

FreecenBren commented 4 years ago

Hi Kirk, A few of those have been done and are waiting on the FreeCEN upload. Lemon is doing it this week. I will then look at the new error report and deal with them myself apart from the Scotland ones that I will send to Margaret. Those on your list that are not on the FC1 I will get done as well.

I have been so busy with the ‘Newbies’ time has been the issue. Priority has been getting them set up first. Brenda

Captainkirkdawson commented 4 years ago

I have rerun my analysis after the May update and the following files still contain errors.

rg122720.vld DBY 1891 Shardlow 2720
HS410879.VLD KKD 1841 RERRICK 879
RG093137.VLD LAN 1861 Preston 3137
RG103297.VLD LEI 1871 Waltham 3297
RG122590.VLD LIN 1891 Lincoln 2590
rg092721.vld LAN 1861 Everton 2721
rg092827.vld LAN 1861 Bolton Eastern 2827
rg092666.vld LAN 1861 Dale Street 2666
rg103710.vld CHS 1871 Wybunbury 1C 3710
HS410707.vld ELN 1841 Garvald 707
RG091946.VLD STS 1861 Leek 1946

There is a reduction of 1 file from the March list (RG122641.VLD NTT 1891 East Retford 2641)

PatReynolds commented 4 years ago

@PatReynolds contacted Brenda 28/5/2020

PatReynolds commented 4 years ago

Emailed Brenda 9/6/2020

PatReynolds commented 4 years ago

@PatReynolds to talk to @geoffj-FUG about using the CSV conversion tool he is working on, and then upload using the new tools. This will be a good test of the conversion tool and of the new upload tools, as well as cleaning the data.

PatReynolds commented 4 years ago

Emailed Geoff 7/7/2020

PatReynolds commented 4 years ago

Pat to choose one and use the tool to upload as a way of finding the issue.

PatReynolds commented 3 years ago

Asked Brenda for smallest available file 24/7/2020

geoffj-FUG commented 3 years ago

Kirk

Is there any way of knowing which invalid Chapman Code(s) belong to each of these files please?

Geoff

rg122720.vld DBY 1891 Shardlow 2720
HS410879.VLD KKD 1841 RERRICK 879
RG093137.VLD LAN 1861 Preston 3137
RG103297.VLD LEI 1871 Waltham 3297
RG122590.VLD LIN 1891 Lincoln 2590
rg092721.vld LAN 1861 Everton 2721
rg092827.vld LAN 1861 Bolton Eastern 2827
rg092666.vld LAN 1861 Dale Street 2666
rg103710.vld CHS 1871 Wybunbury 1C 3710
HS410707.vld ELN 1841 Garvald 707
RG091946.VLD STS 1861 Leek 1946

Captainkirkdawson commented 3 years ago

@geoffj-FUG See my comment of Mar 20 for the linkage. The problem is apparently caused by characters being dropped in a record. Since VLD files use fixed-length records, if you drop characters every following field gets mixed up. If you look at my post of Feb 20, it appears as if 2 characters are dropped prior to the Surname field.
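A minimal sketch of why a two-character loss scrambles every later field. The layout, widths and sample values here are made up for illustration and are not the real VLD layout; the only point is that a positional reader has no way to notice the shift.

```ruby
# Toy fixed-width layout (hypothetical widths, NOT the real VLD layout).
LAYOUT = { surname: 12, forenames: 10, birth_county: 3, birth_place: 10 }.freeze

# Slice a record into fields purely by position, as a fixed-length reader does.
def parse_fixed(record)
  offset = 0
  LAYOUT.each_with_object({}) do |(name, width), fields|
    fields[name] = record[offset, width].to_s
    offset += width
  end
end

# Build a well-formed record by padding every value to its field width.
def build_record(values)
  LAYOUT.map { |name, width| values.fetch(name).ljust(width) }.join
end

# Invented sample values, chosen only to illustrate the effect.
good = build_record(surname: "SMITH", forenames: "Mary",
                    birth_county: "OVB", birth_place: "Jamaica")
damaged = good[2..-1] # simulate two characters lost before the surname

puts parse_fixed(good)[:birth_county]    # county read from the intact record
puts parse_fixed(damaged)[:birth_county] # county read after the two-character shift
```

With these invented values the shifted county field picks up the last letter of the real code plus the first two letters of the place, which is the same pattern as the odd three-letter values in the report.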

geoffj-FUG commented 3 years ago

Kirk

I spent about 3 hours last night trying to find the error by visually examining every line in the spreadsheet version of a vld file (converted from vld).

I copied and pasted the records for the parish concerned into a FreeCEN2 traditional spreadsheet and uploaded it and there was no error reported. To me that means that the problem has been cleaned out? (That is a presumption).

If that is so and I convert the spreadsheet back into a vld file then the new vld file will be clean? The fields should once again be the correct fixed length. Another presumption!

If that new vld file is loaded to FreeCEN1 and my presumptions are correct then the problem should disappear.

What are your thoughts on that?

I will have access to my computer with fctools on it tomorrow evening. If you agree with my presumptions I can do it then.

Geoff

Captainkirkdawson commented 3 years ago

@geoffj-FUG Difficult to comment. Which specific file were you looking at? The problem is something seen with the vld files loaded into CEN. The issue was detected when I did a quality check on the Chapman field and found those errors. The problem is caused by an offset of one or more characters in a couple of lines. These files have "moved" around the internet several times by the time they arrive at CEN, i.e. coordinator to Brenda via internet, to Dropbox by internet, to the cen1 server from Dropbox, and from the cen1 server to the CEN server by internal channels. Bit drops are always possible. Whether the problem is apparent on cen I do not know. If you have a clean file then send it through the next upload cycle and let's see. In my mind these would be great candidates for early conversion and upload via CSVProc.

Captainkirkdawson commented 3 years ago

@FreecenBren I must emphasize that these are NOT issues with the new CSVProc version of CEN2. Perhaps we should formalize our terminology. There are 3 CEN applications: 2 are in production and 1 is in development (hopefully to go live at the end of this month). My terminology from here on will be:

CEN1: the old app developed by Dave Mayall, using offline data and file tools with monthly updates.
CEN2: the reimplementation of CEN1 in Ruby on Rails (developed by Ben and Doug). CEN2 uses the same file formats and processes as CEN1 but with a new database structure and engine.
CSVProc: provides modern tools and an online data and file management application, interfaced with the CEN2 database structure and search engine, that can be updated at any time. CSVProc is designed to coexist with CEN2.

The problem files that I identified earlier this year were found by looking at the data content of the birth Chapman code to ensure that all records had valid Chapman codes in that one field. Many records for those files were invalid and irretrievable on CEN2. I believe, but have no firm proof, that the same situation exists for those files on CEN1.

The initial set of 12 problem files was:

rg122720.vld DBY 1891 Shardlow 2720
HS410879.VLD KKD 1841 RERRICK 879
RG093137.VLD LAN 1861 Preston 3137
RG103297.VLD LEI 1871 Waltham 3297
RG122590.VLD LIN 1891 Lincoln 2590
RG122641.VLD NTT 1891 East Retford 2641
rg092721.vld LAN 1861 Everton 2721
rg092827.vld LAN 1861 Bolton Eastern 2827
rg092666.vld LAN 1861 Dale Street 2666
rg103710.vld CHS 1871 Wybunbury 1C 3710
HS410707.vld ELN 1841 Garvald 707
RG091946.VLD STS 1861 Leek 1946

@FreecenBren found problems in CEN1 for 1 of these: RG122641.VLD NTT 1891 East Retford 2641 was corrected in May. The remaining 11 still had problems. What is clear is that the problems are not immediately obvious and MAY be different from file to file.

The correction of these files is in no way related to the development and implementation of CSVProc. It is a matter of pure data quality in all 3 versions. 
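For anyone wanting to repeat the Chapman code check by hand, here is a rough sketch of the idea, not the actual query that was run. The record keys (`:birth_county`, `:file`) and the five-entry `VALID_CODES` list are assumptions for illustration; the real list of Chapman codes is far longer.

```ruby
# A deliberately tiny sample of valid codes (assumption: the real list is much longer).
VALID_CODES = %w[LAN DBY CHS STS LDN].freeze

# Group every unrecognised birth county by the file(s) it was found in.
# Each record is a hash with the hypothetical keys :birth_county and :file.
def invalid_codes_by_file(records)
  records.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |rec, report|
    code = rec[:birth_county]
    next if VALID_CODES.include?(code)

    report[code] << rec[:file] unless report[code].include?(rec[:file])
  end
end

records = [
  { birth_county: "LAN", file: "rg092721.vld" },
  { birth_county: "SMi", file: "rg092721.vld" },
  { birth_county: "SMi", file: "rg092827.vld" },
  { birth_county: "SMi", file: "rg092721.vld" },
  { birth_county: "YKD", file: "RG091946.VLD" }
]

report = invalid_codes_by_file(records)
p report
```

The result has the same shape as the code-to-files listing posted earlier in this issue, so it can be sorted or filtered by file to hand each piece to the right coordinator.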

Captainkirkdawson commented 3 years ago

@geoffj-FUG I have spent today digging into these files and have sent you a zip file with the problem files, some of which are very old. The following is the situation as of today: problem files.JPG

Captainkirkdawson commented 3 years ago

@geoffj-FUG @FreecenBren The files with the null entries have records with null in almost all fields. An example for rg122720.vld is:

"surname" : "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000", "forenames" : "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000", "occupation" : "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000", "occupation_flag" : "\u0000", "name_flag" : "\u0000", "relationship" : "\u0000\u0000\u0000\u0000\u0000\u0000", "marital_status" : "\u0000", "sex" : "\u0000", "age" : "\u0000\u0000\u0000", "age_unit" : "\u0000", "detail_flag" : "\u0000", "civil_parish" : "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000", "ecclesiastical_parish" : "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000", "dwelling_number" : 0, "sequence_in_household" : 0, "enumeration_district" : "\u0000\u0000\u0000\u0000", "schedule_number" : "\u0000\u0000\u0000\u0000", "folio_number" : "\u0000\u0000\u0000\u0000\u0000", "page_number" : 0, "house_number" : "\u0000\u0000\u0000\u0000", "house_or_street_name" : "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000", "uninhabited_flag" : "\u0000", "unoccupied_notes" : "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000", "individual_flag" : "\u0000", "birth_county" : "\u0000\u0000\u0000", "birth_place" : 
"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000", "verbatim_birth_county" : "\u0000\u0000\u0000", "verbatim_birth_place" : "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000", "birth_place_flag" : "\u0000", "disability" : "\u0000\u0000\u0000\u0000\u0000\u0000", "language" : "\u0000", "notes" : "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000",

There are 5246 entries in the file
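A quick way to locate all-null records like the one above in a raw file could look like the sketch below. It assumes the file is nothing but back-to-back 299-byte records (the record length quoted later in this thread) with no header or line endings; if the real files differ, the slicing needs adjusting.

```ruby
RECORD_LENGTH = 299 # bytes per entry (assumption: no header, no separators)

# Cut the raw file contents into consecutive fixed-length records.
def split_records(data)
  (0...data.bytesize).step(RECORD_LENGTH).map { |pos| data.byteslice(pos, RECORD_LENGTH) }
end

# True when every byte of the record is NUL (0x00).
def null_record?(record)
  record.bytes.all?(&:zero?)
end

# Synthetic stand-in for a file: one normal record, one all-NUL record, one normal record.
sample = ("A" * RECORD_LENGTH + "\0" * RECORD_LENGTH + "B" * RECORD_LENGTH).b

records = split_records(sample)
null_indexes = records.each_index.select { |i| null_record?(records[i]) }
puts "#{null_indexes.size} null record(s) at index #{null_indexes.inspect}"
```

For a real file, `sample` would be replaced with `File.binread(path)` so the bytes are read without any decoding.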

Captainkirkdawson commented 3 years ago

@geoffj-FUG @FreecenBren An example for rg092666.vld with birth county "BJa"

"_id" : ObjectId("5c87ae79f4040bce77ab514d"), "surname" : "ARDHOUSE Ma", "forenames" : "ria -B", "occupation" : "irt Maker -O", "occupation_flag" : "V", "name_flag" : "o", "relationship" : "ardrMF", "marital_status" : "2", "sex" : "8", "age" : " y-", "age_unit" : "S", "detail_flag" : "h", "civil_parish" : "verpool 6", "ecclesiastical_parish" : " Stephen", "dwelling_number" : 79200, "sequence_in_household" : 0, "enumeration_district" : " f79", "schedule_number" : " 4", "folio_number" : " 10", "page_number" : 49, "house_number" : " B", "house_or_street_name" : "spham St -G", "uninhabited_flag" : "U", "unoccupied_notes" : " St", "individual_flag" : "U", "birth_county" : "BJa", "birth_place" : "maica -", "verbatim_birth_county" : "BJa", "verbatim_birth_place" : "maica -", "notes" : " St", "freecen1_vld_file_id" : ObjectId("5c87ae77f4040bce77ab43ad")

There are 1069 dwellings in the file and

geoffj-FUG commented 3 years ago

Brenda

I have recreated the csv file from the RG122720.vld file and have a host of problems.

Firstly, the file has truncated half way through the address on row 2632. Everything after that (2753 records) had the family number of 0 and family member number of 0 again.

I discovered that there are a large number of records that have not uploaded (or were not transcribed).

I tried to convert the file to the 1891 format for CSVproc but there was no data for the occupation category so I added a blank column. After a bit of messing around I managed to get an 1891 file and uploaded it to CSVProc.

Enlightenment – the occupation category was concatenated to the occupation. The file is a download from FreeCEN.

That makes life difficult. So, Brenda, do you have a copy of the vld file that was originally uploaded please?

I checked the previous piece (RG103297) against Ancestry and the spreadsheet we now have is the end of an ED. It finished at Folio 64 so I hunted for folio 72 and got no result. My assumption was that I had a complete spreadsheet.

Now that I have another file with similar symptoms I am wondering whether there was a small ED or institution on the end of the piece. If so we have had something that is corrupting the vld file and causing records to be loaded with no data, hence the 0 family number and 0 family member number. That results in Kirk’s discovery of null fields.

I really need to be able to backtrack on this piece. Hence the need for the original uploaded vld file. It may be that we have located the problem.

I will now have a look at another piece and see if the problem recurs. One is forgivable, 2 is a coincidence but 3 will be ???

Geoff

geoffj-FUG commented 3 years ago

Got it!

In the row that truncated and became the last row in the piece I have the following (my emphasis in red):

\fctools\fctwkg\RG122720.VLD

[When reporting program errors, be sure to send this file and the original data file]

Census year = 1891, input format = VALD-REV, output format = ALL, VERBOSE

Info: record 2631 non printable characters

Info: record 2631, Building/Unoccupied flag = < >

Info: record 2631, surname missing

Info: record 2631, forename(s) missing

So we have non printable characters in the file, hence we cannot see them!
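Since the offending characters cannot be seen in a spreadsheet, a small script can point at them instead. This is only a sketch: it treats ASCII control bytes (below 0x20, plus 0x7F) as "non printable", which is an assumption about what the full format report flags, and the sample record is invented.

```ruby
# Return the byte offsets of ASCII control characters within one record.
# Assumption: "non printable" means bytes below 0x20 or the DEL byte 0x7F.
def control_byte_positions(record)
  record.each_byte.with_index
        .select { |byte, _index| byte < 0x20 || byte == 0x7F }
        .map { |_byte, index| index }
end

# Invented sample: two NUL bytes hidden after the surname.
record = "SMITH\x00\x00   Mary".b

positions = control_byte_positions(record)
puts "control bytes at offsets #{positions.inspect}"
```

Run over every record in a file, this gives the record number and offset to retype, without having to read each row by eye.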

It just needed the full format report. I never use that one.

We need to rebuild this from the original vld upload as it should have all the records. Retype row 2631 and then delete the existing row 2631.

That should clean it.

We can then rebuild the vld file and reload it.

Touch wood!

Geoff

FreecenBren commented 3 years ago

Ok Geoff leave them with me.

I will do them and then ask Lemon to do an amended upload extra for the amended VLDs. He will have to fit it in during a break next week as it is not FreeCENS week. He has done it before.

Cheers Brenda

geoffj-FUG commented 3 years ago

Brenda

I have attached RG092666

I cannot find anything in it. Tests OK.

I have recompiled the vld file.

It is attached.

I will leave the rest to you.

Geoff

Captainkirkdawson commented 3 years ago

@geoffj-FUG The files sent to you offline are copies of the ORIGINAL files used to create the CEN1 database, i.e. they are what @FreecenBren or whoever sent to get processed on the date noted in the report 7 comments above. They are not downloaded files.

Captainkirkdawson commented 3 years ago

A reload of rg092827.vld LAN 1861 Bolton Eastern 2827 continues to exhibit the same problem. still there.png

FreecenBren commented 3 years ago

I am working on this LAN piece again for Neil, as it failed again on Vino's report, on both FC1 and FC2.

Brenda

PatReynolds commented 3 years ago

@Captainkirkdawson to see if this was fixed in the recent upload.

Captainkirkdawson commented 3 years ago

Afraid to have to say nothing has been fixed. The file rg092827.vld fails with an error notification. The files with blank entries are still there. In total there are 8848 entries with null Chapman codes (the entries look to be blank lines).

Captainkirkdawson commented 3 years ago

Neil challenged me to fix this problem. I was initially quite reluctant as the whole VLD processing system has not been a part of the on-line system and therefore not been part of my responsibilities or knowledge base.

However, it is clear that there are major frustrations with the current situation that could warrant a deeper review to see if there is a workable solution.

Firstly let me emphasize that the files being exchanged are encoded as UTF-8 as that is the file encoding used with all our servers be they supporting FC1 or FC2. Both applications run on the same servers with the same file structures...

Secondly, VLD files are created on many different platforms using stand-alone software developed many years ago (10-15 years, I believe). Back then many of our personal devices were not using UTF-8 encoding. The software may well have used ISO-8859-1 or something else when writing the file; again, I do not know. A VLD file is a fixed-length file made up of fixed-length records, one per entry. That record length is 299 characters or bytes. Hence, in converting the record into meaningful fields, the file is read byte by byte and each set of bytes is converted into a specific field. Normally this works just fine, except when the decoder encounters a byte that is an illegal character. The resulting behavior with the UTF-8 decoder is to replace the illegal byte with the UTF-8 replacement character �, encoded as 0xEF 0xBF 0xBD, which is 3 bytes; i.e. the decoder is adding 2 bytes. This results in all subsequent characters being offset by 2, so the fields are incorrectly interpreted, which is why we see the strange Chapman codes etc.
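The byte growth described above is easy to reproduce in isolation. The sketch below is not the loader code: it uses an invented eight-byte string containing one ISO-8859-1 byte (0xE9) and Ruby's `String#scrub` to stand in for whatever replacement the decoder performs, and it assumes fields are located by byte offset.

```ruby
# Eight raw bytes as legacy software might have written them:
# 0xE9 is "é" in ISO-8859-1 but is not valid on its own in UTF-8.
raw = "Caf\xE9 Row".b

# Treat the bytes as UTF-8, as a UTF-8 based system would.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding? # false: the lone 0xE9 byte is illegal in UTF-8

# Replace the illegal byte with U+FFFD, which is encoded as EF BF BD (3 bytes).
repaired = as_utf8.scrub("\uFFFD")

puts raw.bytesize      # size before replacement
puts repaired.bytesize # size after replacement: two bytes longer

# A field located by byte offset now lands two bytes early in the real data.
p raw.byteslice(5, 3)            # the intended three bytes
p repaired.byteslice(5, 3).bytes # tail of the replacement char plus shifted data
```

One illegal byte is enough to move every later byte-positioned field in that record; several illegal bytes move them further.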

Thirdly, the three bytes 0xEF 0xBF 0xBD appear as ï¿½ when they are read as ISO-8859-1, or as the diamond character � when they are read as UTF-8.

Fourthly, is there a bypass? Possibly. If we tell the decoder to decode as ISO-8859-1 then it does appear the files are processed successfully, if you believe the display in the results is success. See attachment Character.png.
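A sketch of that bypass, assuming the raw record is available as bytes. Every byte value maps to exactly one character in ISO-8859-1, so the decoded string has one character per original byte and fields can then be cut by character position. The helper name `decode_latin1` and the sample bytes are made up for illustration.

```ruby
# Decode raw bytes as ISO-8859-1 and convert to UTF-8 for storage and display.
# Every byte value is defined in ISO-8859-1, so nothing is replaced or dropped.
def decode_latin1(raw)
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

raw = "Caf\xE9 Row".b # invented sample: 8 bytes, one of them above 0x7F
decoded = decode_latin1(raw)

puts decoded.length          # same count as raw.bytesize: one character per byte
puts decoded.valid_encoding? # true
puts decoded[5, 3]           # the field at character offset 5 is where it should be
```

This keeps the field positions intact but does not prove the file really was written as ISO-8859-1. If the original software used some other single-byte code page, accented letters will come out as the wrong characters, which would fit the strange characters mentioned in the test results.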

Lastly, I have processed 5 LAN files that had issues and 1 from WRY that had all reported data problems. They processed without issue, other than that the results will have strange characters, but that is no different to the FC1 display of those records. See test results.

Conclusion. If you are satisfied with this (it is no better or worse than FC1) we can put it into production in time for the next update. Now there may be unintended consequences but there always are. Character.png test pieces.png