FreeUKGen / FreeCENMigration

Issue tracking for project migrating FreeCEN to FreeCEN2 genealogy record database and search engine architecture. Code developed here is based on that developed in MyopicVicar
https://www.freecen.org.uk
Apache License 2.0

unoccupied_notes #839

Closed Captainkirkdawson closed 4 years ago

Captainkirkdawson commented 4 years ago

@geoffj-FUG @rhodamackenzie The vld entry has a field called unoccupied_notes. This does not appear in the csv file. What is it? Is it needed?

rhodamackenzie commented 4 years ago

Again, sorry, I don't know this, but where would I see the vld entry?

AlOneill commented 4 years ago

A building may be designated as 'unoccupied' on the census return, but still have potentially useful information recorded against it. Can't come up with much on the spot, but something like, "Dawson family moved on 2 mo ago".

Transcribers are encouraged to put such information in the Notes field: I guess that these notes are transferred to the unoccupied_notes field at some point in the process, but I don't know when or how.

FreecenBren commented 4 years ago

In many cases 'unoccupied' is entered because the building is a shop or similar, and no one lives there at night once it closes. These can be butchers, grocers, greengrocers, shoe shops, blacksmiths, schools, etc. The use of V is similar: no one is living there on the night because they are visiting elsewhere.

rhodamackenzie commented 4 years ago

Prob best to have notes separately for unoccupied premises rather than lumping it into another section, if that is what it means in the csv

FreecenBren commented 4 years ago

Quite a lot of unoccupied premises also include the address and the type of shop. Anything else given is added to the Notes column (column Y in the CSV) if it will not fit in the address column. Whatever is added carries through to the Census.

geoffj-FUG commented 4 years ago

Kirk Back to basics - Let’s all take a step back and do this logically. We are second guessing because we don’t have all the information. If you can send me a copy of the data dictionary then I will match up the .vld file fields with that. There may be fields there that were included 20 years ago but as things developed were never needed. Other fields were used for the information instead. So by matching up we can eliminate any obsolete fields in the data dictionary.

That will give us a complete picture. Geoff

Captainkirkdawson commented 4 years ago

@geoffj-FUG

Ask and ye shall receive.
https://drive.google.com/open?id=1zVtBOQ7Taqzu6j8_mMj3zTXgl0aVMOka is the Excel version.
https://drive.google.com/open?id=1pg6EWNhnGUJZ9GYzcVo9AoIrIlwK8m-b is a PDF version.

The first column lists the fields for the csv_entry. The second has either Mine or 1901; those marked Mine are needed by my processor. The third is the spec of the VLD entry as known in CEN2. I also included the Individual and Dwelling collection fields. These 2 are derived from the VLD entry.

The only field really left to sort out is the detail.flag

I have also copied a link to the SSCENS layout sheets for the data managers and others https://drive.google.com/open?id=1yx-HkNHuOSiV9Q7rTVG5QcJrgtl0wCmL

Captainkirkdawson commented 4 years ago

Going back to the unoccupied_notes field. There are 823,419 entries in the field. This is slightly MORE than in the actual notes field, which has 823,041. Nowhere is it displayed to the researcher. Its content is amazingly varied! A sample:

"v consul of n german confederation", "v faded page", "v faint but bap Carhampton 10 1 1836", "v faint but bap Carhampton 6 1 1833", "vacant", "vacant from 44 to 60", "vacant land", "vacant land for building 7 hses", "vacant land from 34 to 40", "vacant land in the street for building", "vacant land off selhurst st", "vacant on census night occupied during week", "vacated on 31 Mar", "vacccination officer", "vaccination & school attendance officer", "vacination officer, collector to guardians", "vagrancy untried", "vagrant", "vagrant in diff handwriting", "vagrant inmate", "vagrant, in the barn", "vagrant. Where born N.K.", "vague details", "valets son", "valets wife", "valnord road", "valpaireso chile", "valuer", "valuer & builder", "valuer & surveyor", "valuer Commissioner of labr & farmer 530ac", "valuer and agent. Wes local preacher", "van demons land", "van diemen's land", "van diemen's land (nz)", "vancover british Columbia", "var. surnames in family", "variant of schedule 14", "variant of schedule 38", "variant of schedule 50", "variant of schedule 91", "variation of Belchnoic? Ballcorach?", "variations in S/name see original", "varicose veins", "various diseases", "various employment", "various place possibilities", "various scribbles in occ. ?birthplace", "varnish maker emp 4m", "vaynor"

The following is a sample from the notes field "xxxxx practiceing as a profession", "y", "yacht & boatbulder", "yacht & house", "yacht agent", "yachtsman", "yarburgh", "yard and 81 high st same place", "yard between brewery hill & commercial rd", "yard craft h m y", "yard no dwelling", "yard porter", "yard to let", "yards office & goods warehouse", "yarn & filler", "yarn counter", "yarn miller employing 6 men 4 women", "yarn properly dried", "yarn winder, flax", "yate and pickup bank", "yate pickup bank Blackburn", "yatton keynell", "yawl/3t/pilot boat", "yeast dealer", "yeomanry", "yeovil", "yes recorded for ARL but E in second column", "yews wadsby", "yks alboro and dur darlington crossed out", "yks barmby", "yks cheapsides", "yks or nth", "yn Mochnant", "york tanning & currying company", "young farmer", "young ladies school", "young not named", "young womens christian association", "younger john called george in 1851", "ys", "yspytty ystwyth", "ystradvelltey", "zanesville", "zephaniah in FreeBMD", "zion chapel", "zion chapel, skipton", "zoological society london", "{asst district secretary", "|Age given as 1 yr 9 months", "|Employes 2 men, 3 women, 1 boy", "|Single unreadable letter in Relationship",

geoffj-FUG commented 4 years ago

Thanks Kirk

geoffj-FUG commented 4 years ago

Kirk

This is the information entered into column Y on the spreadsheet. It should be reported in the Notes field of the screen report (to the right of disability). It is absolutely critical information for the researcher.

It does not appear on the results screen but should appear when the family are viewed.

Geoff

FreecenBren commented 4 years ago

Hi,

We have an issue.

I have checked FC2 and yes, you are correct: the notes in column Y for the unoccupied, which contain information for the researcher, are not showing in FC2.

However, they are showing in FC1.

Example below. It actually shows at the bottom of the unoccupied record.

I used John Perryman in my search, then I looked at the previous household.

1861 Census 1 Records found

Piece: RG9/1457
Place: Buckland-Mnchrum -Devon
Enumeration District: 2f
Civil Parish: Beer Ferris
Ecclesiastical Parish: Beer Ferris
Folio: 34
Page: 22
Schedule: 0
Address: -

Unoccupied Not collected, the family moved

This is what the FIELDS document says.

NOTE: where b, u, v, or n have been entered in Column H, any data entered in Columns I to X will be rejected. A comment may be entered in Column Y if there is a Query regarding the schedule or address entries
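
The rule in that NOTE can be sketched as a small check. This is only an illustration in Python, not the actual FreeCEN validation code: the function name, the dict-per-row representation keyed by column letter, and the case-insensitive flag comparison are all assumptions made for the example.

```python
import string

# Flags in Column H that, per the FIELDS note, cause data in Columns I to X
# to be rejected.
REJECT_FLAGS = {"b", "u", "v", "n"}

# Column letters I..X (A is index 0, so I is index 8 and X is index 23).
DATA_COLUMNS = list(string.ascii_uppercase[8:24])


def rejected_columns(row):
    """Return the columns I..X that hold data on a row flagged b, u, v or n.

    `row` is a hypothetical dict mapping column letters to cell text.
    Column Y (the comment) is never reported, since the note allows it.
    """
    flag = row.get("H", "").strip().lower()  # case-insensitivity is assumed
    if flag not in REJECT_FLAGS:
        return []
    return [col for col in DATA_COLUMNS if row.get(col, "").strip()]
```

For a row such as `{"H": "u", "K": "some data", "Y": "Not collected, the family moved"}` the sketch reports only column K, leaving the column Y comment alone.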

I looked last night and could see, on the ones I found, that they were missing in FC2.

After sleeping on it I did the very same search in FC1. Bingo.

Shows what a little sleep can do for you.

Brenda

geoffj-FUG commented 4 years ago

Brenda

We need to log it.

How do we create an action-needed advice?

Geoff

geoffj-FUG commented 4 years ago

Brenda

I have just been going through the data dictionary and matching it to the various spreadsheet formats.

I had an unused field called unoccupied_notes in my list of leftovers.

I matched a field called notes to the Comments in the spreadsheet.

So is that the problem with the screen report: the wrong field has been used in the report form?

Just flying a kite! But if I did it someone else could have done it as well.

Geoff

Captainkirkdawson commented 4 years ago

As noted earlier, vld_entries has 2 fields, notes and unoccupied_notes. Looking at records containing an x in the uninhabited field, there is an entry in both the notes and unoccupied_notes fields for the first occupant of the dwelling. In the example I have found, all members of the household had x in the uninhabited field, which seems odd.

There are 16,015 entries with x and a note.

Many such entries have a note that has nothing to do with the address, for example:

"_id" : ObjectId("5902795ae9379091b116767c"), "surname" : "CHILL", "forenames" : "Hugh", "occupation_flag" : "-", "name_flag" : "-", "relationship" : "Visitr", "marital_status" : "U", "sex" : "M", "age" : "18", "age_unit" : "y", "detail_flag" : "-", "civil_parish" : "Ardrossan", "ecclesiastical_parish" : "Ardrossan", "dwelling_number" : 334, "sequence_in_household" : 4, "enumeration_district" : "5", "schedule_number" : "0", "folio_number" : "0", "page_number" : 21, "house_or_street_name" : "Drakenyre Street", "uninhabited_flag" : "x", "unoccupied_notes" : "Where born not enumerated", "individual_flag" : "x", "birth_county" : "UNK", "birth_place" : "-", "verbatim_birth_county" : "UNK", "verbatim_birth_place" : "-", "birth_place_flag" : "x", "notes" : "Where born not enumerated", "freecen1_vld_file_id" : ObjectId("5902795ae9379091b1167051")

It would appear that the unoccupied_notes possibly reflect a concatenation of both notes fields.

Captainkirkdawson commented 4 years ago

Record correctly reported in FC2 (screenshot: chill.png)

Captainkirkdawson commented 4 years ago

The issue noted by Brenda comes from the fact that the notes displayed in FC2 are associated with the individual. If the dwelling is unoccupied there is no individual, and there is no field in the display for a dwelling even though there is a field in the dwelling. A definite bug.

FreecenBren commented 4 years ago

Officially all the ones marked with a query X should have been dealt with by the Proof Reader or Validator. As you have found not all have been removed and/or dealt with.

The notes are added by the Transcriber when they do not know what to do in the situations they find, and they add the X for the Proof Reader to pick up. In exceptional cases the Proof Reader does not know what to do either, so leaves it for the Validator. The X normally relates to the person on the row it is on, unless it concerns the spelling of the Surname, in which case they should only add a query X for the first person in the dwelling that the spelling refers to. Sometimes it could also be to do with the spelling of the address. If the age is not given then 999 has to be used. I have found that some transcribers also add a query X as well, but these should be removed as the 999 does the job. The list can go on and on as there are 5 columns to enter a ‘query x’ in.

Could FC2 pick up these as a query as to either delete the X but leave the comment or delete both?

It looks like from your list a lot have been missed but with 38 million + records and a time span of 20 years, some of them may well be back to the dark ages of Incens etc., before the current software was about.

Captainkirkdawson commented 4 years ago

@FreecenBren wrt your comment "Could FC2 pick up these as a query as to either delete the X but leave the comment or delete both?": the new CSVProc system for Cen2 does that. There is a display of all flags set in the file (Flags index).

geoffj-FUG commented 4 years ago

When I started with FreeCEN (in 2004) the Notes column in the Spreadsheet had limited use. Really it was somewhere to put a message when a flag was set, so that the Proofreader and Validator knew what the flag was about.

Then we started to get problems of not being able to get all the information in the space available so the Notes got used to add any additional information or to overflow long information such as Occupations.

Now we have some Transcribers and Proofreaders who research hard to read entries and need to put the results of their research somewhere. The only place available is Notes.

With all this information appearing in the Notes it was quite rightly allowed to go through to FC1 and FC2 to help the Researcher.

The point is that the use of the Notes column has evolved over time.

I have never experienced SSCens as it was obsolete when I started. I have only experienced Spreadsheets. So I cannot tell you what the field was originally used for. Fields.htm refers to SSCens and states that

This cell is provided to qualify any query raised and is intended as a brief indication of the nature of the problem

So it was never intended for the purpose that it is used for today.

Given the above it seems logical that the field was called unoccupied_notes because that was probably how the programmer saw it. Today the field name needs to be changed because the purpose has changed.

Along the way someone must have decided this and created the field Notes. (That is possibly the field that is used in the FreeCEN report because of the naming. But it is empty?)

So we have a field that has changed its purpose and the change is causing confusion.

Geoff

Captainkirkdawson commented 4 years ago

Having spent some time digging into the CEN2 code, it has become clear that there is only 1 notes field in the incoming vld entry. It is the original notes field from the spreadsheet, with adjustments made to it by the Proof Reader and the Validator as they have reviewed the transcription. The original programmer for CEN2 created the unoccupied_notes field as a copy of the notes field. The notes field is subsequently set to an empty field if it contains the text string [see mynotes.txt].

The unprocessed unoccupied_notes field is used in the display of an uninhabited dwelling.

The processed notes field (i.e. the original field UNLESS it contained [see mynotes.txt]) is then used in the display for the individual.

This accounts for why there are 823,419 entries in the unoccupied_notes field and slightly fewer in the actual notes field, which has 823,041 (i.e. there are 378 entries where the notes contain the string [see mynotes.txt]).
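
The behaviour described above can be sketched in a few lines. This is a Python illustration only, not the actual CEN2 code: the function name `split_notes` and the constant names are hypothetical, and the logic simply follows the description in this comment (copy first, then blank the notes when the marker string is present).

```python
# Marker text left by transcribers, as quoted in the comment above.
MARKER = "[see mynotes.txt]"


def split_notes(vld_notes):
    """Return (notes, unoccupied_notes) derived from the single VLD notes value.

    unoccupied_notes is an unprocessed copy of the incoming notes.
    notes is emptied when the incoming text contains the marker string.
    """
    unoccupied_notes = vld_notes
    notes = "" if MARKER in vld_notes else vld_notes
    return notes, unoccupied_notes


# The two counts reported earlier in the thread differ by exactly the number
# of entries said to contain the marker.
UNOCCUPIED_NOTES_COUNT = 823_419
NOTES_COUNT = 823_041
MARKER_ENTRIES = UNOCCUPIED_NOTES_COUNT - NOTES_COUNT  # 378
```

An ordinary note such as "Where born not enumerated" comes out identical in both fields, which matches the CHILL record shown earlier in the thread.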

FreecenBren commented 4 years ago

The "[see mynotes.txt]" entries were added by transcribers who, instead of using the Notes column, left a longer message in the additional 'Note pad'.

At WINCC or VALDREV these "[see mynotes.txt]" wordings should have been deleted after the Proof Reader or Validator had finished the piece.

I have a few examples of these and I also have a transcriber who still sends me one for each Piece he transcribes. These were started when we used 'Icens' to leave notes for the Checker and Validator. Many of the transcribers carried on the procedure. So they left [see mynotes.txt] in the notes column instead, to remind the Checker/Validator to look at what they had written in Note Pad and decide if any further action was needed. I still have some as well. They are on my PC. They make interesting reading. The note, though, should be deleted.

Captainkirkdawson commented 4 years ago

@FreecenBren I can see why they are deleted. I fail to see why they were retained for the unoccupied notes, unless of course they would never have been entered with an uninhabited dwelling. At least we have cleared up the mystery.