Closed Captainkirkdawson closed 4 years ago
Pushed the changes to origin
For testing, @Vino, does this need to be uploading unmodified PARMS files which were created by @richardofsussex ?
@Vino-S Does the upload process the file to create the individual pieces?
Is it possible to upload a csv version of the file?
If the latter, is there a conversion of the csv parms file to the pieces?
We need the csv version to overcome the length limitation on Civil District and Enumeration district names.
We should move the upload into the Manage County set of actions and have it as a county coordinator capability.
Can you base further code modification on cen_820?
Hi @Captainkirkdawson
Fixed. I will work on adding the Process Action.
Thanks @Vino-S. PS: please build on cen_820; it has your code merged in.
@Vino-S before you go too far can I draw your attention to the freecen_parms collection. Cen2 has controller, models and views for the collection written years ago by Doug for uploading csv parms files and their processing. Stumbled across it while looking for something else.
Ah yes Kirk.
_We need the csv version to overcome the length limitation on Civil District and Enumeration district names._ @richardofsussex already has some
Hi @Captainkirkdawson , I have made the following changes in the code. I have kept Doug's code as it is and used it as reference.
I have moved the Manage Parms action to the Manage County action.
I have changed the destination for uploaded parm files to MyopicVicar/parms_files.
At present the Manage Parms action takes us to an index page, which shows the list of files uploaded through FC2.
I have created an Add new Parm File action to upload a parm file.
Upload process does the following validations:
a) validates file type to be csv/dat
b) Validates file name
c) validates file header
d) validates file content
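The four checks above could be sketched roughly as follows. This is only an illustration in Python, not the code in the application: the file name pattern, the header columns and the content rules are all assumptions made for the example, since the thread does not spell out the real ones.

```python
import csv
import io
import re
from typing import List

# All three rules below are hypothetical; the thread does not state the real ones.
ALLOWED_EXTENSIONS = {".csv", ".dat"}
NAME_PATTERN = re.compile(r"^[A-Z]{3}\d{4}\.(csv|dat)$")  # assumed: county code + year
EXPECTED_HEADER = ["piece", "type", "name"]                # assumed column names


def validate_parms_upload(filename: str, text: str) -> List[str]:
    """Return a list of error messages; an empty list means the file passed."""
    errors = []

    # a) file type must be csv or dat
    dot = filename.rfind(".")
    extension = filename[dot:].lower() if dot != -1 else ""
    if extension not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported file type: {extension or '(none)'}")

    # b) file name must follow the (assumed) naming pattern
    if not NAME_PATTERN.match(filename):
        errors.append(f"unexpected file name: {filename}")

    rows = list(csv.reader(io.StringIO(text)))

    # c) header row must match the (assumed) column names
    if not rows or [cell.strip().lower() for cell in rows[0]] != EXPECTED_HEADER:
        errors.append("missing or unexpected header row")
        return errors

    # d) every content row needs the right field count, an 'a' or 'b' type and a name
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            errors.append(f"line {line_no}: expected {len(EXPECTED_HEADER)} fields")
        elif row[1].strip() not in {"a", "b"}:
            errors.append(f"line {line_no}: record type must be 'a' or 'b'")
        elif not row[2].strip():
            errors.append(f"line {line_no}: name is empty")
    return errors
```

Collecting every error in a list, rather than stopping at the first one, would let the uploader fix a file in a single pass.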
Removed length limitations on Civil District and Enumeration district names
There was a recent discussion about 'bundling' the lower-level place concepts. Looking at the 1901 census data from TNA, that might be the only option. http://discovery.nationalarchives.gov.uk/results/r?_q=RG+13%2F+1892 shows that places are run together without any punctuation. Comments/advice please.
Hi Richard,
Yes, we have always had that issue. When we did a new PARMS we did the following.
We were lucky, as we could just click on the 'Purple Registration' and it would take us to the next page, where we then found the list of Places.
Registration Sub-District: Wiggenhall Civil Parish, Township or Place: Barroway Drove Holme next Runcton North Delph (part) Porter's Fen Corner Saddle Bow Shouldham Shouldham Thorpe South Runcton Stow Bardolph (part) Stowbridge Tottenhill Wallington with Thorpland Watlington Wiggenhall St German Wiggenhall St Mary Magdalen Wiggenhall St Mary the Virgin Wiggenhall St Peter Wimbotsham Wormegay
Brenda,
When I access the same record via TNA's search API I get this:
For comparison, the 1881 census has a 'heading' for each place:
Not sure how to proceed: it needs to be done programmatically. (In 1911 the number of records explodes from around 5,000 to 35,000!)
Richard
Sorry: pasting partial screenshots into my email clearly didn't work. What I get in the XML from the API for 1901 is:
I was able to use the 'headings' within the text to split up the places sensibly: that is no longer possible.
Richard
@richardofsussex the bundle being talked about was a set of additional fields which were never previously transcribed but likely should have been. They appeared on the cover sheet and the page heading. We plan to add them as part of #820 augmentation of fields and #859. In your example for RG 13/1892 it is a piece within the Registration District of Downham and it contains the specific civil parishes in the list following the second colon. You are correct that in previous years the bundle was associated with each civil parish. Hence, as you say, you could use that for separation. Can you explore what separates the civil parishes in your extract? There must be something there, as the page displays with civil parish separation in TNA.
@richardofsussex we do have a complete csv set for Somerset for 1901. Now I don't know if that was based on extract you made or where it came from. Perhaps @FreecenBren or @geoffj-FUG can comment
Looking at the file in my hex editor - they are just spaces. Whatever separator was present in the source data has been stripped or converted to space in the generation of the XML. I'll see if TNA are willing/able to help; for example the TSV format they produced before may be more useful.
I have no recall (or record) of working with 1901 before. All my files from the 2017 work are still sitting around.
@Vino-S as noted on slack I received a crash on the initial upload test on my server
Going back to first principles here. Could someone please remind me what counts as an 'a' record and what is a 'b' record in a PARMS file? Looking at the document Brenda prepared back in 2017, I conclude that we DON'T include Registration Sub-Districts or places smaller than a Parish. Is that correct?
@richardofsussex a is the Registration District and b is a civil parish, i.e. the name after the second colon. In years up to 1891 the text between the colons was "Civil Parish, Township or Place"; from 1901 on there is a name before "Civil Parish, Township or Place", but we want what follows.
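As a rough illustration of "the name after the second colon", a small Python sketch is given here. The function name is made up for the example, and it only separates the text at the second colon; it does not attempt to split the run-together place names that follow.

```python
from typing import Tuple


def split_description(description: str) -> Tuple[str, str]:
    """Split a TNA-style description at its second colon.

    Returns (text up to the second colon, text after it). The second part is
    the run-together list of places; it is not split any further here.
    """
    first = description.find(":")
    second = description.find(":", first + 1) if first != -1 else -1
    if second == -1:
        raise ValueError("description does not contain two colons")
    return description[:second].strip(), description[second + 1:].strip()


heading, places = split_description(
    "Registration Sub-District: Wiggenhall Civil Parish, Township or Place: "
    "Barroway Drove Holme next Runcton"
)
print(places)
```

Raising an error when fewer than two colons are present is a design choice for the sketch; descriptions from earlier census years may need different handling.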
To take an example mentioned in Brenda's "parms example" spreadsheet:
<Description>Registration Sub-District: Tuxford Civil Parish. Township or Place: Askham Bevercotes Bothamsall Church Laneham Darlton Dunham East Drayton East Markham Fledborough Haughton High and Low Marnham Laneham Markham Clinton Marnham Milton Normanton on Trent Ragnall Stokeham Tuxford West Drayton.</Description>
'Civil Parish' looks as though it is part of the Sub-District name. Am I correct in thinking that there ought to be a full stop (or something) after 'Tuxford', and that the full stop after 'Civil Parish' ought to be a comma? (At least they have done this consistently!)
Of the long list of names that follows, Askham, Beverco[a]tes, Laneham, Darlton, Dunham, etc. are all parishes, but some (e.g. Bothamstall Church) are not. I can see no way of determining which to include, nor indeed of deciding which groups of words constitute a valid place name. The only idea I have is to look them up against an authoritative database of parishes (or the parishes from an earlier census, though I know they tend to change).
@richardofsussex it is important that we get that full list of names even though, as you say, they may be things other than what is rigorously termed a civil parish. The census document for the piece is separated by those units. I need to be able to validate what the transcriber has entered against those units. Hence we must keep what is in there unadulterated. There is a need to be able, within the application, to group and systematize these units into groups that are meaningful to the researcher, but that is a different story and something @PatReynolds has been trying to get her head around, not just for CEN but for all our apps.
OK, I'll see what I can do. Presumably 1901 and 1911 are higher priority than the other years, in that we (mostly) don't have any PARMS files at all for them?
I think so; SOM 1901 is the only one that I know we have but from whence it came is unknown to me
Could you please send it to me, so I can use it as a control/reality check? mailto:richardlight399@gmail.com
Thanks. TNA have the first six pieces as Williton (http://discovery.nationalarchives.gov.uk/browse/r/h/C149922). The PARMS file has them, essentially, with their Sub-Districts as the 'a' record. Which is correct?
I will try to simplify it from my perspective . Not sure if this will help or not.
Background first. A single Census Piece is made up of a set of EDs. At this precise time that could range from a small Piece of, say, 3 EDs up to the biggest I have so far, which is 58 EDs for the Devon 1851 census. That can potentially mean 58 Civil Parish Places.
Each ED has a number and a Civil Parish Place name that is in the top left-hand corner of every census image page. You can get more than one ED for the same Civil Parish Place in any one Piece. These Civil Parish Places are what Richard was mentioning, and we use them in column A of a transcription in FC1.
The other Places that he is not sure about are the 'Hamlets'. They are still Places where people in the Census say they were born. So we do get these entered in the POB column, but not against a Civil Parish Place name.
When the PARMS were first done in 1998 we only entered the Civil Parish name as the 'b' number, as they not only match those in the NA for the PIECE number but also match the Civil Parish Place name that the transcriber picks up from the ED and the Census Image. We also use the Registration name as the 'a' number.
Summary.
Not being a Programmer I would not know where to start. I only know what we have and what we enter in each Piece we transcribe. I have attached an example for one PIECE number re Places Devon 1851 HO107/1891 or as we will upload it as HO511891
1, National Archives list of Places
Plenty of places to pick up Places!!!! Which one will we go for?
I am sorry I do not have the answer. Pat has asked that I look at something re Places. Time not an option this week. I will try to look at it next week.
You are welcome to ‘bin this’ if not relevant to the query.
Brenda
To add to that in 1911 there is a one to one relationship between the piece number and the ED number.
Geoff
Richard I did the Somerset 1901 PARMS manually from the NA records. I copied and pasted from each piece.
I was taught when I started in 2004 that the sub-district went into the PARMS so that is what I have always done. So all the Somerset pieces will be the same.
That suggests that I either misunderstood Terry or all his are the same as mine. Was I wrong? Geoff
Thanks, Geoff, that's a useful insight. We'll need to agree what the 'rules of engagement' are before I can go ahead and try to churn out PARMS for everything. Sub-Districts appear to have been around since 1851. Are 'hundreds' the equivalent concept for 1841?
As a random test, I searched for Lights in Somerset in 1891. The hit I looked at (https://www.freecen.org.uk/search_records/5db72b6bf4040b9e96f3f1aa/ada-m-light-1891-somerset-bedminster-1870-?locale=en) was for Place Bedminster, Parish Dundry. Your 1901 PARMS file has Dundry within Long Ashton, not Bedminster; my PARMS file doesn't have Dundry at all.
Richard et al
As I said previously, I was advised to use the sub-Districts for my PARMS when I first started in 2004. For my first couple I hunted through the images to find the first image in each ED and transcribed the PARMS from there.
I have continued to use the sub-Districts since. Nobody ever said that what I did was incorrect.
However, I have been thinking about the big picture.
It is intended, I understand, to link the searches of FreeBMD, FreeREG and FreeCEN at some time in the future so that when a person is being researched all 3 programmes contribute to the report. FreeUKGEN will need some way of linking the searches. It strikes me that the Registration District may be a key to limiting the searches and therefore the rules in each project should be the same.
The rules for FreeBMD can be found at https://www.freebmd.org.uk/DistrictInfo.html. They use the Registration District and Civil Parish.
So it makes sense to me that FreeCEN should use the same Registration District and Civil Parish in their PARMS. That gives both projects the same data filters.
If I need to amend the Somerset PARMS or FC2 I will do so.
We also have the issue of entries such as Bitton (GLS) in the PARMS (Bitton is out of County SOM) and the National Archives having terms such as (part) in their Civil Parishes. These will create errors in the new FreeCEN front end (CSVProc) test reports as they do not align with the Civil Parish in the censuses. The extraneous information will need to be edited out, either on creation of the PARMS or when the piece is uploaded to CSVProc.
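One possible way to handle the qualifiers, sketched in Python under the assumption that they always appear as a single trailing parenthesis such as "(part)" or "(GLS)". The function name is hypothetical. It returns the qualifier separately rather than discarding it, so the original wording can still be kept, as requested earlier in the thread.

```python
import re
from typing import Optional, Tuple

# Assumption: the extraneous information is one trailing parenthesised qualifier.
_QUALIFIER = re.compile(r"^(?P<name>.*?)\s*\((?P<qualifier>[^()]*)\)\s*$")


def split_qualifier(raw: str) -> Tuple[str, Optional[str]]:
    """Separate a place name from a trailing '(...)' qualifier, if there is one."""
    cleaned = raw.strip()
    match = _QUALIFIER.match(cleaned)
    if match is None:
        return cleaned, None
    return match.group("name"), match.group("qualifier")


print(split_qualifier("Bitton (GLS)"))
print(split_qualifier("Stow Bardolph (part)"))
print(split_qualifier("Dundry"))
```

Whether this runs when the PARMS file is created or when a piece is uploaded to CSVProc is left open, as in the comment above.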
Those are my thoughts.
Also the use of District or Sub-District is important to me as I am currently writing the 1911 PARMS for Somerset so that we can test 1911 pieces in CSVProc. I know it is a redundant column at the moment but I would like to be testing with the right file and not have to amend it later. It is going to be huge!
Geoff
Richard
FreeBMD already has a comprehensive database. Is that in fact the National Archives data?
Geoff
Richard
Somerset is the only county working on 1911 and that is currently just for testing purposes.
Somerset is transcribing 1901 but can't process the transcriptions on the old system, so is waiting for the new one. Nottingham, I understand, is not far away.
However I can look after Somerset. I will just need to amend my existing 1901 file and I am in the process of creating 1911.
I would put a higher priority on the missing PARMS in the 19th Century.
I believe that is what Brenda needs.
Geoff
Richard
The Hundreds appear to be the structure in 1841. To me that is the same as the Registration District.
Looking at lower levels one of my volunteers offered up this view about proposed fields in the new transcription spreadsheets. I am not sure if it is helpful in understanding the issue or not.
I’ve just looked up the Vision of Britain site in relation to census analysis to try and understand the geographical divisions. Correct me if I’m wrong but it looks to me as though, in 1901, England and Wales were divided into 1122 Urban Districts (including Municipal Boroughs and County Boroughs) and 664 Rural Districts. The Urban Districts could be sub-divided into Wards and the Rural Districts into towns, villages and hamlets (or even, as marked on some of the Pitminster pages, ’scattered houses’).
So I suggest using 3 columns:
1 District identifier - U for urban or R for rural
2 District/Borough name - text
3 Ward/Town/Village/Hamlet name - text
What do you think?
I am still kicking that one around in my head!
Geoff
Geoff,
I'm actually quite keen to have a go at 1911. The fact that each record has a simpler structure means that I can probably do a better job of auto-generating PARMS files. I'll see what I can do for Somerset, and will await your comments with interest!
Richard
This is what I have so far
Geoff
Geoff,
And this is what I have generated from TNA data. I'm sure there are errors/inconsistencies, but there is nothing that jumps out at me as being obviously wrong. (The error I'm expecting is more than one place name lumped together in a single 'b' cell.) What do you think of it? Vino: will it load?
Richard
SOM1911.zip
Hi, I have now written a checking procedure which uses the 1891 place names (as I have them) as an authority. It identifies each level-b place which matches a place in that file (doing a longest-first match) and creates separate entries in the spreadsheet for each place. Where it doesn't match an entry in the 1891 file, it puts question marks round it.
There are some 'funnies' in the result, but as far as I can tell these are all caused by funnies in the 1891 'authority'. For example, 'Minehead Without' doesn't appear in that file, but 'Without' on its own does, so we end up with two entries 'Minehead' and 'Without'. By improving the authority file (or using a better one) we can get better results.
My thinking is that we can generate a CSV file in this format for 1911 for each county. All the county coordinator would have to do is to go through the CSV looking for question marks, and update the parish names to their satisfaction. Does this look like an approach that will work?
Richard
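For readers following along, a longest-first match of this kind might look roughly like the Python sketch below. It is not Richard's procedure, just an illustration of the idea: the function name is made up, matching is case-insensitive by assumption, and each unmatched word is queried individually, which may differ from how his system groups them.

```python
from typing import Iterable, List


def split_places(run_together: str, authority: Iterable[str]) -> List[str]:
    """Split a space-separated run of place names using an authority list.

    At each position the longest run of words found in the authority wins.
    A word that starts no known name is wrapped in question marks.
    """
    known = {" ".join(name.lower().split()) for name in authority}
    longest = max((len(name.split()) for name in known), default=1)
    words = run_together.split()
    result = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(longest, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            if candidate.lower() in known:
                result.append(candidate)
                i += size
                break
        else:
            # No candidate starting here is in the authority: flag the word.
            result.append(f"?{words[i]}?")
            i += 1
    return result


# 'Minehead Without' is missing from this authority, so it splits in two.
print(split_places("Minehead Without Dunster", ["Minehead", "Without"]))
```

Adding 'Minehead Without' to the authority list would make the same input come out as a single entry, which is the improvement described above.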
Having looked at the commentary that has taken place while I was in the land of Nod, I have to say I am now totally and utterly confused. Our terminology wanders all over the place, as does that of the TNA. We started off talking about Registration Districts, which it now appears we do not record in a piece. There are Registration Sub Districts, which we record (usually?) as the a in our parms files and which we appear to call District in CEN2 and Place in Cen1. How this field could be called either District or Place is beyond me. Then we record things that could be just about anything as b in our parms files. Cen1 dodges what they are and lists them under a heading of Comprising. Cen2 calls them Subplaces. Our csv has an empty field after the piece number; I wonder what that was for? Utter confusion. I hope @richardofsussex can make sense of this because I cannot.
@richardofsussex Your extract of 1911 should not be uploaded directly; it would benefit from a manual edit as some subplaces are clearly combined because you have no ability to detect the separation. But it would not be a large task. Well done; Would love to see one for 1901
Kirk, Please see my latest contribution; while it definitely still needs manual intervention, this is now purely to check queries. I've spent the best part of the day implementing a 'look-up' which appears to be reasonably successful in splitting apart the subplaces which TNA have cheerfully clumped together for us from 1901 onwards.
Turning to your previous point, I think the key point is to consider the purpose of these PARMS files, and to ensure that the data we submit as PARMS is fit for that purpose. As I understand it, the purpose of the 'b' places is to provide an authority against which data in individual records is validated. I'm not sure of the purpose of the 'a' places, or whether it particularly matters whether they are Districts or Sub-Districts. I would welcome @FreecenBren 's advice on these points. In converting directly from TNA records, we are clearly constrained by their recording practice, which has changed over the period 1841-1911.
@richardofsussex unfortunately programmers as you know need precision and definition; without it chaos reigns supreme.
For both @geoffj-FUG and @FreecenBren the purpose of this new set of extracts is for CEN2 ingestion not CEN1. So that fields reflect the TNA without a 20 character constraint
The Purpose of the b field for checking is one specified by me in processing a csv data file. The b field may well have other uses; in CEN1 you can search by it.
The a field is used as the search district
I would like to know if we can also record the Registration District in the second field of the parms, as that field does not appear to be used for anything?
Richard
I have compared the start of the 2 files (yours and mine).
The downloaded version has pieces missing. Look at 14121-14122, 14125, 14137-40.
Otherwise they look the same.
Does the fact that the program creates multiple ‘a’ rows for the same piece cause upload problems?
Geoff
Kirk et al
It appears that we are trying to fit a square peg in a round hole.
So …
From a programming need what should we be recording to meet our needs into the future, including future reporting?
That is the position we have taken on the new fields descriptions. Why don’t we consider this issue from scratch?
I will start:
We need the Civil Parishes – they match up with the field in the transcriptions.
We need at least ‘a’ and ‘b’ codes or columns or a different separator. They define the data in the spreadsheet.
If we look at it from this perspective we will have what we need in the end.
Geoff
Kirk's suggestion makes very good sense to me.
It will give us a natural hierarchy:
County
Registration District
Sub Registration District
Civil Parish
That looks like a very good hierarchy to have to me. It should meet a lot of possible needs.
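To make the four levels concrete, here is a minimal Python sketch of a record that carries them. The class and field names are invented for illustration and are not the CEN2 models or the eventual parms specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PieceLocation:
    """One piece's place hierarchy (hypothetical structure, not the CEN2 schema)."""
    county: str
    registration_district: str
    sub_district: str
    civil_parishes: List[str] = field(default_factory=list)

    def paths(self) -> List[str]:
        """Return one 'County > District > Sub-District > Parish' string per parish."""
        prefix = [self.county, self.registration_district, self.sub_district]
        return [" > ".join(prefix + [parish]) for parish in self.civil_parishes]
```

Keeping the civil parishes as a list under a single sub-district mirrors how a piece groups them, while the flattened paths give each parish its full context for searching or linking.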
Geoff
I would love to see a 1901 for Notts as we are starting it shortly.
Brenda
OK, this is the 1901 for Notts, as generated by my system. I notice that it has far fewer 'b' entries than the 1881 one, but haven't had time to check why this might be. Anyway, I would welcome your reaction to it. (The names with question marks are those which didn't match a list of places which I generated from the 1891 data.) NTT1901.zip
@richardofsussex does your comment refer to what we have on the website or what you have downloaded anew?
@richardofsussex Having spent some time digging, I think it is time for CEN2 to break from the specification used in CEN1. Hence I would like to specify a new format for CEN2 csv parms that collects all the information that is needed for CEN2 to have pieces that it can effectively link to a variety of different locations. This will contain the fields needed by CEN1 so we can create those from CEN2.
Please give me a week to develop the spec if that is acceptable; I don't think it will have a major impact on your current coding. We could perhaps think of moving to a json formatted file.
Ah: I'm talking about the file which I generated myself when Brenda asked me to produce some PARMS files for 1881. I'm not sure what relation it will have to what finally ended up on the web site.
Currently PARMS files are loaded into FC2 from FC1 after FC1 has been updated. They are incorporated into FC2 through the overall FC2 monthly update. We need to be able to a) load a PARMS file directly into FC2, and b) do so independently of the overall FC2 update. This is an urgent requirement for loading 1901 and 1911 PARMS files and to allow for checking of CSV uploads of 1901 and 1911 records.