Closed davidread closed 8 years ago
I've also seen this on another organogram:
BBSRC, September 2011 Junior roles for "DIRECTOR OF THE INSTITUTE OF FOOD RESEARCH (UEA EMPLOYEE)"
I think that they are separated issues, I've identified and fixed Biotechnology and Biological Sciences Research Council. This was down to bad generation of junior role 'id' in the hierarchy. 'id': post['Reporting Senior Post'] + "" + post['Grade'], so junior roles with the same grade reporting to the same senior got overridden. I've changed it to include job title: 'id': post['Reporting Senior Post'] + "" + post['Grade'] + "_" + post['Generic Job Title'],
I believe that ACAS issue is caused by etl script, the number of junior roles visible here: https://test.data.gov.uk/organogram/advisory-conciliation-and-arbitration-service/2011-09-30 is coming from csv generated by the script: http://test.data.gov.uk/organogram-ajax/preview/data/advisory_conciliation_and_arbitration_service-2011-09-30-organogram-junior.csv
This is an old csv file because the XLS file used to generate it: co-prod2.dh.bytemark.co.uk:/home/co/organogram-data/data/dgu/xls-from-triplestore/advisory_conciliation_and_arbitration_service-2011-09-30-organogram.xls now fails validation and couldn't be uploaded any more so we can't be sure if we still get this truncated csv out of the script.
I'm not convinced that generating an id from the row data is going to always work. For example, that same ACAS Excel sheet has all these that only vary by Unit:
I imagine it is also valid to have identical rows.
Can you simply assign an ID using the row number?
ACAS is now showing 76 junior roles, which will increase when this is fixed.
I believe the XLS does indeed validate:
(ckan)co@precise64:/src/organograms$ ./etl_to_csv.py data/dgu/xls-from-triplestore/advisory_conciliation_and_arbitration_service-2011-09-30-organogram.xls /tmp/ --date-from-filename
Loading data/dgu/xls-from-triplestore/advisory_conciliation_and_arbitration_service-2011-09-30-organogram.xls
Writing /tmp/advisory_conciliation_and_arbitration_service-2011-09-30-organogram-senior.csv
Writing /tmp/advisory_conciliation_and_arbitration_service-2011-09-30-organogram-junior.csv
Writing index file: /tmp/index.json
Done.
(ckan)co@precise64:/src/organograms$ grep '"2","Grade' /tmp/advisory_conciliation_and_arbitration_service-2011-09-30-organogram-junior.csv |wc -l
192
Well done, looks like this was it. Fixed on prod2 by using row number as id. I think that it works like it should, could you please take a look?
Yes, all 192 are there now :)
Only a fraction of the junior roles are shown in the organogram view - many are missed out. I'm looking at ACAS September 2011 organogram, clicking on "Director of Delivery".
The old and new systems both have the number "693.09" (number of junior roles, FTE). This matches the spreadsheet - if I filter for the rows reporting to Senior Post 2, the sum of the "Number of Posts in FTE" column is 693.09. And the number of rows is 192, so I'd expect that number of boxes in the organogram. The old system's viz shows roughly that number, but the new system's viz shows only 11.
Old: http://reference.data.gov.uk/gov-structure/organogram/?pubbod=advisory-conciliation-and-arbitration-service&version=2011-09-30
New:
https://test.data.gov.uk/organogram/advisory-conciliation-and-arbitration-service/2011-09-30