BiologicalRecordsCentre / UKBMS-online

Issue tracking for UKBMS online recording site

Historic data upload - error with section 10 data #229

Open IanMiddlebrook opened 3 years ago

IanMiddlebrook commented 3 years ago

Hi @DavidRoy

A county recorder has been ploughing through the historic data uploaded on the UKBMS website.

He noticed an anomaly: there is no section 10 data for several sites. Comparing the upload to data he already held, he's established that the Section 10 data have been uploaded as additional Section 1 records.

Examples I've found - 2006 data for sites: Scar Close NNR (3415), Swarth Moor SSSI (3405), Threshfield, Long Ashes (3410) - all have data for sections 1-9 and 11 onwards.

But this doesn't seem to be consistent across the board. E.g. I checked some Dorset data for 2002. There were 14 sites with 11 or more sections; 8 of these had no data for section 10.

Not sure how we can check or correct this, given the number of years and potential sites involved, but it may cause some issues.

Thanks, Ian

Gary-van-Breda commented 2 years ago

@IanMiddlebrook or @DavidRoy : Can you attach an example of the data file used to upload the historic data that had the section 10 put into section 1?

DavidRoy commented 2 years ago

@Gary-van-Breda @IanMiddlebrook I think I've tracked down the issue. The import process started by matching the Indicia site and section IDs with the historic data, so that data was loaded against the right location terms lists. Unfortunately there is a problem with the "Full Site Sections data download (CSV)" from https://ukbms.org/all-sites. Some sections have names in the format 'Site - S1', but others have names based on code numbers (e.g. 3220.1). For those sections that have code-based names, the section name does not match the section number. All those where the section number is S10 have .1 as the code. This explains why the occurrences for Section 10 were loaded into section 1 when the historic data was uploaded.

An example for one site, section 10:

| Site ID | Site Name | Site Code | Site Type | SiteRef | System | Section ID | Section Name | Section Code | SectionRef | Unique Section Code |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 3363 | Threshfield, Long Ashes | 3410 | Transect | SD977647 | OSGB | 9302 | 3410.1 | S10 | SD978646 | 3410.1 |
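
To get a sense of how many sections are affected, the mismatch described above could be checked for across the whole download with a short script. This is only a rough sketch: it assumes the CSV uses the same column headers as the example row (`Section Name`, `Section Code`), and the function name `find_mismatched_sections` is made up for illustration.

```python
import csv
import io
import re


def find_mismatched_sections(csv_text):
    """Return rows whose code-style Section Name suffix disagrees with Section Code."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["Section Name"].strip()
        code = row["Section Code"].strip()
        # Code-style names look like '3410.1'; names like 'Site - S1' are skipped.
        name_match = re.fullmatch(r"\d+\.(\d+)", name)
        code_match = re.fullmatch(r"S(\d+)", code)
        if not (name_match and code_match):
            continue
        # Compare the number after the dot with the number after 'S'.
        if int(name_match.group(1)) != int(code_match.group(1)):
            mismatches.append(row)
    return mismatches


# The single example row from this issue, written out as CSV.
sample = (
    "Site ID,Site Name,Site Code,Site Type,SiteRef,System,Section ID,"
    "Section Name,Section Code,SectionRef,Unique Section Code\n"
    '3363,"Threshfield, Long Ashes",3410,Transect,SD977647,OSGB,9302,'
    "3410.1,S10,SD978646,3410.1\n"
)

for row in find_mismatched_sections(sample):
    print(row["Section ID"], row["Section Name"], row["Section Code"])
```

Run against the full download, the flagged `Section ID` values would be the candidate sections whose historic occurrences need checking.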

@Gary-van-Breda - is there a bug in the website download report? What is the process for section names changing from the code format to the name format? Is it when the site is edited and/or the transect lines are added?

DavidRoy commented 2 years ago

@Gary-van-Breda I suggest we return to this issue once other ones are resolved. It will take some thought as to how best to apply a general fix. I think we can do it based on the occurrence.external-key, as we can match back to the original data that was loaded.
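
A rough sketch of that matching idea, for when we come back to it. Everything here is hypothetical: the dictionary keys (`external_key`, `section`), the sample key values and the function name are stand-ins for illustration, not the Indicia schema or API. It assumes we can build a lookup from external key to the section number in the original data, and list the loaded occurrences with the section they ended up in.

```python
def occurrences_to_move(original_sections, loaded_occurrences):
    """Map external key -> (loaded section, original section) where the two differ."""
    moves = {}
    for occ in loaded_occurrences:
        key = occ["external_key"]
        original = original_sections.get(key)
        # Skip occurrences we cannot match back to the original data.
        if original is not None and original != occ["section"]:
            moves[key] = (occ["section"], original)
    return moves


# Hypothetical data: section numbers from the original historic files ...
original_sections = {"hist-001": 10, "hist-002": 1, "hist-003": 11}

# ... and the sections the same records were loaded against.
loaded_occurrences = [
    {"external_key": "hist-001", "section": 1},   # loaded into S1, should be S10
    {"external_key": "hist-002", "section": 1},   # correct
    {"external_key": "hist-003", "section": 11},  # correct
]

print(occurrences_to_move(original_sections, loaded_occurrences))
```

Only records whose loaded section differs from the original would be reported, so genuine Section 1 records would be left alone.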