DemocracyClub / yournextrepresentative

👥 A website for crowd-sourcing structured election candidate data
https://candidates.democracyclub.org.uk
GNU Affero General Public License v3.0

SOPN Parsing: Table Parsing Errors #1728

Open VirginiaDooley opened 2 years ago

VirginiaDooley commented 2 years ago

This issue is exclusively to track issues with SOPN Table Parsing.
For SOPN Parsing: Table Extraction Errors, go here: https://github.com/DemocracyClub/yournextrepresentative/issues/1727
For SOPN Parsing: Page Extraction Errors, go here: https://github.com/DemocracyClub/yournextrepresentative/issues/1726

Table parsing errors are typically found after a successful SOPN upload, during a bot parse. Either the bot fails to parse completely (which could be the result of a table extraction failure) and no information is pre-filled in the bulk add form, or there is some pre-filled info but that info might have:

Please add these types of issues in the comments below with a

VirginiaDooley commented 2 years ago

Only the surname column has been parsed https://github.com/DemocracyClub/yournextrepresentative/issues/1426#issuecomment-815955055

VirginiaDooley commented 2 years ago

Header cannot be found https://github.com/DemocracyClub/yournextrepresentative/issues/1426#issuecomment-1025686196

VirginiaDooley commented 2 years ago

Ward matching error https://github.com/DemocracyClub/yournextrepresentative/issues/1426#issuecomment-1025694482

VirginiaDooley commented 2 years ago

Blank line assumes Independent Candidate https://github.com/DemocracyClub/yournextrepresentative/issues/1426#issuecomment-1025701260

VirginiaDooley commented 2 years ago

No ParsedSOPN https://github.com/DemocracyClub/yournextrepresentative/issues/1426#issuecomment-1025832909

VirginiaDooley commented 2 years ago

Failed to downcase entire surname https://github.com/DemocracyClub/yournextrepresentative/issues/1426#issuecomment-1025842066
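
To illustrate the kind of fix this might need, here is a minimal sketch of converting an upper-case surname to title case one part at a time, so that nothing after a hyphen, space or apostrophe is left in capitals. The function name `titlecase_surname` is hypothetical and is not taken from the project's code; prefixes such as "Mc" are not handled.

```python
import re


def titlecase_surname(surname: str) -> str:
    """Convert an upper-case surname to title case, part by part.

    Hypothetical helper for illustration only, not the project's code.
    """
    # Split on whitespace, hyphens and apostrophes, keeping the separators
    # (the capturing group makes re.split return them as list items).
    parts = re.split(r"([\s\-'])", surname.strip())
    # capitalize() upper-cases the first character and lower-cases the rest;
    # separators and empty strings pass through unchanged.
    return "".join(part.capitalize() for part in parts)


print(titlecase_surname("SMITH-JONES"))  # Smith-Jones
print(titlecase_surname("O'BRIEN"))      # O'Brien
```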

VirginiaDooley commented 2 years ago

Misplaced middle initial https://github.com/DemocracyClub/yournextrepresentative/issues/1426#issuecomment-1025918766

VirginiaDooley commented 2 years ago

Name parsing error (spacing and order) https://github.com/DemocracyClub/yournextrepresentative/issues/1426#issuecomment-1025960148

VirginiaDooley commented 2 years ago

Name order error on just one candidate https://github.com/DemocracyClub/yournextrepresentative/issues/1426#issuecomment-1025987152

michaeljcollinsuk commented 2 years ago

Some names parsed as SURNAMEFirstname https://github.com/DemocracyClub/yournextrepresentative/issues/1426#issuecomment-816098146

VirginiaDooley commented 2 years ago

Just locked the Dunfermline South one, although I notice this is another one where a double-barrelled name is causing an issue. This time the name isn’t hyphenated, so it appears to be treated as a middle name rather than a surname.
https://candidates.democracyclub.org.uk/elections/local.fife.dunfermline-south.2022-05-05/
https://candidates.democracyclub.org.uk/person/85877/lynn-ballantyne-wardlaw

samsmith commented 2 years ago

The first letter of each surname ended up as an initial in the candidate's name https://candidates.democracyclub.org.uk/elections/local.cheltenham.all-saints.2022-05-05/sopn/

sjorford commented 2 years ago

"INDEPENDENT" parsed as "Independent Dickens Heath Residents Action Group"

https://candidates.democracyclub.org.uk/bulk_adding/sopn/local.city-of-london-alder.bridge.2022-07-07/
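
One way to avoid this kind of mismatch, sketched here under the assumption that the party text from the SOPN is compared against a list of known party names: try an exact, case-insensitive match first, and only fall back to a prefix match when it is unambiguous. The function `match_party` and the sample list are hypothetical; how the parser actually matches parties is not stated in this thread.

```python
from typing import Optional, Sequence


def match_party(raw: str, party_names: Sequence[str]) -> Optional[str]:
    """Return the known party name that best matches the SOPN text.

    Hypothetical helper for illustration only, not the project's code.
    """
    wanted = raw.strip().casefold()
    # 1. An exact (case-insensitive) match always wins.
    for name in party_names:
        if name.casefold() == wanted:
            return name
    # 2. Otherwise accept a prefix match only when exactly one party fits.
    candidates = [n for n in party_names if n.casefold().startswith(wanted)]
    return candidates[0] if len(candidates) == 1 else None


# Sample data, assumed for the example.
parties = [
    "Independent Dickens Heath Residents Action Group",
    "Independent",
    "Labour Party",
]
print(match_party("INDEPENDENT", parties))  # Independent
print(match_party("Indep", parties))        # None (ambiguous prefix)
```

Returning `None` for an ambiguous match would leave the field blank for a human to fill in rather than pre-filling a wrong party.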

it3986 commented 1 year ago

Names parsed as Surname First Names

https://candidates.democracyclub.org.uk/bulk_adding/sopn/local.east-riding-of-yorkshire.hessle.2023-05-04/

Same happened for Howdenshire Ward of East Riding of Yorkshire Council https://candidates.democracyclub.org.uk/elections/local.east-riding-of-yorkshire.howdenshire.2023-05-04/

And again for East Wolds & Coastal Ward https://candidates.democracyclub.org.uk/bulk_adding/sopn/local.east-riding-of-yorkshire.east-wolds-and-coastal.2023-05-04/
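
As a rough illustration of one possible fix, assuming these SOPNs list names as an upper-case surname followed by the given names (for example "SMITH John Paul"), the leading run of upper-case words could be moved to the end. The function `reorder_name` is hypothetical, and the upper-case assumption would need checking against the actual documents.

```python
def reorder_name(raw: str) -> str:
    """Move a leading run of upper-case words (the surname) to the end.

    Hypothetical helper for illustration only, not the project's code.
    """
    tokens = raw.split()
    # Count the leading tokens that are entirely upper case.
    split_at = 0
    while split_at < len(tokens) and tokens[split_at].isupper():
        split_at += 1
    # No upper-case prefix, or everything is upper case: we cannot tell
    # which part is the surname, so return the name unchanged.
    if split_at == 0 or split_at == len(tokens):
        return " ".join(tokens)
    surname, given = tokens[:split_at], tokens[split_at:]
    return " ".join(given + surname)


print(reorder_name("SMITH John Paul"))          # John Paul SMITH
print(reorder_name("BALLANTYNE WARDLAW Lynn"))  # Lynn BALLANTYNE WARDLAW
```

Because the whole upper-case run is kept together, an unhyphenated double-barrelled surname stays in one piece under this approach.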

sjorford commented 1 year ago

"INDEPENDENT" parsed as "Independent Dickens Heath Residents Action Group"

https://candidates.democracyclub.org.uk/bulk_adding/sopn/local.city-of-london-alder.bridge.2022-07-07/

Exactly the same issue here:

https://candidates.democracyclub.org.uk/bulk_adding/sopn/local.tendring.cann-hall.2023-05-04/

sjorford commented 1 year ago

For all(?) the Rushmoor SOPNs, the first letter of each surname has been split off:

https://candidates.democracyclub.org.uk/elections/local.rushmoor.aldershot-park.2023-05-04/sopn/
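
A possible post-processing step, sketched under the assumption that the split-off letter arrives as a separate one-character token directly before the rest of an upper-case surname (for example "S", "MITH", "John"): merge a single capital letter with the upper-case word that follows it. The function `rejoin_split_initial` is hypothetical. It would also wrongly merge a genuine initial that precedes an upper-case surname (such as "J SMITH"), so it only suits documents known to have this problem.

```python
from typing import List


def rejoin_split_initial(tokens: List[str]) -> List[str]:
    """Merge a lone capital letter with the upper-case word after it.

    Hypothetical helper for illustration only, not the project's code.
    """
    merged = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        following = tokens[i + 1] if i + 1 < len(tokens) else ""
        if (
            len(token) == 1
            and token.isupper()
            and len(following) > 1
            and following.isupper()
        ):
            # "S" + "MITH" -> "SMITH"; skip the token we just consumed.
            merged.append(token + following)
            i += 2
        else:
            merged.append(token)
            i += 1
    return merged


print(rejoin_split_initial(["S", "MITH", "John"]))  # ['SMITH', 'John']
```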
