Open VirginiaDooley opened 2 years ago
https://candidates.democracyclub.org.uk/elections/local.west-lothian.livingston-south.2022-05-05/sopn/ (and other SOPNs for that election) don't match pages. Chances are this is because the ward names are in the table header.
Hi, it seems the Fife Council one has problems as each table is spread over two pages in the PDF. https://candidates.democracyclub.org.uk/elections/local.fife.burntisland-kinghorn-and-western-kirkcaldy.2022-05-05/sopn/
Wigan strangeness - the correct pages have been used by the parser for all the LA (so far) but the link in the Ashton ward goes to another ward's SoPN https://candidates.democracyclub.org.uk/elections/local.wigan.ashton.2022-05-05/sopn/
It's also joined the Hindley and Hindley Green wards, suggesting it's not strict enough when considering if a ward stretches onto two pages of a SoPN. https://candidates.democracyclub.org.uk/bulk_adding/sopn/local.wigan.hindley.2022-05-05/?edit=1 I wonder if page splitting was offset by one as a result.
...it then processed the Hindley Green page (again) for that ward without issue
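The Hindley / Hindley Green mix-up above is what you would expect if ward names are matched as plain substrings, since "Hindley" is a prefix of "Hindley Green". The following is only a minimal sketch of a stricter check, not the project's actual matching code: `wards_on_page` is a hypothetical helper, and it assumes the page text and the list of ward names for the election are already available as strings.

```python
import re


def wards_on_page(page_text, ward_names):
    """Return the ward names that appear on a page as whole names.

    Hypothetical helper for illustration only. Longer names are checked
    first and their text is blanked out once matched, so a page headed
    "Hindley Green" is not also counted as a match for "Hindley".
    """
    remaining = page_text.lower()
    found = []
    for name in sorted(ward_names, key=len, reverse=True):
        # \b keeps the match on word boundaries; this assumes ward names
        # start and end with a letter or digit.
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        if re.search(pattern, remaining):
            found.append(name)
            remaining = re.sub(pattern, " ", remaining)
    return found


wards = ["Hindley", "Hindley Green"]
print(wards_on_page("Election of a Councillor for Hindley Green Ward", wards))
print(wards_on_page("Election of a Councillor for Hindley Ward", wards))
```

With a check like this, a page would only be treated as a continuation of the previous ward if no other ward name from the same election is found on it.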
Wigan Winstanley ward - it offered the wrong candidate names and linked to the wrong (page of the) SoPN https://candidates.democracyclub.org.uk/bulk_adding/sopn/local.wigan.winstanley.2022-05-05/
https://candidates.democracyclub.org.uk/elections/local.oxford.cowley.2022-05-05/ Should have been Cowley ward, but extracted page was for Littlemore ward. The correct ward is available in the linked https://www.oxford.gov.uk/download/downloads/id/7948/statement_as_to_persons_nominated_-_city_elections_on_5_may_2022.pdf
https://candidates.democracyclub.org.uk/elections/local.wigan.ashton.2022-05-05/ Should have been Ashton ward, but extracted page was for Bryn ward. The correct ward is available in the linked https://www.wigan.gov.uk/Docs/PDF/Council/Voting-and-Elections/2022/Statement-of-Persons-Nominated-for-Local-Election-5-May-2022.pdf
This 4-page single ward PDF incorrectly generated a "Watch out! The original document contains candidate info for 2 areas." warning https://candidates.democracyclub.org.uk/elections/local.tower-hamlets.bethnal-green-west.2022-05-05/sopn/
Same with https://candidates.democracyclub.org.uk/elections/local.tower-hamlets.bethnal-green-east.2022-05-05/sopn/ Both were .docx on their website and initially DC had PDFs with different formatting so I re-did these two, and got the "2 areas" message after uploading each one.
local.lichfield.boney-hay-central.2023-05-04 - the pages for Boney Hay & Central and Bourne Vale wards have been combined
Exeter SOPNs don't appear to have been parsed by the bot - I've looked at the first 3 so far. https://candidates.democracyclub.org.uk/elections/local.exeter.alphington.2023-05-04/
The DOCX file for Torbay Council doesn't appear to have been understood by the bot. Again, I've checked the first 3 wards and they all have the same symptoms: pages are matched, but tables are not extracted and there are no bot suggestions on the bulk adding screen.
[Edit] Later wards within this SOPN document have not been page matched by the bot, and I had to search manually (Ctrl + F) just to find the correct page of the SOPN before adding the candidates by hand.
Parser fail for Mapperley in Nottingham. Haven't checked the other wards yet but it seems to have picked up the wrong page when parsing.
2023 examples of the SOPN parser pulling the wrong ward from a combined PDF/DOCX file:
https://candidates.democracyclub.org.uk/elections/local.vale-of-white-horse.sutton-courtenay.2023-05-04/sopn/
https://candidates.democracyclub.org.uk/elections/local.redcar-and-cleveland.guisborough.2023-05-04/sopn/
https://candidates.democracyclub.org.uk/elections/local.wyre.garstang.2023-05-04/sopn/
https://candidates.democracyclub.org.uk/elections/local.ribble-valley.ribchester.2023-05-04/sopn/
https://candidates.democracyclub.org.uk/elections/local.bedford.shortstown.2023-05-04/sopn/
https://candidates.democracyclub.org.uk/elections/local.chelmsford.moulsham-lodge.2023-05-04/sopn/
https://candidates.democracyclub.org.uk/elections/local.nottingham.mapperley.2023-05-04/sopn/
https://candidates.democracyclub.org.uk/elections/local.gateshead.2023-05-04/
https://candidates.democracyclub.org.uk/elections/local.bradford.2023-05-04/
Sandwell St. Paul’s is in a limbo half-broken state. The page extraction failed but the table parsing succeeded (albeit in a slightly janky format). The SOPN uploaded is the entire combined PDF file. The suspect for this strange breakage was the backtick in the ward name although Virginia has checked this out and can’t see a problem with it. https://candidates.democracyclub.org.uk/elections/local.sandwell.st-pauls.2023-05-04/sopn/
This issue is exclusively to track issues with SOPN Page Extraction.
For SOPN Parsing: Table Parsing Errors, go here: https://github.com/DemocracyClub/yournextrepresentative/issues/1728
For SOPN Parsing: Table Extraction Errors, go here: https://github.com/DemocracyClub/yournextrepresentative/issues/1727
Page extraction errors typically occur when uploading a SOPN. The most common errors include:
Please add these types of issues in the comments below with a