DemocracyClub / yournextrepresentative

👥 A website for crowd-sourcing structured election candidate data
https://candidates.democracyclub.org.uk
GNU Affero General Public License v3.0

Import SOPNs by fetching a URL #1539

Open symroe opened 3 years ago

symroe commented 3 years ago

There are two use cases for the feature:

  1. When uploading SOPNs in the frontend, users currently have to download a PDF from the source website, upload it to YNR and then add the source URL to the input. Just entering the URL of the PDF would be much easier, although we need to consider whether we still want the "source" to be the URL of the page that linked to the PDF.
  2. If we want to upload historical SOPNs we would have to enter them in the frontend for upload to work properly. A management command that takes a CSV with a ballot ID and URL would make this much easier.
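
As a rough sketch of the second use case, the CSV for such a management command could be read and validated like this. The column names `ballot_paper_id` and `source_url` and the function name `read_sopn_rows` are assumptions made for illustration, not existing YNR names, and the ballot IDs in the usage example are invented.

```python
import csv
from urllib.parse import urlparse


def read_sopn_rows(csv_file):
    """Split CSV rows into usable (ballot_id, url) pairs and errors.

    Assumes a header row with the hypothetical columns
    `ballot_paper_id` and `source_url`.
    """
    rows, errors = [], []
    reader = csv.DictReader(csv_file)
    # The header is line 1, so data starts at line 2
    # (assuming no field spans several lines).
    for line_number, row in enumerate(reader, start=2):
        ballot_id = (row.get("ballot_paper_id") or "").strip()
        url = (row.get("source_url") or "").strip()
        parsed = urlparse(url)
        if not ballot_id:
            errors.append((line_number, "missing ballot ID"))
        elif parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append((line_number, f"invalid URL: {url!r}"))
        else:
            rows.append((ballot_id, url))
    return rows, errors
```

A management command could call this with an open file, report `errors` to the operator, and pass each pair in `rows` to whatever function ends up doing the actual SOPN upload.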

To solve both of these, we should make sure that the SOPN upload functions aren't tied to the view code, and add a feature to fetch a PDF by URL.
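
A minimal sketch of what a view-independent fetch helper might look like, using only the standard library. The names `fetch_sopn_pdf` and `NotAPDFError`, the user agent string and the 30 second timeout are all hypothetical; the real YNR code may well be structured differently (for example, by using `requests`).

```python
from urllib.request import Request, urlopen

PDF_MAGIC = b"%PDF-"  # PDF files begin with this marker


class NotAPDFError(ValueError):
    """Raised when the fetched document does not look like a PDF."""


def fetch_sopn_pdf(url, opener=urlopen, timeout=30):
    """Download `url` and return its bytes if they look like a PDF.

    `opener` is injectable so the function can be exercised without
    network access; it defaults to urllib's `urlopen`.
    """
    request = Request(url, headers={"User-Agent": "sopn-fetcher-sketch"})
    with opener(request, timeout=timeout) as response:
        content = response.read()
    if not content.startswith(PDF_MAGIC):
        raise NotAPDFError(f"{url} did not return a PDF")
    return content
```

Because the helper knows nothing about requests, forms or views, both the frontend upload view and a management command could call it and hand the returned bytes to the same upload function.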

VirginiaDooley commented 2 years ago

A third use case is when a council publishes SOPN information as a web page rather than as a PDF. Relying on users to convert the page to PDF themselves can lead to parsing failures, as was the case here.

We could accept the SOPN URL, check whether it points to a PDF, and, if it points to an HTML page instead, convert that page to PDF before parsing, using https://pypi.org/project/pdfkit/
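
One possible shape for that check, as a sketch only: classify the response by its `Content-Type` header and leading bytes, and only hand HTML pages to pdfkit. The function names are hypothetical. `pdfkit.from_url(url, False)` returns the PDF as bytes, and pdfkit needs the `wkhtmltopdf` binary installed, so it is imported lazily here.

```python
def classify_sopn_response(content_type, content):
    """Return "pdf", "html" or "unknown" for a fetched SOPN document."""
    # Drop parameters such as "; charset=utf-8" from the header value.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/pdf" or content.startswith(b"%PDF-"):
        return "pdf"
    if media_type in ("text/html", "application/xhtml+xml"):
        return "html"
    return "unknown"


def sopn_as_pdf(url, content_type, content):
    """Return PDF bytes for the SOPN, converting HTML pages if needed."""
    kind = classify_sopn_response(content_type, content)
    if kind == "pdf":
        return content
    if kind == "html":
        # Imported here so PDF-only use does not require pdfkit.
        import pdfkit

        # Passing False as the output path makes pdfkit return bytes.
        return pdfkit.from_url(url, False)
    raise ValueError(f"Unsupported SOPN content type: {content_type!r}")
```

Whether a converted page then parses correctly still depends on how the council laid out the HTML, so this only removes the manual conversion step.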

VirginiaDooley commented 1 year ago

Bury posted an HTML SOPN in a non-standard format (https://www.bury.gov.uk/council-and-democracy/elections-and-voting/statement-of-persons-nominated). We should discuss whether this is something the parser should support or whether it would be easier to address this on a case-by-case basis.