This implements an `htv export` command that exports the database to CSV tables. All relevant database tables are exported as (mostly) normalized CSV tables (e.g. `members.csv`, `votes.csv`, `member_votes.csv`).
The command also creates a metadata file that describes the table schema following the CSV on the Web standard, as well as a Markdown README that describes the table schema (example). The implementation is similar to that of the OpenAPI specs: rows of each table are modeled using a Python TypedDict, for example:
```python
class MemberRow(TypedDict):
    """Each row represents a Member of the European Parliament (MEP)."""

    id: int
    """Member ID as used by the [MEP Directory](https://www.europarl.europa.eu/meps/en/home)."""

    first_name: str
    """First name"""

    last_name: str
    """Last name"""

    country_code: str
    """3-letter ISO 3166-1 code"""

    # ...
```
The type annotations and docstrings are used to auto-generate metadata and documentation.
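As an illustration (a minimal sketch, not the actual implementation: the function name and the type-to-datatype mapping are assumptions), the CSVW table schema can be derived from a TypedDict's annotations at runtime. Note that the per-field docstrings are not available via runtime introspection, so generating the column descriptions would additionally require parsing the source, e.g. with `ast`:

```python
from typing import TypedDict, get_type_hints


class MemberRow(TypedDict):
    """Each row represents a Member of the European Parliament (MEP)."""

    id: int
    first_name: str
    last_name: str
    country_code: str


# Hypothetical mapping from Python annotations to CSVW datatypes
CSVW_DATATYPES = {int: "integer", str: "string", float: "number", bool: "boolean"}


def table_schema(row_type: type) -> dict:
    """Build a CSV on the Web table schema from a TypedDict's annotations."""
    columns = [
        {"name": name, "datatype": CSVW_DATATYPES.get(annotation, "string")}
        for name, annotation in get_type_hints(row_type).items()
    ]
    return {"tableSchema": {"columns": columns}}
```

Calling `table_schema(MemberRow)` then yields one column entry per field, e.g. `{"name": "id", "datatype": "integer"}`, in declaration order.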
## Publishing
The `htv export` command creates multiple CSV and metadata files and stores them on the local file system. I’ve configured Caddy to serve these files at https://howtheyvote.eu/export.
My idea was to then create a separate public GitHub repository with a scheduled Actions workflow that downloads the files and commits them to the repository or publishes them as a release.
This way, we don’t have to (manually) handle credentials for downloading the export or committing it to the repository.
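The download step of such a workflow could boil down to a small script. A sketch, assuming these file names (the actual export may contain more or differently named files):

```python
from urllib.parse import urljoin

EXPORT_BASE_URL = "https://howtheyvote.eu/export/"

# Hypothetical file list; the real export may differ
EXPORT_FILES = ["members.csv", "votes.csv", "member_votes.csv"]


def export_urls(base_url: str = EXPORT_BASE_URL) -> list[str]:
    """URLs the scheduled workflow would fetch before committing the files."""
    return [urljoin(base_url, name) for name in EXPORT_FILES]
```

The workflow step would then fetch each URL (e.g. with `urllib.request.urlretrieve` or plain `curl`) and commit the results; since the endpoint is public, no credentials are involved on either side.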
## Data coverage

The export is currently still missing some data:

- Information about relevant geographic areas and EuroVoc concepts for votes. This should be easy enough to add in a second step, but I wanted to focus on the most frequently requested data first to keep the scope of the PR small.
- Press releases
- Information about sources/fragments. I’m not sure there’s a use case for that, so I’d suggest not implementing it for now.
## Todos

- [ ] Double-check documentation for tables/columns. I think a little more detail would be helpful in some cases, for example for `is_main` etc.
- [x] Generate the export at a regular interval (e.g. weekly) and make it publicly available. We could then easily set up a GitHub Action to pull and commit the export in a public repository (similar to what we used to do before).
- [x] Implement a Prometheus metric for successful export generations (so we can set up an alert if it fails).
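For the metrics todo, one common pattern is a gauge holding the timestamp of the last successful export, so an alert can fire when it grows stale. A sketch in the Prometheus text exposition format (the metric name is an assumption, and the actual implementation presumably uses a client library rather than rendering the text by hand):

```python
import time

# Hypothetical metric name; the name used in the codebase may differ
METRIC_NAME = "htv_export_last_success_timestamp_seconds"


def render_export_metric(timestamp: float) -> str:
    """Render the gauge in the Prometheus text exposition format."""
    return (
        f"# HELP {METRIC_NAME} Unix timestamp of the last successful export\n"
        f"# TYPE {METRIC_NAME} gauge\n"
        f"{METRIC_NAME} {timestamp}\n"
    )


# After a successful `htv export` run:
exposition = render_export_metric(time.time())
```

An alert expression along the lines of `time() - htv_export_last_success_timestamp_seconds > 8 * 86400` would then catch a missed weekly export.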