digital-land / technical-documentation

Technical Documentation for the planning data service.
https://digital-land.github.io/technical-documentation/index.html

Spike: Prove Fast API with DuckDB and Parquet on S3 #102

Closed Ben-Hodgkiss closed 4 days ago

Ben-Hodgkiss commented 1 month ago

Overview

Following the design proposal for an internal API, we would like to prove some of the technology choices, including the use of FastAPI with DuckDB accessing Parquet files on S3.

This work was identified during the spike on API design.

Tech Approach

Acceptance Criteria/Tests



cpcundill commented 5 days ago

Code along with README has been pushed to the new pipeline-internal-api repository: https://github.com/digital-land/pipeline-internal-api

cpcundill commented 5 days ago

The work completed in this spike contributes to the implementation required for https://github.com/digital-land/technical-documentation/issues/106. However, it does not cover the testing and deployment into AWS that ticket 106 will require.

cpcundill commented 4 days ago

Reviewed with the team and made one change:

Switched from explicit path manipulation for dataset and resource to relying on the automatic hive_partitioning inference built into DuckDB. Dataset and resource are now just WHERE clause parameters, like the other query parameters.
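
For anyone reading this later, here is a rough sketch of what that change amounts to. It is not the code from the pipeline-internal-api repository. The bucket name, path layout, the `field` column and the `build_query` helper are all assumptions made up for illustration. The idea is that with files laid out as `dataset=<value>/resource=<value>/...`, DuckDB's `read_parquet(..., hive_partitioning = true)` exposes `dataset` and `resource` as columns, so they can be filtered in the same way as any other query parameter.

```python
from typing import Optional

# Hypothetical layout, assumed for illustration only:
#   s3://example-bucket/log/issue/dataset=<dataset>/resource=<resource>/<file>.parquet
BASE_GLOB = "s3://example-bucket/log/issue/**/*.parquet"

# Columns a caller may filter on. "dataset" and "resource" are derived from
# the hive-style directory names; "field" stands in for an ordinary column
# stored inside the Parquet files (a hypothetical name).
ALLOWED_FILTERS = ("dataset", "resource", "field")


def build_query(filters: dict[str, Optional[str]]) -> tuple[str, list[str]]:
    """Return a parameterised SQL string and its positional parameters.

    Partition columns and ordinary columns are handled identically:
    each supplied filter becomes one `column = ?` clause.
    """
    unknown = set(filters) - set(ALLOWED_FILTERS)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")

    clauses: list[str] = []
    params: list[str] = []
    for column in ALLOWED_FILTERS:
        value = filters.get(column)
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)

    sql = f"SELECT * FROM read_parquet('{BASE_GLOB}', hive_partitioning = true)"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

The resulting pair could then be passed to something like `duckdb.connect().execute(sql, params)`. Reading from S3 would additionally need DuckDB's httpfs extension and AWS credentials, which this sketch does not cover.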