geobeyond / Arpav-PPCV-backend

Backend of the Piattaforma Proiezioni Climatiche per il Nord-Est (climate projections platform for north-eastern Italy).
Creative Commons Attribution 4.0 International

update observations data periodically #196

Closed ricardogsilva closed 3 weeks ago

ricardogsilva commented 1 month ago

This PR implements periodic refresh of stations and observations data.

In order to add these periodic checks, the PR introduces prefect as a dependency and implements the following prefect flows:

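These flows leverage prefect's concurrency strategies to run tasks in parallel when possible. Each task is submitted with task_future = task.submit(), which returns a future without waiting for the task to finish, and its outcome is later retrieved with task_future.result().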
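The submit-then-collect pattern can be illustrated without prefect installed. The sketch below uses the standard library's concurrent.futures as a stand-in, not prefect itself. In a prefect flow, executor.submit(refresh_station, code) would roughly correspond to calling refresh_station.submit(code) on a function decorated with @task. The names refresh_station and refresh_stations are hypothetical and are not taken from this PR.

```python
from concurrent.futures import ThreadPoolExecutor


def refresh_station(station_code: str) -> str:
    """Hypothetical stand-in for a task that refreshes a single station."""
    return f"refreshed {station_code}"


def refresh_stations(station_codes: list[str]) -> list[str]:
    """Submit one unit of work per station, then wait for all of them."""
    with ThreadPoolExecutor(max_workers=4) as executor:
        # submit() returns a future right away, so every unit of work is
        # scheduled before any result is awaited
        futures = [executor.submit(refresh_station, code) for code in station_codes]
        # result() blocks until the corresponding unit of work has finished;
        # iterating in submission order keeps the output order stable
        return [future.result() for future in futures]
```

Submitting everything first and only then calling result() is what allows the work to overlap; calling result() right after each submit() would serialize it again.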

Flows are then configured into relevant deployments by using the prefect.serve() function.

prefect.serve() is very convenient for our use case, as it starts a long-running process that is able to execute flows locally. This means we can spin up an additional docker container from the same image as the backend app and use it to perform flow execution. In the current system architecture, this only requires adding one more service to the compose file.

This method, which the prefect docs refer to as static infrastructure, is a simpler alternative to the dynamic prefect way of doing things, in which the flow code lives in some separate storage (perhaps MinIO), a prefect deployment is created, and an additional prefect worker downloads the flow code from that storage. The static approach is a good match for this system. As such, this PR further introduces a new CLI command, arpav-ppcv prefect start-periodic-tasks, which spawns a dedicated prefect worker for processing the aforementioned flows.

The following new services are thus introduced to the docker compose stack:

Periodic schedules

This PR configures flows with the following default schedules:

These can be modified by setting the following environment variables:

ARPAV_PPCV__PREFECT__OBSERVATION_STATIONS_REFRESHER_FLOW_CRON_SCHEDULE
ARPAV_PPCV__PREFECT__OBSERVATION_MONTHLY_MEASUREMENTS_REFRESHER_FLOW_CRON_SCHEDULE
ARPAV_PPCV__PREFECT__OBSERVATION_SEASONAL_MEASUREMENTS_REFRESHER_FLOW_CRON_SCHEDULE
ARPAV_PPCV__PREFECT__OBSERVATION_YEARLY_MEASUREMENTS_REFRESHER_FLOW_CRON_SCHEDULE
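As a rough illustration of how one of these variables could be consumed, the sketch below reads a cron expression from the environment and falls back to a default. It is not the PR's actual settings code: the helper get_cron_schedule, the fallback value, and the five-field check are all assumptions made for the example.

```python
import os

ENV_VAR = "ARPAV_PPCV__PREFECT__OBSERVATION_STATIONS_REFRESHER_FLOW_CRON_SCHEDULE"

# Hypothetical fallback used only for this illustration; it is NOT the
# default schedule configured by the PR
ILLUSTRATIVE_DEFAULT = "0 1 * * 1"


def get_cron_schedule(name: str = ENV_VAR, default: str = ILLUSTRATIVE_DEFAULT) -> str:
    """Return the cron expression from the environment, or the fallback."""
    value = os.environ.get(name, default).strip()
    # Assumption: the classic five-field cron syntax
    # (minute hour day-of-month month day-of-week)
    if len(value.split()) != 5:
        raise ValueError(f"{name} must hold a five-field cron expression, got {value!r}")
    return value
```

With this sketch, exporting the variable as "30 2 * * *" before starting the process would make get_cron_schedule() return that value instead of the fallback.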

Authentication

The self-hosted version of prefect, which is what this PR introduces, does not include authentication: both the UI and the API are open to whoever accesses them. As such, this PR introduces an additional layer of HTTP Basic Auth, applied at the traefik level. This ensures that the prefect components exposed to the outside world are guarded with user credentials which, coupled with traefik's TLS certificates, should provide basic security.
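For a client that needs to reach the prefect API through this Basic Auth layer, the request must carry an Authorization header. The sketch below shows how such a header is built according to the HTTP Basic scheme; the helper name basic_auth_header is hypothetical, and real credentials would come from whatever is configured in traefik.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Build the Authorization header for HTTP Basic authentication."""
    # The scheme joins user and password with a colon and base64-encodes
    # the result; this is an encoding, not encryption, which is why TLS
    # is needed on top of it
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The returned dictionary can be passed as extra headers to any HTTP client when calling the protected endpoints.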

Observations harvester CLI command

This PR also replaces the standalone CLI commands that were being used to ingest observation-related data with the new prefect flows, so the exact same code is used under all circumstances. The CLI itself did not change; it is now simply powered by the prefect flows. For example:

# this will refresh the stations
arpav-ppcv observations-harvester refresh-stations