edanalytics / edu_edfi_airflow

Manages extract-load of Ed-Fi data in Airflow

Feature: partitioning pre-processing function #47

Closed johncmerfeld closed 2 months ago

johncmerfeld commented 3 months ago

Pertains to this ticket

Adds a static method to the EarthbeamDAG class that takes one or more CSVs and saves them as Parquet files partitioned by tenant_code and api_year (or equivalent columns).
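For readers unfamiliar with partitioned Parquet output, here is a minimal sketch of the general technique. It is not the method added in this PR: the function names (`group_rows_by_partition`, `partition_dir`, `write_partitioned_parquet`), the `data.parquet` file name, and the Hive-style `column=value` directory layout are all assumptions made for illustration, and the writer assumes `pyarrow` is installed.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Hypothetical default; the PR allows "equivalent columns" to be used instead.
DEFAULT_PARTITION_COLS = ("tenant_code", "api_year")


def group_rows_by_partition(csv_paths, partition_cols=DEFAULT_PARTITION_COLS):
    """Read every CSV and bucket its rows by their partition-column values."""
    groups = defaultdict(list)
    for csv_path in csv_paths:
        with open(csv_path, newline="") as handle:
            for row in csv.DictReader(handle):
                key = tuple(row[col] for col in partition_cols)
                groups[key].append(row)
    return dict(groups)


def partition_dir(output_dir, partition_cols, key):
    """Build a Hive-style path, e.g. out/tenant_code=alpha/api_year=2024."""
    parts = [f"{col}={value}" for col, value in zip(partition_cols, key)]
    return Path(output_dir).joinpath(*parts)


def write_partitioned_parquet(csv_paths, output_dir,
                              partition_cols=DEFAULT_PARTITION_COLS):
    """Write one Parquet file per partition and return the written paths."""
    # Imported lazily so the helpers above work without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    written = []
    groups = group_rows_by_partition(csv_paths, partition_cols)
    for key, rows in groups.items():
        target = partition_dir(output_dir, partition_cols, key)
        target.mkdir(parents=True, exist_ok=True)
        # The partition values live in the directory names, so drop them
        # from the file itself. csv.DictReader yields strings, so every
        # remaining column is written as a string column.
        data = [
            {col: val for col, val in row.items() if col not in partition_cols}
            for row in rows
        ]
        out_path = target / "data.parquet"
        pq.write_table(pa.Table.from_pylist(data), str(out_path))
        written.append(out_path)
    return written
```

Under these assumptions, calling `write_partitioned_parquet(["students.csv"], "out")` would produce one directory per distinct (tenant_code, api_year) pair. Splitting the grouping and path logic from the writer keeps the layout rules testable without a Parquet dependency.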

Testing

The easiest way to test this code is by running the included unit tests. Since there was no existing test framework attached to this repo, I took the liberty of initializing some Pytest configuration. In order to run the tests, do the following:

The test file serves as documentation for how the new function can be used.