Closed by bendnorman 2 years ago
Running etl_fast.yml using the DaskExecutor produced errors like

FileNotFoundError: [Errno 2] No such file or directory: '/pudl/outputs/cache/2021-09-14-2222-a1c3c13c-fd40-4aa3-8814-c25d0fcf88e6/dataframes/0cdfb9fa15ac11eca3f90242ac120003/boiler_fuel_eia923'

for several different tables. I checked the Prefect flow chart and the tasks appear to execute in the correct order, which makes me think this is a caching issue when multiple processes are involved.
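As a sketch of one way this kind of multi-process cache race is usually avoided (this is hypothetical illustration, not PUDL or Prefect code; the function name and layout are assumptions): write each cache entry to a temporary file in the same directory and atomically rename it into place, so no other process ever observes a partially written or not-yet-present entry while the path is being created.

```python
import os
import tempfile

def atomic_cache_write(cache_dir: str, key: str, payload: bytes) -> str:
    """Write a cache entry so concurrent readers never see a partial file.

    Hypothetical sketch: writes to a temp file in the target directory,
    then uses os.replace(), which is atomic on POSIX filesystems.
    """
    os.makedirs(cache_dir, exist_ok=True)
    final_path = os.path.join(cache_dir, key)
    # The temp file must live in the same directory as the final path:
    # a rename across filesystems is not guaranteed to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        os.replace(tmp_path, final_path)  # atomic swap into place
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

If the cache writer in the failing run instead creates the destination path non-atomically (e.g. open, write, flush), a second worker process can look the entry up after the directory exists but before the file contents land, which would produce exactly this kind of FileNotFoundError.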
Luckily, running the fast ETL using the LocalDaskExecutor works! This executor runs on a single node and should still provide a speedup, so it is good enough for this first iteration of cloudification. We will need the DaskExecutor if we want multi-node execution.
I was also able to run the ETL easily using the LocalExecutor.

docker-compose up creates two dask workers and runs the ETL using the DaskExecutor. This ran, but it hung at various points in the ETL and produced some Dask warnings about tasks holding on to process locks for too long.
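For reference, a docker-compose setup of this general shape might look like the sketch below (the service names, images, replica mechanism, and ETL command are assumptions for illustration, not the project's actual compose file):

```yaml
version: "3.8"
services:
  dask-scheduler:
    image: daskdev/dask          # assumed image
    command: dask-scheduler
  dask-worker:
    image: daskdev/dask          # assumed image
    command: dask-worker tcp://dask-scheduler:8786
    deploy:
      replicas: 2                # the two workers mentioned above
    depends_on:
      - dask-scheduler
  etl:
    build: .                     # hypothetical PUDL image
    command: pudl_etl etl_fast.yml   # hypothetical invocation
    depends_on:
      - dask-worker
```

The DaskExecutor would then be pointed at the scheduler's address (tcp://dask-scheduler:8786) so tasks run in the worker containers rather than in the ETL process itself.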