Up to this point, this PR only deals with disentangling the parallel elements of the data flow, so that we get a system we can test on GitHub Actions. The branch name and PR name are both misleading...
Copilot Summary
This pull request introduces several changes to the pymorize project, focusing on adding support for the Dask parallel processing backend, improving caching mechanisms, and adding new test fixtures and example data. The most important changes are grouped into improvements in parallel processing, caching, and testing.
Parallel Processing Improvements:
Added support for Dask as a parallel processing backend in the CMORizer class, including methods to configure and create a Dask cluster if specified in the configuration (src/pymorize/cmorizer.py). [1][2][3][4]
Modified the _run_prefect method in Pipeline to use a local Dask cluster if no cluster is assigned (src/pymorize/pipeline.py).
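The fallback to a local cluster can be sketched roughly as follows. This is a minimal illustration, not the code in src/pymorize/pipeline.py: the helper name `resolve_cluster` and its `factory` parameter are hypothetical, and only `dask.distributed.LocalCluster` is a real Dask class.

```python
def resolve_cluster(cluster=None, factory=None):
    """Return the assigned cluster, or create a local one if none is assigned.

    Hypothetical helper for illustration; not part of pymorize.
    """
    if cluster is not None:
        # A cluster was assigned explicitly, so reuse it as-is.
        return cluster
    if factory is None:
        # Imported lazily so Dask is only needed when the fallback is used.
        from dask.distributed import LocalCluster

        factory = LocalCluster
    # No cluster assigned: build a new one from the factory.
    return factory()
```

Passing a `factory` makes the fallback easy to exercise in CI without starting real Dask workers, which is the kind of testability this PR is after.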
Caching Improvements:
Introduced a new cache_policy parameter in the Pipeline class to handle cache expiration and policies more flexibly, and adjusted the _prefectize_steps method to use these new cache settings (src/pymorize/pipeline.py). [1][2][3]
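As a rough sketch of how cache settings might be threaded through to each step: the function below is hypothetical and simplified, not the actual `_prefectize_steps` method. It assumes a Prefect version whose `task` accepts `cache_policy` and `cache_expiration` keyword arguments; the `task_factory` parameter is an assumption added here so the wrapping can be tested without Prefect.

```python
def prefectize_steps(steps, cache_policy=None, cache_expiration=None, task_factory=None):
    """Wrap each step callable as a task, forwarding the cache settings.

    Hypothetical, simplified stand-in for the Pipeline method.
    """
    if task_factory is None:
        # Imported lazily; assumes Prefect is installed.
        from prefect import task as task_factory

    # Only forward settings that were actually provided, so that the
    # task factory's own defaults apply otherwise.
    options = {}
    if cache_policy is not None:
        options["cache_policy"] = cache_policy
    if cache_expiration is not None:
        options["cache_expiration"] = cache_expiration

    return [task_factory(step, name=step.__name__, **options) for step in steps]
```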
Testing Enhancements:
Added a new test configuration file test_config_pi_uxarray.yaml and corresponding fixtures to support testing with the UXArray test data (tests/configs/test_config_pi_uxarray.yaml, tests/fixtures/config_files.py, tests/fixtures/example_data/pi_uxarray.py). [1][2][3]
Created an integration test test_uxarray_pi.py to validate the new configuration and data fixtures (tests/integration/test_uxarray_pi.py).
Miscellaneous:
Added netcdf4 to the list of dependencies in setup.py to support reading NetCDF files (setup.py).
Minor code improvements and bug fixes, including fixing a typo in the assign_cluster method and adding a check for the compute method in trigger_compute (src/pymorize/pipeline.py, src/pymorize/generic.py). [1][2]
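The check for the compute method amounts to something like the sketch below. The single-argument signature is an assumption made for illustration; the real `trigger_compute` in src/pymorize/generic.py may take additional arguments.

```python
def trigger_compute(data):
    """Materialize lazy data; pass already-computed data through unchanged.

    Simplified sketch; the signature is an assumption, not pymorize's API.
    """
    if hasattr(data, "compute"):
        # Lazy objects (for example Dask-backed arrays) expose .compute().
        return data.compute()
    # Plain in-memory objects have nothing to compute.
    return data
```

With this guard, a step that receives eager data no longer fails with an `AttributeError` when it tries to call `compute`.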