pgierz opened this pull request 4 days ago.
Alright, it's at least doing something:
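```bash
#!/bin/bash -l
#SBATCH --account=ab0246
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --time=00:30:00

# Uncomment to forbid falling back to a temporary (ephemeral) server when no API URL is set:
# export PREFECT_SERVER_ALLOW_EPHEMERAL_MODE=False
# Bind the Prefect server to all network interfaces, not only localhost:
export PREFECT_SERVER_API_HOST=0.0.0.0
# Directory where persisted task results (the cache shown below) are written:
export PREFECT_LOCAL_STORAGE_PATH="./task-cache/"

conda activate pymorize

# Start the Prefect server (API + dashboard) in the background, then time the run:
prefect server start &
time pymorize process sample.yaml
```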
When running, you get the following:

```console
$ ls -alh ./task-cache
total 5.0G
drwxr-sr-x 2 a270077 ab0246 4.0K Oct 9 10:03 .
drwxr-sr-x 5 a270077 ab0246  84K Oct 9 10:02 ..
-rw-r--r-- 1 a270077 ab0246  327 Oct 9 10:03 1d33fdca8b1fb0b2f599d4a5f5baeabb
-rw-r--r-- 1 a270077 ab0246  15M Oct 9 10:05 2772efda3019a1630f1da30c53480b1c
-rw-r--r-- 1 a270077 ab0246 9.4M Oct 9 10:05 33f19ada24e79b8ba27a4507b20c903d
-rw-r--r-- 1 a270077 ab0246 2.5G Oct 9 10:03 6c66febdf523c0a30897ed4ee57edf7d
-rw-r--r-- 1 a270077 ab0246 2.5M Oct 9 10:05 9a5ff13448c0e86425d82e0855557725
-rw-r--r-- 1 a270077 ab0246 2.5G Oct 9 10:03 bf43304fee052e08aed9ef0402fe2791
-rw-r--r-- 1 a270077 ab0246 6.4K Oct 9 09:57 da6c58da3aa841659f1aced8e7dbc535
-rw-r--r-- 1 a270077 ab0246 2.5M Oct 9 10:05 e96bf03d4145ed875c25824a29cd8cc7
```
If you examine one of these cache files more closely:

```console
$ jq . ./task-cache/1d33fdca8b1fb0b2f599d4a5f5baeabb
{
  "metadata": {
    "storage_key": "/work/ab0246/a270077/SciComp/Projects/pymorize/examples/task-cache/1d33fdca8b1fb0b2f599d4a5f5baeabb",
    "expiration": "2024-10-10T08:02:48.074715Z",
    "serializer": {
      "type": "pickle",
      "picklelib": "cloudpickle",
      "picklelib_version": null
    },
    "prefect_version": "3.0.1",
    "storage_block_id": null
  },
  "result": "gAVOLg==\n"
}
```
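The `result` field is not human-readable. With Prefect's pickle serializer the cloudpickle bytes appear to be stored base64-encoded (the trailing `\n` is what Python's `base64.encodebytes` emits). Below is a minimal sketch, using only the standard library, that loads one record and decodes its payload. `load_cache_record` and `decode_pickle_payload` are hypothetical helpers, not part of pymorize or Prefect, and the sketch assumes the record layout shown above. Decoding `gAVOLg==` yields the bytes `80 05 4E 2E`, a protocol-5 pickle of `None`, so this 327-byte entry caches a task that returned nothing.

```python
import base64
import json
import pickle
from pathlib import Path
from typing import Any, Dict, Tuple


def decode_pickle_payload(encoded: str) -> Any:
    """Decode a base64-encoded pickle payload as stored in the "result" field."""
    # b64decode discards the trailing newline (non-alphabet characters are ignored).
    raw = base64.b64decode(encoded)
    # WARNING: unpickling can execute arbitrary code; only use on trusted caches.
    return pickle.loads(raw)


def load_cache_record(path: Path) -> Tuple[Dict[str, Any], Any]:
    """Return (metadata, decoded result) for one JSON cache record.

    Hypothetical helper; assumes the record layout shown by `jq` above.
    """
    record = json.loads(Path(path).read_text())
    metadata = record["metadata"]
    serializer = metadata.get("serializer", {}).get("type")
    if serializer != "pickle":
        raise ValueError(f"unsupported serializer: {serializer!r}")
    return metadata, decode_pickle_payload(record["result"])
```

For example, `load_cache_record(Path("./task-cache/1d33fdca8b1fb0b2f599d4a5f5baeabb"))` should return the metadata dict and `None`. Richer objects serialized with cloudpickle may additionally need `cloudpickle` and the original modules to be importable. Avoid calling this on the 2.5G entries, which would be read fully into memory.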
Ideally, we would build a CLI subcommand that can interface with this cache, and maybe print out the `Rule` and `Pipeline` associated with each cache entry. I'll see if that can be built...
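As a starting point for such a subcommand, here is a sketch of a listing command. The record metadata shown above does not name the task, `Rule`, or `Pipeline` that produced it, so this sketch only reports each key's size and expiration status. It uses `argparse` purely for illustration; `iter_cache`, `main`, and the `--max-parse-bytes` option are hypothetical and not pymorize's actual CLI. Records larger than the threshold are skipped, since loading a 2.5G JSON file just to read its metadata would be slow.

```python
import argparse
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterator, List, Optional, Tuple

# Records above this size are not parsed: reading a multi-GB JSON file just
# to show its metadata would be slow and memory-hungry.
DEFAULT_MAX_PARSE_BYTES = 50 * 1024 * 1024


def parse_expiration(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-8601 timestamp such as '2024-10-10T08:02:48.074715Z' as UTC-aware."""
    if not value:
        return None
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def iter_cache(cache_dir: Path, max_parse_bytes: int = DEFAULT_MAX_PARSE_BYTES) -> Iterator[Tuple[str, int, str]]:
    """Yield (cache key, size in bytes, status) for every file in cache_dir."""
    now = datetime.now(timezone.utc)
    for path in sorted(p for p in Path(cache_dir).iterdir() if p.is_file()):
        size = path.stat().st_size
        if size > max_parse_bytes:
            yield path.name, size, "metadata not read (large file)"
            continue
        try:
            metadata = json.loads(path.read_text())["metadata"]
            expires = parse_expiration(metadata.get("expiration"))
        except (ValueError, KeyError, TypeError, AttributeError):
            yield path.name, size, "unreadable or not a result record"
            continue
        if expires is None:
            status = "no expiration"
        elif expires < now:
            status = f"expired {expires.isoformat()}"
        else:
            status = f"expires {expires.isoformat()}"
        yield path.name, size, status


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="List Prefect task-cache entries.")
    parser.add_argument("cache_dir", type=Path, nargs="?", default=Path("./task-cache"))
    parser.add_argument("--max-parse-bytes", type=int, default=DEFAULT_MAX_PARSE_BYTES)
    args = parser.parse_args(argv)
    for key, size, status in iter_cache(args.cache_dir, args.max_parse_bytes):
        print(f"{key}  {size:>15,d} B  {status}")
```

Calling `main(["./task-cache"])` prints one line per cache file. Linking entries back to a `Rule` or `Pipeline` would require storing that information somewhere, for instance alongside the cache key when the task runs.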
The new `filecache.py` isn't passing the Flake8 check. @siligam, I think you had an undefined name somewhere, which I fixed, but please look again to make sure your code still does what you intend. I also cleaned things up a bit, so please pull before you write docstrings.
According to the dashboard, it also seems to be doing some kind of caching: note the green "Cached" tag in the pop-out panel on the right.
Resolves #42
TODO:

- [ ] Figure out how to get `cache_key_fn` to work nicely
- [x] `cache` CLI subcommand
- [ ] Cache expiration: should it be configurable from the user side?
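For the two open items, one possible direction (a sketch under assumptions, not this PR's implementation): Prefect's `@task` accepts a `cache_key_fn`, called with the task-run context and a dict of the task's parameters, plus a `cache_expiration` given as a `timedelta`. The hypothetical `file_aware_cache_key` below hashes the parameters but replaces paths of existing files by their size and modification time, so multi-gigabyte inputs are never read just to compute a key. `cache_expiration_from_config` and its `cache_expiration_hours` key are invented here to show how a user setting could feed `cache_expiration`; the 24-hour default is arbitrary.

```python
import hashlib
import json
import os
from datetime import timedelta
from typing import Any, Dict, Mapping, Optional


def _fingerprint(value: Any) -> Any:
    """Recursively replace paths of existing files by (path, size, mtime)."""
    if isinstance(value, (str, os.PathLike)) and os.path.isfile(value):
        st = os.stat(value)
        return {"path": os.path.abspath(value), "size": st.st_size, "mtime_ns": st.st_mtime_ns}
    if isinstance(value, (list, tuple)):
        return [_fingerprint(v) for v in value]
    if isinstance(value, dict):
        return {str(k): _fingerprint(v) for k, v in value.items()}
    return value


def file_aware_cache_key(context: Any, parameters: Dict[str, Any]) -> Optional[str]:
    """Hypothetical cache_key_fn: hash task parameters, fingerprinting input files.

    `context` (Prefect's TaskRunContext) is unused in this sketch.
    """
    payload = json.dumps(_fingerprint(parameters), sort_keys=True, default=repr)
    return hashlib.sha256(payload.encode()).hexdigest()


def cache_expiration_from_config(config: Mapping[str, Any], default_hours: float = 24.0) -> timedelta:
    """Read a user-facing expiration from a (hypothetical) config key."""
    return timedelta(hours=float(config.get("cache_expiration_hours", default_hours)))


# Possible wiring into Prefect 3 (not executed here; user_config is hypothetical):
#   from prefect import task
#   @task(cache_key_fn=file_aware_cache_key,
#         cache_expiration=cache_expiration_from_config(user_config))
#   def run_step(...): ...
```

Values without a stable `repr` (for example plain objects whose default repr contains a memory address) produce a new key on every run. That causes cache misses rather than false hits, but such parameters would need an explicit fingerprint to benefit from caching.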
Copilot Summary
This pull request introduces significant changes to the caching system, updates to the CLI commands, and improvements to the pipeline configuration. The most important changes include the addition of new caching functions, CLI commands for inspecting the cache, and enhancements to the pipeline's task caching mechanism.
Caching System Enhancements:

- New functions in `src/pymorize/caching.py`. These allow manual insertion of checkpoints and inspection of cache contents, supporting both JSON and pickle formats.
- A `Filecache` class in `src/pymorize/filecache.py` for managing file-based caching, including methods for adding and updating file records and computing file statistics.

CLI Updates:

- Updated `src/pymorize/cli.py` to include new CLI commands for inspecting Prefect's storage cache and specific cached results. These commands use the new caching functions.

Pipeline Improvements:

- Improved `src/pymorize/pipeline.py` by adding a manual checkpoint function to the default pipeline steps and implementing task caching with expiration and policy settings.

These changes collectively improve the caching capabilities, provide more robust CLI tools for cache management, and optimize the pipeline's task execution.
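The `Filecache` bullet above is only described in prose. To make "adding and updating file records and computing file statistics" concrete, here is a much simplified, hypothetical sketch. The class `FileRecordCache` and its methods are invented for illustration and are not the API of pymorize's `Filecache`. It records each file's size and modification time via `os.stat`, reports on update whether a file changed, and summarizes the recorded files.

```python
import os
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FileRecord:
    path: str
    size: int
    mtime_ns: int


@dataclass
class FileRecordCache:
    """Hypothetical file-record cache; not pymorize's actual Filecache."""

    records: Dict[str, FileRecord] = field(default_factory=dict)

    @staticmethod
    def _stat(path: str) -> FileRecord:
        st = os.stat(path)
        return FileRecord(os.path.abspath(path), st.st_size, st.st_mtime_ns)

    def add_file(self, path: str) -> FileRecord:
        """Record a file; if it is already known, keep the existing record."""
        key = os.path.abspath(path)
        if key not in self.records:
            self.records[key] = self._stat(path)
        return self.records[key]

    def update_file(self, path: str) -> bool:
        """Refresh a record; return True if a previously recorded file changed."""
        fresh = self._stat(path)
        old = self.records.get(fresh.path)
        self.records[fresh.path] = fresh
        return old is not None and (old.size, old.mtime_ns) != (fresh.size, fresh.mtime_ns)

    def stats(self) -> Dict[str, int]:
        """Simple summary statistics over all recorded files."""
        sizes = [r.size for r in self.records.values()]
        return {"files": len(sizes), "total_bytes": sum(sizes), "largest_bytes": max(sizes, default=0)}
```

Comparing size and modification time is cheap but can miss an in-place rewrite that preserves both; hashing file contents would be exact but expensive for inputs of the sizes seen in the cache listing.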