esm-tools / pymorize

A Python based Tool to CMORize NetCDF Data
MIT License

Caching #43

Open pgierz opened 4 days ago

pgierz commented 4 days ago

Resolves #42

TODO:

Caching System Enhancements:

CLI Updates:

Pipeline Improvements:

These changes collectively improve the caching capabilities, provide more robust CLI tools for cache management, and optimize the pipeline's task execution.

pgierz commented 4 days ago

Alright, it's at least doing something:

#!/bin/bash -l
#SBATCH --account=ab0246
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --time=00:30:00
# export PREFECT_SERVER_ALLOW_EPHEMERAL_MODE=False
export PREFECT_SERVER_API_HOST=0.0.0.0
export PREFECT_LOCAL_STORAGE_PATH="./task-cache/"
conda activate pymorize
prefect server start &
time pymorize process sample.yaml

While the job runs, the cache directory fills up with files:

$ ls -alh ./task-cache
total 5.0G
drwxr-sr-x 2 a270077 ab0246 4.0K Oct  9 10:03 .
drwxr-sr-x 5 a270077 ab0246  84K Oct  9 10:02 ..
-rw-r--r-- 1 a270077 ab0246  327 Oct  9 10:03 1d33fdca8b1fb0b2f599d4a5f5baeabb
-rw-r--r-- 1 a270077 ab0246  15M Oct  9 10:05 2772efda3019a1630f1da30c53480b1c
-rw-r--r-- 1 a270077 ab0246 9.4M Oct  9 10:05 33f19ada24e79b8ba27a4507b20c903d
-rw-r--r-- 1 a270077 ab0246 2.5G Oct  9 10:03 6c66febdf523c0a30897ed4ee57edf7d
-rw-r--r-- 1 a270077 ab0246 2.5M Oct  9 10:05 9a5ff13448c0e86425d82e0855557725
-rw-r--r-- 1 a270077 ab0246 2.5G Oct  9 10:03 bf43304fee052e08aed9ef0402fe2791
-rw-r--r-- 1 a270077 ab0246 6.4K Oct  9 09:57 da6c58da3aa841659f1aced8e7dbc535
-rw-r--r-- 1 a270077 ab0246 2.5M Oct  9 10:05 e96bf03d4145ed875c25824a29cd8cc7

If you examine one of these cache files more closely:

$ jq . ./task-cache/1d33fdca8b1fb0b2f599d4a5f5baeabb
{
  "metadata": {
    "storage_key": "/work/ab0246/a270077/SciComp/Projects/pymorize/examples/task-cache/1d33fdca8b1fb0b2f599d4a5f5baeabb",
    "expiration": "2024-10-10T08:02:48.074715Z",
    "serializer": {
      "type": "pickle",
      "picklelib": "cloudpickle",
      "picklelib_version": null
    },
    "prefect_version": "3.0.1",
    "storage_block_id": null
  },
  "result": "gAVOLg==\n"
}
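The `result` field looks like a base64-encoded pickle, which matches the `serializer` metadata. Here is a minimal sketch for decoding it. It assumes the cache file is trusted (unpickling can execute arbitrary code) and that the standard `pickle` module can read what cloudpickle wrote, which should hold for simple values like this one. `decode_result` is a hypothetical helper, not part of pymorize or Prefect.

```python
import base64
import json
import pickle


def decode_result(record: dict):
    """Decode the base64-encoded pickle stored under the "result" key."""
    # b64decode skips the trailing newline because validate defaults to False.
    raw = base64.b64decode(record["result"])
    # Only unpickle cache files you trust.
    return pickle.loads(raw)


# The record shown above, trimmed to the fields used here.
record = json.loads(
    '{"metadata": {"serializer": {"type": "pickle"}}, "result": "gAVOLg==\\n"}'
)
print(decode_result(record))
```

For the 327-byte file above this prints `None`: the payload is the four bytes `\x80\x05N.`, a protocol-5 pickle of `None`, so that particular task cached an empty return value.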

Ideally, we would add a CLI subcommand that can inspect this cache and perhaps print the Rule and Pipeline associated with each entry. I'll see if that can be built...
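As a starting point for such a tool, here is a rough sketch that lists the entries in a cache directory along with the metadata fields visible in the JSON above. The names `summarize_cache` and `main` are hypothetical and nothing here is wired into the pymorize CLI. The metadata shown above carries no Rule or Pipeline information, so this only reports what is actually stored. It also parses each file in full, which will be slow for the multi-gigabyte entries.

```python
import argparse
import json
from pathlib import Path


def summarize_cache(cache_dir):
    """Return (name, size_bytes, expiration, prefect_version) per cache file."""
    rows = []
    for path in sorted(Path(cache_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            with path.open() as fh:
                meta = json.load(fh).get("metadata", {})
        except (json.JSONDecodeError, UnicodeDecodeError, AttributeError):
            continue  # not a JSON result record; skip it
        rows.append(
            (
                path.name,
                path.stat().st_size,
                meta.get("expiration"),
                meta.get("prefect_version"),
            )
        )
    return rows


def main(argv=None):
    parser = argparse.ArgumentParser(description="List task-cache entries")
    parser.add_argument("cache_dir", nargs="?", default="./task-cache")
    args = parser.parse_args(argv)
    for name, size, expiration, version in summarize_cache(args.cache_dir):
        print(f"{name}  {size:>12d} B  expires={expiration}  prefect={version}")
```

Calling `main(["./task-cache"])` would print one line per readable entry. Mapping an entry back to its Rule and Pipeline would need extra bookkeeping on the pymorize side, which this sketch does not attempt.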

pgierz commented 4 days ago

The new filecache.py isn't passing the Flake8 check. @siligam, I think there was a missing name somewhere, which I fixed, but please look again to make sure your code still does what you expect. I also cleaned up a bit, so please pull before you write docstrings.

pgierz commented 4 days ago

According to the dashboard, it also seems to be doing some kind of caching. Note the green "Cached" tag on the right pop-out display.

[Image: dashboard screenshot showing the green "Cached" tag]