caracal-pipeline / stimela

Stimela 2.0
GNU General Public License v2.0

Add option to set environment variables when invoking singularity #334

Open landmanbester opened 2 weeks ago

landmanbester commented 2 weeks ago

We have this option for the kube backend but not for singularity. It is useful for things like cache directories. In my case I ran into:

#   File "/usr/local/lib/python3.9/dist-packages/numba/core/caching.py", line 540, in _save_data
#     f.write(data)
# OSError: [Errno 28] No space left on device

which I assume happens because numba is caching directly into the singularity image, which must have some sort of size limit? The workaround I am testing is to merge in a separate config that sets:

cabs:
  cab.name:
    management:
      environment:
        NUMBA_CACHE_DIR: /mounted/path/to/numba_cache

Is this the correct thing to do?

o-smirnov commented 2 weeks ago

Yep, and you'll also need to add that path to the bind paths (bind_dirs) in the singularity backend options. Just as a workaround for now.

My proposed sustainable solution:

Then these sorts of things could be available across all backends uniformly.

landmanbester commented 2 weeks ago

I tried the above, but it doesn't seem like the environment variable is getting passed through correctly. If I look at the dumped config file, I see the following for the backend settings:

opts:
  backend:
    default_registry: quay.io/stimela2
    override_registries: {}
    select: singularity
    singularity:
      enable: true
      image_dir: /home/bester/.singularity
      auto_build: true
      rebuild: true
      executable: null
      remote_only: false
      bind_dirs:
        /home/bester/projects/ESO137: rw

If I look at the cab that gets invoked, I also see the following under the management section:

    management:
      environment:
        NUMBA_CACHE_DIR: /home/bester/projects/ESO137/numba_cache
      cleanup: {}
      wranglers: {}

But printing the numba cache dir from inside a worker produces

#               Numba cache =

which means it hasn't been set. I guess I could dive into the singularity backend but wouldn't even be sure what to look for. Any ideas?
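For anyone wanting to reproduce the check above, here is a minimal sketch of a diagnostic that could be run inside a worker. It is not part of stimela; the function name `describe_cache_dir` is made up for illustration. It only reports whether the variable is set and whether the directory it names exists, is writable, and has free space.

```python
import os
import shutil


def describe_cache_dir(var="NUMBA_CACHE_DIR"):
    """Report whether the directory named by environment variable `var` is usable.

    Hypothetical helper for debugging, not stimela or numba code.
    """
    path = os.environ.get(var)
    if not path:
        # Unset or empty: numba would fall back to its default cache locations.
        return {"variable": var, "path": None, "writable": False, "free_bytes": None}

    exists = os.path.isdir(path)
    # os.access checks write permission for the current user.
    writable = exists and os.access(path, os.W_OK)
    # disk_usage reports the free space on the filesystem holding `path`.
    free_bytes = shutil.disk_usage(path).free if exists else None
    return {"variable": var, "path": path, "writable": writable, "free_bytes": free_bytes}


if __name__ == "__main__":
    print(describe_cache_dir())
```

If `path` comes back as `None` inside the container, the variable never reached the worker, which is the symptom described above.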

JSKenyon commented 2 weeks ago

Hilariously, I ran into this during a pipeline run yesterday. I have implemented a very basic fix in the isse334-basic-fix branch. This adds env to the SingularityBackendOptions, and it functions in the same way as the kube backend. This is what it looks like in my recipe:

selfcal-1:
  info: |
    Use quartical to perform basic selfcal. Solves for a delay and phase
    term per scan. Note that the selfcal step may require tuning based
    on the field and instrument in question.
  _use: lib.steps.quartical.k
  backend:
    singularity:
      bind_dirs:
        /home/kenyon/numba_cache_dir: rw
      env:
        NUMBA_CACHE_DIR: /home/kenyon/numba_cache_dir
  params:
    K.time_interval: 4

I appreciate that this may not be the best solution, but it is a simple one which can be used to check that this is the root cause of the original error.

In principle, all backends should likely support an env parameter. The environment field in the cab management section doesn't seem to actually be used anywhere inside stimela at present. I would argue that it feels more natural to set these as part of the backend settings than by modifying the cab.
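To make the recipe above concrete, here is a rough sketch of how `env` and `bind_dirs` settings could be turned into a singularity invocation. This is an illustration only and not the code in the branch: the helper name `singularity_invocation` and the image name are made up. It relies on two documented Singularity behaviours: host variables prefixed with `SINGULARITYENV_` are injected into the container with the prefix stripped, and `--bind` accepts `src:dest:opts`.

```python
import os


def singularity_invocation(image, command, env=None, bind_dirs=None, base_env=None):
    """Return (argv, environ) for a `singularity exec` call.

    Hypothetical helper, not stimela code. `env` maps variable names to
    values; `bind_dirs` maps host paths to "rw" or "ro".
    """
    environ = dict(os.environ if base_env is None else base_env)
    for name, value in (env or {}).items():
        # Singularity strips the SINGULARITYENV_ prefix and sets the
        # remaining name inside the container.
        environ[f"SINGULARITYENV_{name}"] = str(value)

    argv = ["singularity", "exec"]
    for path, mode in (bind_dirs or {}).items():
        # Assumption: the host path is mounted at the same path in the container.
        argv += ["--bind", f"{path}:{path}:{mode}"]
    argv += [image, *command]
    return argv, environ


argv, environ = singularity_invocation(
    "quartical.sif",  # placeholder image name
    ["python", "-c", "import os; print(os.environ.get('NUMBA_CACHE_DIR'))"],
    env={"NUMBA_CACHE_DIR": "/home/kenyon/numba_cache_dir"},
    bind_dirs={"/home/kenyon/numba_cache_dir": "rw"},
)
# To actually run it (requires singularity and a real image):
# subprocess.run(argv, env=environ, check=True)
```

Both settings are needed together: the variable tells numba where to cache, and the bind makes that directory exist and be writable inside the container.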

Edit: I am still rerunning the recipe to see if this has solved the problem.

JSKenyon commented 2 weeks ago

The above also has the advantage of being easily configurable for both the entire recipe and per step.
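As a sketch of what recipe-wide plus per-step configuration implies, the effective environment for a step could be computed by layering the two. This assumes step-level values override recipe-level ones, which is a guess about the intended precedence and not something confirmed here; `effective_env` is a hypothetical name.

```python
def effective_env(recipe_env=None, step_env=None):
    """Merge recipe-level and step-level env settings.

    Hypothetical helper. Assumes step-level values take precedence.
    """
    merged = dict(recipe_env or {})
    # Step-level entries overwrite recipe-level entries with the same name.
    merged.update(step_env or {})
    return merged


recipe_env = {"NUMBA_CACHE_DIR": "/home/kenyon/numba_cache_dir"}
step_env = {"NUMBA_CACHE_DIR": "/home/kenyon/selfcal_cache", "OMP_NUM_THREADS": "4"}
print(effective_env(recipe_env, step_env))
```

The second path and `OMP_NUM_THREADS` are example values only.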

JSKenyon commented 2 weeks ago

This does seem to have fixed the issue for me (assuming it isn't intermittent).

landmanbester commented 2 weeks ago

Awesome, this seems to have fixed the problem for me. Thanks @JSKenyon

o-smirnov commented 1 week ago

The environment field in the cab management section doesn't seem to actually be used anywhere inside stimela at present. I would argue that it feels more natural to set these as part of the backend settings than by modifying the cab.

Yeah, the management: environment field was inherited from old Stimela but is not yet implemented. Off the top of my head, I do see three categories of environment variables: