SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

Spikeinterface cannot find singularity instance #2062

Closed ablot closed 1 year ago

ablot commented 1 year ago

Since upgrading to spikeinterface >0.9 I have been struggling with singularity. After fixing the binding issues (#2059), I get another crash that I don't really understand:

spikeinterface.sorters.utils.misc.SpikeSortingError: Spike sorting in singularity failed with the following error:
FATAL:   no instance found with name outstanding_leg_0307

I assume that it is an issue with the ContainerClient class but I'm not sure where to look. Do you have any idea?

JoeZiminski commented 1 year ago

Does the stack trace say anything about loop devices? I think (though I am not sure) this looks similar to what I get with that error. In that case, existing image instances can be closed with:

singularity instance list
(see list of names of existing instances)
singularity instance stop <instance name>
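
The manual cleanup above could be scripted along these lines. This is a minimal sketch, not part of spikeinterface: the helper names `stop_commands` and `stop_instances` are hypothetical, and the only CLI call used is the `singularity instance stop <instance name>` command quoted above. The instance names are assumed to be copied from the output of `singularity instance list`.

```python
import shutil
import subprocess


def stop_commands(instance_names):
    """Build one `singularity instance stop <name>` command per instance."""
    return [["singularity", "instance", "stop", name] for name in instance_names]


def stop_instances(instance_names, dry_run=True):
    """Stop the named instances; with dry_run=True, only return the commands."""
    commands = stop_commands(instance_names)
    if dry_run:
        return commands
    if shutil.which("singularity") is None:
        raise RuntimeError("singularity executable not found on PATH")
    for command in commands:
        # check=False: a name that no longer exists should not abort the loop.
        subprocess.run(command, check=False)
    return commands
```

Calling `stop_instances(["outstanding_leg_0307"])` with the default `dry_run=True` only returns the command list, so it can be inspected before anything is stopped.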

alejoe91 commented 1 year ago

You can also try to clean the cache: https://github.com/SpikeInterface/spikeinterface/blob/main/src/spikeinterface/sorters/external/tests/test_singularity_containers.py#L22

ablot commented 1 year ago

So far no luck with either of those. I'll try to understand the issue better at the end of the week.

Is there a minimum compatible version of singularity? (Ours is a bit old: 3.6.4.)

alejoe91 commented 1 year ago

We're using version 3.8.7 in our tests: https://github.com/SpikeInterface/spikeinterface/blob/main/.github/workflows/test_containers_singularity.yml
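
To compare a local install against the version used in the tests, something like the following could help. It is only a sketch: the thread does not state a minimum supported version, so 3.8.7 is used here purely as "the version the tests run with". The helper names are hypothetical, and the input is assumed to be the text printed by `singularity --version` (for example `singularity version 3.6.4`).

```python
import re

# Version used in the project's container tests, per the comment above.
TESTED_VERSION = (3, 8, 7)


def parse_version(version_output):
    """Extract (major, minor, patch) from text such as 'singularity version 3.6.4'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return tuple(int(part) for part in match.groups())


def older_than_tested(version_output):
    """True if the reported version is older than the one used in the tests."""
    return parse_version(version_output) < TESTED_VERSION
```

An older version is not necessarily incompatible; the comparison only flags that the setup differs from the tested one.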

ablot commented 1 year ago

I'm actually unsure whether this is a binding issue or not. The error happens only if I export SPIKEINTERFACE_DEV_PATH, and in that case the traceback contains ERROR ['FATAL: while parsing bind path: while getting bind path: is not a valid bind option\n'] : return code 255 (before the "no instance found").

I've checked the volumes dictionary just before the container starts, and it looks fine to me:

{
    "/nemo/lab/znamenskiyp/data/instruments/raw_data/projects/blota_onix_pilote/BRYA142.5d/S20231002/R121114_onix": {
        "bind": "/nemo/lab/znamenskiyp/data/instruments/raw_data/projects/blota_onix_pilote/BRYA142.5d/S20231002/R121114_onix",
        "mode": "ro"
    },
    "/nemo/lab/znamenskiyp/home/shared/projects/blota_onix_pilote/BRYA142.5d/S20231002/R121114_onix/verboseTrue_devTrue": {
        "bind": "/nemo/lab/znamenskiyp/home/shared/projects/blota_onix_pilote/BRYA142.5d/S20231002/R121114_onix/verboseTrue_devTrue",
        "mode": "rw"
    },
    "/nemo/lab/znamenskiyp/home/users/blota/code/spikeinterface": {
        "bind": "/nemo/lab/znamenskiyp/home/users/blota/code/spikeinterface",
        "mode": "ro"
    }
}

I don't have that issue if I don't export the environment variable. I'll switch to the non-dev version for now.
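
One way to sanity-check a volumes dictionary of this shape is to render each entry as a Singularity bind specification, `src:dest:opts`, and reject any mode other than `ro` or `rw`. The sketch below is a hypothetical helper, not how spikeinterface builds its bind arguments, and it only validates the dictionary: it cannot show what finally reaches the `singularity` command line.

```python
VALID_MODES = {"ro", "rw"}


def bind_specs(volumes):
    """Turn {host_path: {"bind": container_path, "mode": mode}} into bind strings."""
    specs = []
    for host_path, target in volumes.items():
        container_path = target.get("bind", "")
        mode = target.get("mode", "")
        if not host_path or not container_path:
            raise ValueError(f"empty path in volume entry {host_path!r}")
        if mode not in VALID_MODES:
            raise ValueError(f"{mode!r} is not a valid bind option for {host_path!r}")
        specs.append(f"{host_path}:{container_path}:{mode}")
    return specs
```

For the dictionary above, every entry passes this check, which is consistent with the observation that the dictionary itself looks fine.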

ablot commented 1 year ago

I'm not making much progress. I now have:

Traceback (most recent call last):
  File "/nemo/lab/znamenskiyp/home/users/blota/code/spikeinterface/test_container_mini.py", line 7, in <module>
    sorting = ss.run_sorter(
  File "/nemo/lab/znamenskiyp/home/users/blota/code/spikeinterface/src/spikeinterface/sorters/runsorter.py", line 142, in run_sorter
    return run_sorter_container(
  File "/nemo/lab/znamenskiyp/home/users/blota/code/spikeinterface/src/spikeinterface/sorters/runsorter.py", line 530, in run_sorter_container
    container_client = ContainerClient(
  File "/nemo/lab/znamenskiyp/home/users/blota/code/spikeinterface/src/spikeinterface/sorters/runsorter.py", line 300, in __init__
    raise FileNotFoundError(
FileNotFoundError: Unable to locate container image spikeinterface/kilosort3-compiled-base

The output starts with:

Singularity: pulling image spikeinterface/kilosort3-compiled-base
singularity pull --name kilosort3-compiled-base.sif docker://spikeinterface/kilosort3-compiled-base
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob sha256:47c7644723910b6dfc6ec8b3bd9fed3ac32778cf485ce3a6535ff6b6da06f743
Copying blob sha256:85aaf046f0365a57a54dc3f66ba5dfa79e928e885b0705214fe1b5b3ce148438
Copying blob sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1
panic: page 3 already freed

Do you know what the panic is about?

alejoe91 commented 1 year ago

Not sure honestly! :(

ablot commented 1 year ago

This was a cache issue. My system was using .local/share/containers/cache/, which apparently is not emptied by singularity cache clean because it could be used by other platforms.
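
For anyone hitting the same panic, a quick way to see whether a leftover cache is present is to measure the directories involved. This is a minimal sketch with hypothetical helper names; it assumes the two locations are relative to the home directory: the `.local/share/containers/cache/` path reported above and Singularity's default `~/.singularity/cache`. It only reports sizes and deletes nothing.

```python
from pathlib import Path


def candidate_cache_dirs(home=None):
    """Return the cache directories worth inspecting, in a fixed order."""
    home = Path(home) if home is not None else Path.home()
    return [
        # Location reported in this thread (not emptied by `singularity cache clean`).
        home / ".local" / "share" / "containers" / "cache",
        # Singularity's default cache location.
        home / ".singularity" / "cache",
    ]


def dir_size_bytes(path):
    """Total size of regular files under path; 0 if the directory does not exist."""
    path = Path(path)
    if not path.is_dir():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def cache_report(home=None):
    """Map each candidate directory (as a string) to its size in bytes."""
    return {str(d): dir_size_bytes(d) for d in candidate_cache_dirs(home)}
```

Whether to remove a non-empty directory by hand is a separate decision, since, as noted above, it may be shared with other container tools.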