SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

Numba cache issue for `get_numba_vector_to_list_of_spiketrain` #2658

Closed · zm711 closed this 7 months ago

zm711 commented 7 months ago

Hey guys,

One of my labmates (I installed from source on her computer) ran into an issue during `sw.plot_unit_summary` when it tried to `get_data` from `spike_amplitudes`.

  Cell In[14], line 1
    amps = analyzer.get_extension('spike_amplitudes').get_data('by_unit')
  File ~\User\Documents\GitHub\spikeinterface\src\spikeinterface\core\sortinganalyzer.py:1610 in get_data
    return self._get_data(*args, **kwargs)
  File ~\User\Documents\GitHub\spikeinterface\src\spikeinterface\postprocessing\spike_amplitudes.py:137 in _get_data
    spike_indices = spike_vector_to_indices(spike_vector, unit_ids)
  File ~\User\Documents\GitHub\spikeinterface\src\spikeinterface\core\sorting_tools.py:79 in spike_vector_to_indices
    vector_to_list_of_spiketrain = get_numba_vector_to_list_of_spiketrain()
  File ~\User\GitHub\spikeinterface\src\spikeinterface\core\sorting_tools.py:111 in get_numba_vector_to_list_of_spiketrain
    @numba.jit((numba.int64[::1], numba.int64[::1], numba.int64), nopython=True, nogil=True, cache=True)
  File ~\.conda\envs\Spikesorting\Lib\site-packages\numba\core\decorators.py:236 in wrapper
    disp.compile(sig)
  File ~\.conda\envs\Spikesorting\Lib\site-packages\numba\core\dispatcher.py:964 in compile
    self._cache.save_overload(sig, cres)
  File ~\.conda\envs\Spikesorting\Lib\site-packages\numba\core\caching.py:652 in save_overload
    self._save_overload(sig, data)
  File ~\.conda\envs\Spikesorting\Lib\site-packages\numba\core\caching.py:662 in _save_overload
    self._cache_file.save(key, data)
  File ~\.conda\envs\Spikesorting\Lib\site-packages\numba\core\caching.py:478 in save
    self._save_index(overloads)
  File ~\.conda\envs\Spikesorting\Lib\site-packages\numba\core\caching.py:523 in _save_index
    with self._open_for_write(self._index_path) as f:
  File ~\.conda\envs\Spikesorting\Lib\contextlib.py:137 in __enter__
    return next(self.gen)
  File ~\.conda\envs\Spikesorting\Lib\site-packages\numba\core\caching.py:561 in _open_for_write
    with open(tmpname, "wb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\User\\Documents\\GitHub\\spikeinterface\\src\\spikeinterface\\core\\__pycache__\\sorting_tools.get_numba_vector_to_list_of_spiketrain.locals.vector_to_list_of_spiketrain_numba-111.py311.nbi.tmp.29dcb473098a4732'

Any idea what's going on? I tried calling the extension myself:

spike_amplitudes = analyzer.get_extension('spike_amplitudes').get_data()

and that works fine, but `get_data('by_unit')` fails.
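For reference, here is a minimal sketch of the two paths as I understand them from the traceback (`analyzer` is the existing `SortingAnalyzer`):

ext = analyzer.get_extension('spike_amplitudes')

amps_flat = ext.get_data()              # flat spike vector; no numba involved, works
amps_by_unit = ext.get_data('by_unit')  # groups spikes per unit via
                                        # spike_vector_to_indices(), which compiles
                                        # a numba helper with cache=True and fails
                                        # while writing the on-disk cache file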

samuelgarcia commented 7 months ago

Oops. This looks like a numba problem. Does other numba code work on this machine? For instance, `detect_peaks` with `locally_exclusive`?
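Something like this, assuming a `recording` is already loaded (a sketch, untested):

from spikeinterface.sortingcomponents.peak_detection import detect_peaks

# 'locally_exclusive' is the numba-backed detection method, so this exercises
# the same numba machinery on the affected machine
peaks = detect_peaks(recording, method='locally_exclusive')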

zm711 commented 7 months ago

She was able to calculate correlograms (which should use the numba implementation if available) and it worked fine. I could have her try `detect_peaks` when she gets into the lab. I'm just wondering if there is some sort of numba cache invalidation happening. Not sure though.

EDIT: I did try emptying the pycache and restarting the kernel to see if generating a new cache for the files would fix it and the same error happened.

h-mayorquin commented 7 months ago

I advocate again for not using `cache=True` in library code. I think that feature is designed for personal scripts.

@DradeAW
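For reference, this is the decorator at `sorting_tools.py:111` in the traceback; the proposed change would just flip the flag (the argument names below are illustrative):

import numba

# Currently compiled with an on-disk cache; the proposal is cache=False,
# trading the disk cache for one compilation per session
@numba.jit((numba.int64[::1], numba.int64[::1], numba.int64),
           nopython=True, nogil=True, cache=False)
def vector_to_list_of_spiketrain_numba(sample_indices, unit_indices, num_units):
    ...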

zm711 commented 7 months ago

When I get a chance I'll test setting cache=False and see if that fixes it. If it does we may want to switch the codebase over to cache=False for numba stuff.
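In the meantime, a possible stopgap I haven't tested: numba honors the `NUMBA_CACHE_DIR` environment variable, which moves the cache files out of the package's `__pycache__`:

import os

# Must be set before numba compiles anything (ideally before importing
# spikeinterface); redirects the .nbi/.nbc cache files to a writable folder
os.environ['NUMBA_CACHE_DIR'] = r'C:\numba_cache'  # any writable path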

DradeAW commented 7 months ago

I'm fine with cache=False if it solves issues that people encounter :)

h-mayorquin commented 7 months ago

Maybe this is another thing to add to the global config, though.
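Purely hypothetical sketch of what such a knob could look like (neither the function nor the flag exists today):

# Hypothetical API, shown only to illustrate the idea of a global switch:
from spikeinterface.core import set_global_numba_kwargs  # does not exist yet

set_global_numba_kwargs(cache=False)  # would apply to every jitted helper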

samuelgarcia commented 7 months ago

I am not exactly sure what `cache=False` does. Does it mean that the function is recompiled at each call?

zm711 commented 7 months ago

Not at each call, I think, but once per session. The docs say: "To avoid compilation times each time you invoke a Python program, you can instruct Numba to write the result of function compilation into a file-based cache."

But the docs also warn about a caveat where invalidation can fail:

Cache invalidation fails to recognize changes in functions defined in a different file. This means that when a main jit function calls functions that were imported from a different module, a change in those other modules will not be detected and the cache will not be updated. This carries the risk that “old” function code might be used in the calculations.

numba docs
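A tiny standalone demo of what the docs describe, nothing SpikeInterface-specific (save it as a script, since numba keys the cache on the source file):

import numba

@numba.njit(cache=True)
def add_one(x):
    return x + 1

# The first call in a fresh session compiles and writes .nbi/.nbc files into
# __pycache__ next to this script; later sessions load the compiled result
# from disk. With cache=False the result lives only in memory and compilation
# happens once per session (not once per call).
add_one(41)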

h-mayorquin commented 7 months ago

[EDIT] I see that @zm711 already answered.

samuelgarcia commented 7 months ago

Thank you very much.