brain-score / model-tools

Helper functions to extract model activations and translate from Machine Learning to Neuroscience
MIT License

FileNotFoundError with Majaj V4 but not Majaj IT #66

Closed RylanSchaeffer closed 1 year ago

RylanSchaeffer commented 1 year ago

I have a script which fits models against Majaj IT and Majaj V4. IT runs fine, but when I try specifying V4 instead, I receive the following stack trace and error:

  File "python3.7/site-packages/model_tools/activations/core.py", line 79, in _from_paths_stored
    return self._from_paths(layers=layers, stimuli_paths=stimuli_paths)
  File "python3.7/site-packages/model_tools/activations/core.py", line 85, in _from_paths
    layer_activations = self._get_activations_batched(stimuli_paths, layers=layers, batch_size=self._batch_size)
  File "python3.7/site-packages/model_tools/activations/core.py", line 135, in _get_activations_batched
    batch_activations = hook(batch_activations)
  File "python3.7/site-packages/model_tools/activations/pca.py", line 23, in __call__
    self._ensure_initialized(batch_activations.keys())
  File "python3.7/site-packages/model_tools/activations/pca.py", line 40, in _ensure_initialized
    n_components=self._n_components)
  File "python3.7/site-packages/result_caching/__init__.py", line 231, in wrapper
    self.save(result, function_identifier)
  File "python3.7/site-packages/result_caching/__init__.py", line 125, in save
    os.rename(savepath_part, path)
FileNotFoundError: [Errno 2] No such file or directory: '/om2/user/rylansch/FieteLab-Reg-Eff-Dim/.result_caching/model_tools.activations.pca.LayerPCA._pcas/identifier=architecture:RF-100-cosine-bernoulli-b-ns|task:None|kind:Rand|source:RS|lyr:mlp|agg:pca|n_comp:1000,n_components=1000.pkl.filepart' -> '/om2/user/rylansch/FieteLab-Reg-Eff-Dim/.result_caching/model_tools.activations.pca.LayerPCA._pcas/identifier=architecture:RF-100-cosine-bernoulli-b-ns|task:None|kind:Rand|source:RS|lyr:mlp|agg:pca|n_comp:1000,n_components=1000.pkl'

I'm not familiar with result_caching. Could someone please help me understand why this problem emerges for V4 but not IT? How can I fix it?

RylanSchaeffer commented 1 year ago

I ran rm -rf * in my .result_caching directory and the error is gone. Bizarre :thinking:

RylanSchaeffer commented 1 year ago

The error is back :/

mschrimpf commented 1 year ago

The result_caching library caches results so you don't have to rerun costly computations too many times. My guess is that the cached result for the identifier architecture:RF-100-cosine-bernoulli-b-ns|task:None|kind:Rand|source:RS|lyr:mlp|agg:pca|n_comp:1000 is not being written to disk properly, or that result_caching has trouble reading the file. I don't think this has anything to do with V4 vs IT, but rather with the order of execution.

You can disable result_caching altogether by running with RESULTCACHING_DISABLE=1, disable only specific components, e.g. RESULTCACHING_DISABLE=model_tools.activations, or change the model identifier.
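As a minimal sketch of the environment-variable route, the helper below launches a command with RESULTCACHING_DISABLE set in the child's environment. The variable name and its two example values come from the comment above; the helper name run_with_caching_disabled and the demo child command are hypothetical and only illustrate how the variable could be passed to a fitting script.

```python
import os
import subprocess
import sys


def run_with_caching_disabled(command, disable="1"):
    """Run `command` with RESULTCACHING_DISABLE set in its environment.

    `disable` is "1" to switch caching off everywhere, or a module prefix
    such as "model_tools.activations" to switch it off for that component only.
    (Hypothetical helper; not part of result_caching or model-tools.)
    """
    # Copy the current environment and add/override the one variable.
    env = dict(os.environ, RESULTCACHING_DISABLE=disable)
    return subprocess.run(command, env=env, check=True,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in for the real fitting script: the child just echoes the variable.
    child = [sys.executable, "-c",
             "import os; print(os.environ['RESULTCACHING_DISABLE'])"]
    completed = run_with_caching_disabled(child, disable="model_tools.activations")
    print(completed.stdout.strip())
```

In a shell or SLURM batch script, the same effect would come from prefixing the launch command with RESULTCACHING_DISABLE=1.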

RylanSchaeffer commented 1 year ago

I opened another issue on result_caching (see https://github.com/brain-score/result_caching/issues/16). It seems the errors appear with low probability if many SLURM jobs are running simultaneously.
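The traceback above fails at os.rename(savepath_part, path), where the .filepart name is derived from the final path. One plausible reading, stated here as an assumption rather than a confirmed diagnosis, is that two jobs write the same .filepart file and the slower job finds it already renamed away. The sketch below shows a general pattern that avoids sharing a temporary path between writers; save_atomically is a hypothetical helper and is not how result_caching is implemented.

```python
import os
import pickle
import tempfile


def save_atomically(result, path):
    """Pickle `result` to `path` via a temporary file unique to this call.

    Hypothetical helper illustrating the pattern; not result_caching's code.
    """
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # mkstemp creates a uniquely named file in the target directory, so
    # concurrent writers never share (or rename away) each other's temp file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".filepart")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(result, f)
        # os.replace overwrites an existing target; if another job finished
        # first, the last writer simply wins instead of raising.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load(path):
    """Read back a pickled result."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

With this pattern, jobs that compute the same identifier at the same time each write their own temporary file, and whichever finishes last overwrites the target with an equivalent result. Behaviour on network file systems may differ, so this is a sketch to adapt rather than a guaranteed fix.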