brain-score / vision

A framework for evaluating models on their alignment to brain and behavioral measurements (50+ benchmarks)
http://brain-score.org
MIT License

Error when checking models, OSError: [Errno 5] Input/output error #620

Open YudiXie opened 4 months ago

YudiXie commented 4 months ago

I am trying to submit a model following the tutorial here: https://www.brain-score.org/tutorial/deepdive_2

However, when I run the local check check_models.check_base_models(__name__), I get the following error:

[yu_xie@node091 yudixie_resnet50_imagenet1kpret_0_240222]$ python model.py
/om/user/yu_xie/miniconda3/envs/brainscore/lib/python3.7/site-packages/brainscore_core/metrics/__init__.py:16: FutureWarning: xarray subclass Score should explicitly define __slots__
  class Score(DataAssembly):
Loaded model from yudixie_resnet50_imagenet1kpret_0_240222_weights.pth
Loaded model from yudixie_resnet50_imagenet1kpret_0_240222_weights.pth
Traceback (most recent call last):
  File "model.py", line 67, in <module>
    check_models.check_base_models(__name__)
  File "/weka/scratch/weka/dicarlo/yu_xie/projects/brain-score/brainscore_vision/model_helpers/check_submission/check_models.py", line 38, in check_base_models
    check_processing(model, module)
  File "/weka/scratch/weka/dicarlo/yu_xie/projects/brain-score/brainscore_vision/model_helpers/check_submission/check_models.py", line 46, in check_processing
    benchmark = _MockBenchmark()
  File "/weka/scratch/weka/dicarlo/yu_xie/projects/brain-score/brainscore_vision/model_helpers/check_submission/check_models.py", line 61, in __init__
    assembly_repetition = load_dataset("MajajHong2015.public").sel(region="IT").squeeze("time_bin")
  File "/weka/scratch/weka/dicarlo/yu_xie/projects/brain-score/brainscore_vision/__init__.py", line 34, in load_dataset
    return data_registry[identifier]()
  File "/weka/scratch/weka/dicarlo/yu_xie/projects/brain-score/brainscore_vision/data/majajhong2015/__init__.py", line 55, in <lambda>
    stimulus_set_loader=lambda: brainscore_vision.load_stimulus_set('hvm-public'),
  File "/weka/scratch/weka/dicarlo/yu_xie/projects/brain-score/brainscore_vision/data_helpers/s3.py", line 42, in load_assembly_from_s3
    stimulus_set = stimulus_set_loader()
  File "/weka/scratch/weka/dicarlo/yu_xie/projects/brain-score/brainscore_vision/data/majajhong2015/__init__.py", line 55, in <lambda>
    stimulus_set_loader=lambda: brainscore_vision.load_stimulus_set('hvm-public'),
  File "/weka/scratch/weka/dicarlo/yu_xie/projects/brain-score/brainscore_vision/__init__.py", line 41, in load_stimulus_set
    return stimulus_set_registry[identifier]()
  File "/weka/scratch/weka/dicarlo/yu_xie/projects/brain-score/brainscore_vision/data/majajhong2015/__init__.py", line 104, in <lambda>
    zip_version_id="UzgkNOtIWMXaMN7vUA0FVemXLTvdtH13")
  File "/weka/scratch/weka/dicarlo/yu_xie/projects/brain-score/brainscore_vision/data_helpers/s3.py", line 67, in load_stimulus_set_from_s3
    stimuli_paths = [Path(stimuli_directory) / local_path for local_path in os.listdir(stimuli_directory)
OSError: [Errno 5] Input/output error: '/home/yu_xie/.brainio/image_dicarlo_hvm-public'

mschrimpf commented 4 months ago

This seems to be an OS error from the call to os.listdir. Is that directory somehow exceedingly large? (maybe related: https://stackoverflow.com/q/66747762/2225200)
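One way to narrow this down is to read the directory lazily instead of all at once. The sketch below is not part of Brain-Score; probe_listing is a hypothetical helper that uses only the standard library. os.listdir builds the full list before returning, whereas os.scandir yields entries one at a time, so a failure partway through still shows how many entries were read.

```python
import os
from typing import Optional, Tuple


def probe_listing(directory: str) -> Tuple[int, Optional[OSError]]:
    """Hypothetical diagnostic: iterate a directory lazily and report progress.

    Returns (number_of_entries_read, error_or_None). If iteration fails
    (for example with [Errno 5] Input/output error), the count shows how far
    the listing got before the error was raised.
    """
    count = 0
    try:
        # os.scandir yields entries lazily; the context manager closes the handle.
        with os.scandir(directory) as entries:
            for _ in entries:
                count += 1
    except OSError as err:  # includes FileNotFoundError, PermissionError, EIO
        return count, err
    return count, None
```

For example, probe_listing(os.path.expanduser("~/.brainio/image_dicarlo_hvm-public")) returning a nonzero count together with an error would suggest the failure happens mid-listing rather than when the directory is opened.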

YudiXie commented 4 months ago

Yes, the directory /home/yu_xie/.brainio/image_dicarlo_hvm-public contains many files, many of them images, for example:

z3_rx-38.767_ry-29.304_rz-03.405_tx+00.294_ty-00.342_s+01.207_06025532e5d0ffc1e95d4e956294da17584d5d3f_256x256.png
z3_rx+39.090_ry-44.396_rz-22.201_tx+00.237_ty+00.438_s+00.885_e1e5401cbfcb8ba600d202d77f4e60421b98cfde_256x256.png
z3_rx+39.231_ry-24.097_rz-29.962_tx+00.139_ty+00.485_s+01.086_1da19c2c0a85bda62beac5fdea7abe8e39b4ba5a_256x256.png
z3_rx+40.684_ry-38.233_rz-03.629_tx-00.074_ty-00.035_s+00.911_a6a28efca9fe4378d8ab241c6c288603ffcb6240_256x256.png
z3_rx-43.545_ry+41.730_rz-09.900_tx-00.282_ty+00.442_s+00.865_fe38da31edacbcb506287d9252d1de8729d3f178_256x256.png
z3_rx-44.063_ry-01.006_rz+32.997_tx-00.073_ty-00.048_s+00.894_991e13b6fc487fd03e5f742f5d174cd2b42df8e0_256x256.png
z3_rx-44.073_ry-31.783_rz-12.980_tx+00.067_ty+00.022_s+00.789_255c6c6249a96d9eb2b6e4aca676931470b7b4ef_256x256.png

mike-ferguson commented 4 months ago

According to some other sources online, this error can also occur when memory or disk space is exhausted. Are you hitting a storage quota anywhere?
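A minimal sketch for checking free space, assuming only the standard library (free_space_report is a hypothetical helper, not a Brain-Score function). It resolves symlinks first, so a symlinked cache directory is measured on the filesystem that actually stores the files.

```python
import os
import shutil


def free_space_report(path: str) -> dict:
    """Hypothetical helper: disk usage of the filesystem that really holds `path`.

    os.path.realpath follows symlinks, so a symlinked directory is checked on
    its target filesystem rather than on the volume where the link lives.
    """
    real = os.path.realpath(path)
    usage = shutil.disk_usage(real)  # named tuple (total, used, free), in bytes
    return {
        "real_path": real,
        "total_gb": usage.total / 1e9,
        "free_gb": usage.free / 1e9,
        "free_fraction": usage.free / usage.total if usage.total else 0.0,
    }
```

This reports free space on the whole filesystem. On shared cluster storage, a per-user quota can be exhausted even when the filesystem itself has room, so the cluster's own quota tool is the more direct check.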

YudiXie commented 4 months ago

I just checked, and I have ample space on the disk.

mike-ferguson commented 4 months ago

Got it. Do you have read/write permissions for /home/yu_xie/.brainio/image_dicarlo_hvm-public?
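Permissions can be checked from Python with os.access, which follows symlinks by default. This is a sketch; check_access is a hypothetical helper. Listing a directory needs read permission, and opening files inside it also needs execute (search) permission on the directory.

```python
import os


def check_access(path: str) -> dict:
    """Hypothetical helper: report existence and R/W/X access for a directory."""
    return {
        "exists": os.path.exists(path),
        "is_dir": os.path.isdir(path),
        "readable": os.access(path, os.R_OK),    # needed to list entries
        "writable": os.access(path, os.W_OK),    # needed to add or extract files
        "searchable": os.access(path, os.X_OK),  # needed to open files inside
    }
```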

@mschrimpf is this indicative of something we need to change in load_stimulus_set_from_s3?

YudiXie commented 4 months ago

Yes, I do. In case it helps: /home/yu_xie/.brainio is a symbolic link to a directory on /om. I use this because OpenMind has a small home-directory quota.
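Because the symlink sits on a parent directory (~/.brainio) rather than on the image folder itself, every file under it actually lives on the link's target filesystem. A sketch for listing which components of a path are symlinks and where they resolve to; symlinked_components is a hypothetical helper using only the standard library.

```python
import os
from pathlib import Path
from typing import List, Tuple


def symlinked_components(path: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: (component, resolved_target) for each symlink on `path`.

    Walks from the filesystem root down to the full path, so a link on a
    parent directory (e.g. ~/.brainio) is reported even when the leaf
    directory itself is not a symlink.
    """
    full = Path(os.path.abspath(os.path.expanduser(path)))
    hits = []
    for component in [*reversed(full.parents), full]:
        if component.is_symlink():
            # realpath resolves the whole chain, including nested links.
            hits.append((str(component), os.path.realpath(component)))
    return hits
```

An I/O error on a path that resolves onto a network or cluster filesystem points at that filesystem rather than at the home volume.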

mschrimpf commented 4 months ago

We have used this function for years without problems, so my guess is that this is random OpenMind shenanigans. FWIW, I have occasionally seen similar OS errors with a similar symlink setup (with other code, outside of Brain-Score).
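If the error really is intermittent, one way to test that hypothesis is to retry the listing a few times before giving up. A sketch, assuming nothing about Brain-Score internals: listdir_with_retry is a hypothetical helper, not part of brainscore_vision. It retries only on errno 5 (EIO) with exponential backoff and re-raises any other error immediately.

```python
import errno
import os
import time
from typing import Callable, List


def listdir_with_retry(directory: str, attempts: int = 3, delay_s: float = 1.0,
                       listdir: Callable[[str], List[str]] = os.listdir) -> List[str]:
    """Hypothetical helper: os.listdir that retries only on EIO (errno 5).

    Sleeps delay_s, 2*delay_s, 4*delay_s, ... between attempts. If every
    attempt fails, the last OSError is re-raised so the problem stays visible.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return listdir(directory)
        except OSError as err:
            # Non-EIO errors (missing path, permissions) are not transient.
            if err.errno != errno.EIO or attempt == attempts - 1:
                raise
            time.sleep(delay_s * (2 ** attempt))
    raise AssertionError("unreachable: the loop always returns or raises")
```

If every attempt fails, the error is more likely persistent than transient.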

YudiXie commented 4 months ago

I have tested this multiple times on OpenMind, at different times of day and on different compute nodes, and the error persists. Have people been able to run check_models.check_base_models(__name__) successfully on OpenMind?

mike-ferguson commented 4 months ago

We have run the check_models function on OpenMind before, but I will confirm this again.

mike-ferguson commented 1 month ago

@YudiXie Just wanted to follow up on this.