brain-score / vision

A framework for evaluating models on their alignment to brain and behavioral measurements (50+ benchmarks)
http://brain-score.org
MIT License

clarifying model format for engineering scores #645

Open YudiXie opened 6 months ago

YudiXie commented 6 months ago

With Mike's help, I found that the model I submitted to Brain-Score fails with the error pasted at the end of this post.

Mike told me that this error happens because, when computing engineering scores, Brain-Score assumes the last layer of the model has 1000 output units corresponding to the 1000 ImageNet classes. I did not know this before submitting because it is not explicitly stated in the tutorial or anywhere on GitHub.
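For anyone hitting the same error: a minimal local sanity check one could run before submitting, assuming a PyTorch model (torchvision's ResNet-50 below is just a stand-in for your own network). The 1000-unit requirement comes from the assert in behavior.py shown in the traceback at the end of this post.

import torch
from torchvision.models import resnet50, ResNet50_Weights

# Stand-in for your own network; any nn.Module works here.
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
model.eval()

with torch.no_grad():
    dummy = torch.zeros(1, 3, 224, 224)  # one blank RGB image
    logits = model(dummy)

# Mirrors `assert len(logits['neuroid']) == 1000` in
# brainscore_vision/model_helpers/brain_transformation/behavior.py.
assert logits.shape[-1] == 1000, f"expected 1000 ImageNet logits, got {logits.shape[-1]}"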

Here are some things that I think could address this issue:

  1. Make it clear in the GitHub README, or on the website, that to get engineering scores the user has to submit a model with 1000 output units corresponding to the 1000 ImageNet classes.
  2. Make it clear how to specify which layer is used as the logits for these engineering scores (it seems that currently the behavioral layer is used, but I am not sure).
    • Sometimes the user may have a model that is only partially trained on ImageNet. For example, the model might have 1010 output units, where the first 1000 correspond to the ImageNet logits. It would be great if there were a way for the user to tell Brain-Score to use only the first 1000 of the 1010 units in this layer for engineering scores (a minor nice-to-have suggestion; a workaround sketch follows this list).
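As a workaround for the 1010-unit case in the sub-bullet above, the model itself can be wrapped so that the submitted network only exposes its first 1000 output units. A minimal sketch, assuming a PyTorch model; ImageNetLogitsOnly is a hypothetical name, not existing Brain-Score API:

import torch.nn as nn

class ImageNetLogitsOnly(nn.Module):
    """Hypothetical wrapper: expose only the first 1000 output units so the
    submitted network satisfies Brain-Score's 1000-logit assumption."""

    def __init__(self, base_model: nn.Module, num_classes: int = 1000):
        super().__init__()
        self.base_model = base_model
        self.num_classes = num_classes

    def forward(self, x):
        logits = self.base_model(x)          # e.g. shape (batch, 1010)
        return logits[:, :self.num_classes]  # keep only the ImageNet logits

Submitting ImageNetLogitsOnly(my_model) instead of my_model would then pass the 1000-unit assert without retraining anything.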

It is often the case that users are not training their models only on ImageNet, so the ideal solution could be:

  1. it would be great if Brain-Score could train a decoder on the network's activations and use that decoder to evaluate the engineering benchmarks (such as ImageNet classification). That would work for any model without requiring a specific output format (a rough sketch of the idea follows below).

But I understand there might be other considerations that make this infeasible at the moment.
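For what it's worth, a rough sketch of the decoder idea, with scikit-learn's logistic regression standing in for whatever readout Brain-Score might actually choose; the arrays below are random placeholders for activations extracted from any layer of any model:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Random placeholders: (n_images, n_units) activations and ImageNet labels.
rng = np.random.default_rng(0)
train_features = rng.standard_normal((2000, 512))
train_labels = rng.integers(0, 1000, size=2000)
test_features = rng.standard_normal((500, 512))
test_labels = rng.integers(0, 1000, size=500)

# Linear readout trained on frozen features; the model itself no longer
# needs a 1000-unit output layer.
decoder = LogisticRegression(max_iter=1000)
decoder.fit(train_features, train_labels)
top1 = decoder.score(test_features, test_labels)  # decoder's top-1 accuracy
print(f"decoded ImageNet top-1 accuracy: {top1:.3f}")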

Traceback (most recent call last):
  File "/scratch2/weka/quest/shared/jenkins/miniconda3/envs/yudixie_resnet50_imagenet1kpret_0_240312_Geirhos2021sketch-error_consistency/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/scratch2/weka/quest/shared/jenkins/miniconda3/envs/yudixie_resnet50_imagenet1kpret_0_240312_Geirhos2021sketch-error_consistency/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "brainscore_vision/__main__.py", line 20, in <module>
    fire.Fire()
  File "/scratch2/weka/quest/shared/jenkins/miniconda3/envs/yudixie_resnet50_imagenet1kpret_0_240312_Geirhos2021sketch-error_consistency/lib/python3.7/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/scratch2/weka/quest/shared/jenkins/miniconda3/envs/yudixie_resnet50_imagenet1kpret_0_240312_Geirhos2021sketch-error_consistency/lib/python3.7/site-packages/fire/core.py", line 482, in _Fire
    target=component.__name__)
  File "/scratch2/weka/quest/shared/jenkins/miniconda3/envs/yudixie_resnet50_imagenet1kpret_0_240312_Geirhos2021sketch-error_consistency/lib/python3.7/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "brainscore_vision/__main__.py", line 9, in score
    result = _score_function(model_identifier, benchmark_identifier, conda_active)
  File "/rdma/vast-rdma/scratch/Sun/score_plugins_vision_env_163/vision/brainscore_vision/__init__.py", line 105, in score
    score_function=_run_score, conda_active=conda_active)
  File "/scratch2/weka/quest/shared/jenkins/miniconda3/envs/yudixie_resnet50_imagenet1kpret_0_240312_Geirhos2021sketch-error_consistency/lib/python3.7/site-packages/brainscore_core/plugin_management/conda_score.py", line 88, in wrap_score
    result = score_function(model_identifier, benchmark_identifier)
  File "/rdma/vast-rdma/scratch/Sun/score_plugins_vision_env_163/vision/brainscore_vision/__init__.py", line 76, in _run_score
    score: Score = benchmark(model)
  File "/rdma/vast-rdma/scratch/Sun/score_plugins_vision_env_163/vision/brainscore_vision/benchmarks/geirhos2021/benchmark.py", line 75, in __call__
    labels = candidate.look_at(stimulus_set, number_of_trials=self._number_of_trials)
  File "/rdma/vast-rdma/scratch/Sun/score_plugins_vision_env_163/vision/brainscore_vision/model_helpers/brain_transformation/__init__.py", line 67, in look_at
    return self.behavior_model.look_at(stimuli, number_of_trials=number_of_trials)
  File "/rdma/vast-rdma/scratch/Sun/score_plugins_vision_env_163/vision/brainscore_vision/model_helpers/brain_transformation/behavior.py", line 27, in look_at
    return self.current_executor.look_at(stimuli, *args, **kwargs)
  File "/rdma/vast-rdma/scratch/Sun/score_plugins_vision_env_163/vision/brainscore_vision/model_helpers/brain_transformation/behavior.py", line 49, in look_at
    choices = self.logits_to_choice(logits)
  File "/rdma/vast-rdma/scratch/Sun/score_plugins_vision_env_163/vision/brainscore_vision/model_helpers/brain_transformation/behavior.py", line 53, in logits_to_choice
    assert len(logits['neuroid']) == 1000
AssertionError
mike-ferguson commented 6 months ago

Opened PR 259 in Brain-Score Web to address this here.