Holmes-Benchmark / holmes-evaluation


Evaluating autoregressive models #4

Closed bbunzeck closed 1 month ago

bbunzeck commented 1 month ago

Hi, when I try to use holmes with a GPT-2 or Llama model, I get the following error:

`python investigate.py --model_name 'bbunzeck/gpt-wee-regular' --version holmes --parallel_probing --cuda_visible_devices 0` --> error2.txt

Thank you very much in advance!

holmesbenchmark commented 1 month ago

Thanks for posting this issue. We briefly checked your model and it seems to work in our environment. Could you post your installed packages along with their versions? For example using `pip list -v`.

bbunzeck commented 1 month ago

Hello, my mistake! I chose the wrong virtual environment; now everything is working fine. Thanks for the timely answer!

bbunzeck commented 1 month ago

Good evening! GPT-2 evaluation works just fine, but with Llama models (e.g. bbunzeck/baby_llama on the 🤗 model hub) I get the following error: `TypeError: LlamaModel.forward() got an unexpected keyword argument 'token_type_ids'`. I have installed all requirements as specified into a clean virtual environment, but I am not sure how to troubleshoot this error. Any help would be appreciated!

holmesbenchmark commented 1 month ago

Hi @bbunzeck,

Could you try it again? It seems that for some reason the tokenizer returns `token_type_ids`. We now remove this entry if it exists for a `LlamaModel`.
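For context, the fix described above can be sketched roughly as follows. This is an assumed illustration, not the repository's actual code: some tokenizers emit a `token_type_ids` entry that `LlamaModel.forward()` does not accept, so the entry is dropped from the encoding before the forward call. The `strip_token_type_ids` helper and the stand-in `LlamaModel` class are hypothetical names used only to keep the sketch self-contained.

```python
def strip_token_type_ids(encoding, model):
    """Drop 'token_type_ids' from a tokenizer output when the target
    model (e.g. a Llama variant) does not accept that keyword argument.

    Hypothetical helper: checks the model's class name and removes the
    offending key in place, returning the cleaned encoding.
    """
    if type(model).__name__.startswith("Llama"):
        encoding.pop("token_type_ids", None)  # no error if key is absent
    return encoding


class LlamaModel:
    """Stand-in for transformers' LlamaModel, to keep the sketch runnable."""
    pass


# Usage: a tokenizer output that (incorrectly for Llama) contains token_type_ids.
batch = {
    "input_ids": [[1, 2, 3]],
    "attention_mask": [[1, 1, 1]],
    "token_type_ids": [[0, 0, 0]],
}
cleaned = strip_token_type_ids(batch, LlamaModel())
assert "token_type_ids" not in cleaned
assert "input_ids" in cleaned
```

With the key removed, `model(**cleaned)` no longer raises the `TypeError` reported above.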

Thanks for your feedback!

bbunzeck commented 1 month ago

Thank you, it is working now!