THUDM / GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs | 开源多语言多模态对话模型
Apache License 2.0

AutoModelForSequenceClassification may have a bug #197

Closed fengyunflya closed 4 months ago

fengyunflya commented 4 months ago

System Info / 系統信息

When using AutoModelForSequenceClassification, calling model(**inputs) raises RuntimeError: mat1 and mat2 must have the same dtype, but got BFloat16 and Half. It turns out model.classifier_head is float16. Also, the output is not sequence classification but token classification.
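The error can be illustrated without loading GLM-4 at all. A minimal sketch, assuming only that the hidden states are bfloat16 while the classification head was loaded in float16 (the tensor sizes are illustrative, not the model's real dimensions):

```python
import torch

# Hidden states in bfloat16, head weights in float16 -- the mismatch from the report.
hidden_states = torch.randn(1, 8, dtype=torch.bfloat16)
head = torch.nn.Linear(8, 2).to(torch.float16)

try:
    head(hidden_states)
except RuntimeError as e:
    print(e)  # mat1 and mat2 must have the same dtype ...

# Casting the head to the hidden-state dtype makes the matmul succeed.
head = head.to(torch.bfloat16)
logits = head(hidden_states)
print(logits.dtype)
```

Casting the head (rather than the hidden states) is the cheaper direction, since the head is a single small layer.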

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

Reproduction / 复现过程

  1. model = AutoModelForSequenceClassification.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)
  2. tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)
  3. inputs = tokenizer(["Hello, I love food. It is great"], return_tensors='pt')
  4. output = model(**inputs)

Expected behavior / 期待表现

I just want to know why this happens.

fengyunflya commented 4 months ago

Also, after loading the model, print(model) shows two Linear layers at the end, but their dimensions don't line up, so there should only be one head layer. I don't know why two layers are printed.

duzx16 commented 4 months ago

The dtype issue has been fixed (https://huggingface.co/THUDM/glm-4-9b-chat/commit/0deb1dd1717354614688b73617e63d444d942dbe). The issue of the output being token classification rather than sequence classification was fixed in an earlier commit (https://huggingface.co/THUDM/glm-4-9b-chat/commit/12c80499bc07e21c05c11ee2a6035371bf53f1a6).
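For context on the second bug: the difference between token and sequence classification is just whether the head is applied to every position or to one pooled representation. A toy sketch with made-up shapes (last-token pooling is shown as an example; it is a common choice for decoder-only models, not a claim about GLM-4's exact pooling):

```python
import torch

# Toy hidden states: (batch, seq_len, hidden). Sizes are illustrative only.
hidden_states = torch.randn(1, 6, 8)
head = torch.nn.Linear(8, 2)  # 2-way classifier

# Token classification: the head scores every position -> (1, 6, 2).
token_logits = head(hidden_states)

# Sequence classification: pool first (here, the last token's hidden state),
# then score once per sequence -> (1, 2).
sequence_logits = head(hidden_states[:, -1])

print(token_logits.shape, sequence_logits.shape)
```

So logits with a trailing seq_len dimension, as the reporter observed, are the signature of the head being applied per token.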