THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

[Help] 如何获取embedding层表达 #125

Closed 1991Troy closed 1 year ago

1991Troy commented 1 year ago

Is there an existing issue for this?

Current Behavior

With ChatGPT you can obtain embeddings via openai.Embedding.create(model=model, input=text). How can I get the embedding-layer representation when calling the GLM model through Hugging Face? If I go on to build something like chatpdf later, the embedding information would be quite useful. Thanks.

Expected Behavior

No response

Steps To Reproduce

help

Environment

- OS: Ubuntu 20.04
- Python: 3.8
- Transformers: 4.26.1
- PyTorch: 1.12
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :True

Anything else?

No

huangjiaheng commented 1 year ago

Same question.

zhongtao93 commented 1 year ago

Same question.

stallboy commented 1 year ago

Same question.

zlszhonglongshen commented 1 year ago

Same question.

georgechen1827 commented 1 year ago

I gave this a try; using the hidden states directly as embeddings, the results don't seem very good: ChatGLM-text-embedding

zhangch9 commented 1 year ago

There is no API for fetching embeddings directly at the moment. For now, you can obtain the hidden-layer representations by setting output_hidden_states=True; see the following code:

```python
from typing import Optional, Tuple

import torch
from transformers import PreTrainedModel, PreTrainedTokenizer


def get_hidden_states(
    text: str, model: PreTrainedModel, tokenizer: PreTrainedTokenizer
) -> Optional[Tuple[torch.Tensor]]:
    model = model.eval()
    inputs = tokenizer([text], return_tensors='pt').to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # Tuple of per-layer tensors, starting with the embedding layer's output
    return out.hidden_states
```
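The hidden states above are per-token, while openai.Embedding.create returns one vector per input text. A common way to bridge that gap is mask-aware mean pooling over a layer's token vectors. A minimal sketch with synthetic tensors (the `mean_pool` helper and the batch-first `[batch, seq_len, hidden]` layout are assumptions here, not part of the repo; ChatGLM may return sequence-first tensors, in which case transpose first):

```python
import torch


def mean_pool(last_hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors, ignoring padding positions.

    last_hidden: [batch, seq_len, hidden]
    attention_mask: [batch, seq_len] with 1 for real tokens, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden.dtype)  # [batch, seq_len, 1]
    summed = (last_hidden * mask).sum(dim=1)                   # [batch, hidden]
    counts = mask.sum(dim=1).clamp(min=1e-9)                   # avoid division by zero
    return summed / counts


# Synthetic stand-in for hidden_states[-1]: batch of 2, seq_len 4, hidden size 8
hidden = torch.randn(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])
emb = mean_pool(hidden, mask)
print(emb.shape)  # torch.Size([2, 8])
```

In practice you would pass `get_hidden_states(...)[-1]` (the last layer) and the tokenizer's attention mask instead of the random tensors.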