THUDM / GLM-130B

GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
Apache License 2.0

[Discussion] Can we align GLM-130B with human preferences like ChatGPT? #43

Open AnShengqiang opened 1 year ago

Xiao9905 commented 1 year ago

Certainly. Alignment for GLM-130B could be important, and we are doing a preliminary survey.

conceptofmind commented 1 year ago

You could use the current glm-10b on Hugging Face with trl/trlx to construct a model with RLHF.

smeyerhot commented 1 year ago

What are trl/trlx? I am very interested in this use case. Why must the 10B-parameter model be used for RLHF?

smeyerhot commented 1 year ago

I am actively working on this task and would be very interested in further development coordination.

conceptofmind commented 1 year ago

@smeyerhot TRL (Transformer Reinforcement Learning) is a library built by Hugging Face for training language models with PPO. TRLX is an extension of TRL built by CarperAI. Both cover the same use case: training models with reinforcement learning from human feedback. You can also build the same functionality yourself with actor-critic PPO in PyTorch, although that requires more extensive domain knowledge. You do not have to use glm-10b, but it is publicly available on Hugging Face's model hub, unlike GLM-130B, which requires you to apply for access; you can use any encoder-decoder or decoder-only model. Since this issue is about aligning GLM with human feedback, I suggested the 10B-parameter one.
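
For reference, here is a minimal sketch of what a trl-style PPO loop can look like. This is not code from this thread: the checkpoint name, batch sizes, generation length, and the constant reward are placeholders, and exact `PPOConfig`/`PPOTrainer` argument names vary between trl versions.

```python
# Hedged sketch of RLHF fine-tuning with trl's PPOTrainer.
# Assumes a recent trl 0.x API; argument names may differ across versions.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "bigscience/bloom-560m"  # placeholder checkpoint; any causal LM works

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)

config = PPOConfig(model_name=model_name, batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(config, model, ref_model=None, tokenizer=tokenizer)

# One PPO step: sample a response, score it, update the policy.
query = tokenizer("How should a helpful assistant answer?", return_tensors="pt").input_ids[0]
gen_len = 32
output = ppo_trainer.generate(query, max_new_tokens=gen_len)
response = output.squeeze()[-gen_len:]  # keep only the generated continuation

# In real RLHF the reward comes from a reward model trained on human preferences;
# a constant stands in here only to show the call signature.
reward = torch.tensor(1.0)
stats = ppo_trainer.step([query], [response], [reward])
```

In practice you would loop this over a dataset of prompts and plug in a reward model (e.g. a classifier fine-tuned on human preference comparisons) instead of the constant reward.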

Syno8 commented 1 year ago

ChatGPT can generate formatted text and images. This requires keeping the pretraining data in its original format.

beautifull4frank commented 1 year ago

Hi guys, I implemented PPO with BLOOM successfully, but I found that the BLOOM model is loaded with AutoModelForCausalLM.

However, GLM is loaded with AutoModelForSeq2SeqLM, and my causal-LM PPO setup does not work with it. Do you know how to fix this?
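
One possible direction (a suggestion, not something verified in this thread) is trl's seq2seq value-head wrapper. Whether it handles GLM's custom modeling code loaded with trust_remote_code=True is an assumption here, not a confirmed fact.

```python
# Hedged sketch: loading a seq2seq checkpoint for trl-style PPO.
# AutoModelForSeq2SeqLMWithValueHead exists in trl for encoder-decoder models;
# compatibility with GLM's remote modeling code is NOT verified here.
from transformers import AutoTokenizer
from trl import AutoModelForSeq2SeqLMWithValueHead

model_name = "THUDM/glm-10b"  # the checkpoint suggested earlier in the thread

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(
    model_name, trust_remote_code=True
)

# From here, PPOTrainer usage mirrors the causal-LM sketch above,
# except that queries go through the encoder and responses come from the decoder.
```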