flowersteam / lamorel

Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).

A few questions & suggestions on `HF_LLM.py` #39

Open sadra-barikbin opened 7 months ago

sadra-barikbin commented 7 months ago

Hi there! I had a few questions & suggestions regarding HF_LLM.py. Sorry in advance for being naive.

ClementRomac commented 7 months ago

Hey,

> For the decoder-only, `pre_encode_input=True` case, the `tokenized_contexts=batch_inputs` argument passed to custom module calls lacks the last token of the context, which has been prepended to the output. Is this really desired? Couldn't it be confusing for the user?

Yes, when using `pre_encode_input=True` with decoder-only models, the input is first given to the LLM and the `past_key_values` are obtained. However, transformers only returns `hidden_states` for the tokens of the current forward pass, not for the `past_key_values`. So to get the hidden states of the input's last token, this token has to be removed from the inputs used for pre-encoding (and prepended to the output). I agree that this may be confusing. It is among a long list of things that aren't documented; I unfortunately have very limited bandwidth to work on improving the documentation :/
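
To make the mechanics concrete, here is a minimal sketch with plain `transformers` (not lamorel's actual code; the model and variable names are illustrative): the context is pre-encoded without its last token, and that token is fed together with the output tokens so that its hidden state is returned.

```python
# Minimal sketch (illustrative, not lamorel's implementation) of why the
# context's last token is moved to the output side when pre-encoding.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

context_ids = tokenizer("You see a key. You", return_tensors="pt").input_ids
output_ids = tokenizer(" pick it up", return_tensors="pt").input_ids

with torch.no_grad():
    # Pre-encode the context WITHOUT its last token: hidden_states are only
    # returned for tokens of the current forward pass, so the last context
    # token's hidden state would otherwise be lost inside past_key_values.
    pre = model(input_ids=context_ids[:, :-1], use_cache=True)

    # Prepend the last context token to the output tokens so its hidden state
    # is computed (and returned) in this second forward pass.
    decoder_input = torch.cat([context_ids[:, -1:], output_ids], dim=1)
    out = model(
        input_ids=decoder_input,
        past_key_values=pre.past_key_values,
        output_hidden_states=True,
    )

# hidden_states only cover decoder_input, i.e. the last context token plus
# the output tokens, not the pre-encoded prefix.
print(out.hidden_states[-1].shape)  # (1, decoder_input.shape[1], hidden_size)
```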

> Why don't we use `bos_token` here while creating the decoder input in the encoder-decoder setup?

The `bos_token` is not provided by all models. When I first implemented `hf_llm.py`, I was mostly using T5 models, which do not have any `bos_token`. That said, I had to force the pad token to 0, as a `pad_token` is also not implemented for all models... I don't know if there is any clean universal solution. Nevertheless, it seems token 0 is often used for padding.
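
As a quick illustration (plain `transformers`, using `t5-small` as an example), T5 indeed defines no `bos_token` and uses token id 0 both as its pad token and as its decoder start token:

```python
# Illustrative check, not lamorel code: T5 has no bos_token and uses id 0
# both as its pad token and as its decoder start token.
from transformers import AutoConfig, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
config = AutoConfig.from_pretrained("t5-small")

print(tokenizer.bos_token)            # None: no BOS token is defined
print(tokenizer.pad_token_id)         # 0
print(config.decoder_start_token_id)  # 0: decoding starts from the pad token
```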

  - [ ] To batchify generation:

Yes, this needs to be done :)
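
For reference, a hedged sketch of what batched generation could look like with plain `transformers` (not lamorel's API; decoder-only models need left padding so that every row ends with a real token):

```python
# Illustrative sketch of batched generation with transformers, not lamorel's
# implementation. Left padding keeps the prompts right-aligned for generation.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 defines no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = ["The goal of the agent is to", "You are in a room with"]
batch = tokenizer(prompts, return_tensors="pt", padding=True)

generated = model.generate(
    input_ids=batch.input_ids,
    attention_mask=batch.attention_mask,
    max_new_tokens=20,
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```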

  - [ ] We could add `output_hidden_states` not as a tensor:

Right. We would have to check that the transformers API handles a boolean.
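
For what it's worth, the forward pass of `transformers` models already accepts `output_hidden_states` as an optional boolean; a quick sanity check with `gpt2` (not a guarantee for every architecture):

```python
# Quick check, illustrative only: output_hidden_states can be passed as a bool.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = torch.tensor([[464, 2068]])  # arbitrary gpt2 token ids

out = model(input_ids=ids, output_hidden_states=True)
print(out.hidden_states is not None)  # True: one tensor per layer (+ embeddings)

out = model(input_ids=ids, output_hidden_states=False)
print(out.hidden_states is None)      # True: hidden states not returned
```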

  - [ ] When `pretrained=False` and `use_cpu=False`, `HF_LLM.__init__` raises an error in:

I need to test this. I am not sure when I can do it though.