deep-diver (closed this issue 8 months ago):
@sayakpaul
I found it difficult to keep the input tokens under `max-input-token-length`:

For both TGI and locally running models, we can count the number of input tokens and trim them down to under `max-input-token-length`, just like you did in the previous notebook.
However, this loses some important information. For instance, the `<|system|>`, `<|user|>`, and `<|assistant|>` special tokens give the model signals. By naively trimming with `[-max-input-token-length:]`, we lose the `<|system|>` part.
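For reference, the naive version looks something like this sketch (the variable names, the 4096 limit, and the checkpoint are just assumptions):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "..."},  # imagine a very long history here
]
max_input_token_length = 4096  # assumed limit

# Render the chat template to a plain string, then tokenize it.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
input_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]

# Naive trim: keep only the last max_input_token_length tokens.
# If the prompt is too long, the leading <|system|> block gets cut off.
trimmed_prompt = tokenizer.decode(input_ids[-max_input_token_length:])
```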
Even if we keep the `<|system|>` part and trim the rest, we don't want to lose the `<|user|>` and `<|assistant|>` special tokens either. That means the ideal final output should look something like this:
```
<|system|>
ALWAYS KEEP THIS PART
<|user|>
.....MIGHT NEED TO BE TRIMMED WITHIN the <|user|> section, but we should keep the <|user|> special token
<|assistant|>
.....
<|user|>
WHATEVER THE USER SAYS
```
The main obstacle is to keep the special tokens while trimming down the actual contents inside each of them (possibly removing a message entirely if its `len(content)` becomes zero after trimming). Also, the messages are given in the format `[{"role": ..., "content": ...}, ...]`; this is necessary since the OpenAI SDK only allows us to pass messages in that format.
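As a rough sketch of the trimming I have in mind, operating directly on that messages list (the even per-message budget here is just a placeholder policy, not a proposal):

```python
def trim_message_contents(messages, tokenizer, max_input_tokens):
    """Sketch: trim each message's content but keep every role, so the
    chat template can still emit all of its special tokens. The budget
    is split evenly across messages for simplicity, and the (small)
    token overhead of the special tokens themselves is ignored."""
    budget = max_input_tokens // max(len(messages), 1)
    trimmed = []
    for msg in messages:
        ids = tokenizer(msg["content"], add_special_tokens=False)["input_ids"]
        content = tokenizer.decode(ids[:budget]) if len(ids) > budget else msg["content"]
        # Drop the message entirely if nothing survived, except the system one.
        if content or msg["role"] == "system":
            trimmed.append({"role": msg["role"], "content": content})
    return trimmed
```

The result is still a `[{"role": ..., "content": ...}, ...]` list, so it can be handed to the OpenAI SDK or to `apply_chat_template` unchanged.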
Also, the special tokens differ from model to model: `HuggingFaceH4/zephyr-7b-beta` has `<|system|>`, `<|user|>`, and `<|assistant|>` special tokens, whereas `NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO` has only the `<|im_start|>` special token, and so on.
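One partial workaround that comes to mind (not a full solution): since `tokenizer.apply_chat_template` inserts whatever special tokens each model's template defines, we could trim at the message level and simply re-count in a loop, without hard-coding `<|system|>` vs `<|im_start|>` anywhere:

```python
def fit_messages(messages, tokenizer, max_input_tokens):
    """Sketch: drop the oldest non-system turns until the fully templated
    prompt fits; assumes messages[0] is the system message. Note it may
    still exceed the limit if the system message alone is too long."""
    msgs = list(messages)
    while len(msgs) > 1:
        token_ids = tokenizer.apply_chat_template(msgs, add_generation_prompt=True)
        if len(token_ids) <= max_input_tokens:
            break
        msgs.pop(1)  # remove the oldest non-system turn
    return msgs
```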
What do you think? We don't need to tackle this issue within this project, but I wanted to discuss what I have found so far. If you already know a solution for this, please let me know!
sayakpaul replied: Do you want to post it on the fellows channel and tag me?
deep-diver: This PR was tested with both TGI and `transformers` running locally (on an M3 MacBook Pro Max). Some of the changes are for introducing the async feature. The next step is to build the Gradio app.