ezelikman / quiet-star

Code for Quiet-STaR
https://arxiv.org/abs/2403.09629
Apache License 2.0

Add infer code #3

Open ostix360 opened 5 months ago

ostix360 commented 5 months ago

This PR adds a file containing the minimal code needed to run inference with the model and get consistent output.

Generating 100 tokens seems very slow, but the output is consistent.

What do you think?
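For context, minimal inference along these lines typically looks like the sketch below. This is only an illustration, not the PR's actual file; the checkpoint name and the `trust_remote_code` assumption are mine.

```python
# Minimal inference sketch (illustrative, not the PR's file).
# The checkpoint name is an assumption; adjust to whatever the PR loads.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ezelikman/quietstar-8-ahead"  # assumed Quiet-STaR checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,        # fits a 7B model into ~12 GB of VRAM
    device_map="auto",
    trust_remote_code=True,   # assumes the repo ships custom modeling code
)

prompt = "Q: What is 12 * 7?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```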

samuelazran commented 5 months ago

Hi, thanks for your contribution! I will test it. When you say "slow", is it in comparison to generating the same number of tokens with the base model? Did you include the "thought" tokens in the comparison?
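One way to make that comparison concrete is a small timing helper along these lines. This is a sketch, not code from the PR; the helper is hypothetical and counts every token in the returned sequence, which may or may not include Quiet-STaR's thought tokens depending on how the generation path works.

```python
import time
import torch

def tokens_per_second(model, tokenizer, prompt, max_new_tokens=50):
    """Rough throughput over all tokens in the returned sequence."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    torch.cuda.synchronize()  # make GPU timing meaningful
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    n_new = out.shape[1] - inputs["input_ids"].shape[1]
    return n_new / (time.perf_counter() - start)
```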

ostix360 commented 5 months ago
My tests ran on an RTX 4070 Ti, which is why the script sets `load_in_8bit=True` (to fit the model in the 12 GB of VRAM). Both runs use Mistral 7B (for the original model I took Mistral Instruct):

|                      | Quiet-STaR model | original model |
|----------------------|------------------|----------------|
| tokens generated     | 400              | 50             |
| useful tokens        | 50               | 50             |
| time to generate (s) | 1055             | 17             |
| tokens per second    | 0.38             | 2.94           |
| seconds per token    | 2.64             | 0.34           |

Edit: this big difference between the two generation speeds may be due to how the context is stored in memory.
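If caching is the culprit, a quick A/B toggle of `use_cache` should show it. The snippet below is a sketch reusing `model` and `inputs` from the earlier one, and it assumes the Quiet-STaR generation path honors `use_cache` the way stock transformers models do.

```python
import time

# A/B test of the cache hypothesis; assumes `model` and `inputs` from the
# sketch above, and that the custom generation path honors `use_cache`.
for use_cache in (True, False):
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=100, use_cache=use_cache)
    print(f"use_cache={use_cache}: {time.perf_counter() - start:.1f} s")
```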

Some-random commented 5 months ago

Hi @ostix360, thank you so much for the contribution! I've run your infer code but the output doesn't make much sense to me... Can you explain it a bit more?

[screenshot: generated output]
Some-random commented 5 months ago
[screenshot: full generated output]

This is the whole output; it looks even weirder...