Open ostix360 opened 5 months ago
This PR adds a file containing the minimal code needed to run inference with the model and get a consistent output.
Inference seems very slow for 100 tokens, but the output is consistent.
What do you think?
Hi, thanks for your contribution! I will test it. When you say "slow", is that in comparison to generating the same number of tokens with the base model? Did you include the "thought" tokens in the count?
My tests were run on a 4070 Ti, which is why the script uses `load_in_8bit=True` (to fit the model in the 12 GB of VRAM). The model is Mistral 7B (for the original model I used Mistral Instruct). See the timing sketch after the table.

| model | quiet star | original model |
|---|---|---|
| tokens generated | 400 | 50 |
| useful tokens | 50 | 50 |
| time to generate (s) | 1055 | 17 |
| tokens per second | 0.38 | 2.94 |
| seconds per token | 2.64 | 0.34 |
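For reference, here is a minimal sketch of how such a timing run could look. The checkpoint name, prompt, and generation settings are placeholders, not the exact ones from my script:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; the quiet-star checkpoint from the PR would go here instead.
model_name = "mistralai/Mistral-7B-Instruct-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # 8-bit quantization to fit the 7B model in ~12 GB of VRAM
    device_map="auto",
)

prompt = "Explain why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.time()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50, do_sample=False)
elapsed = time.time() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.1f}s -> {new_tokens / elapsed:.2f} tok/s")
```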
edit: The big difference between the two generation speeds may be due to how the context is stored in memory.
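If the slowdown really does come from re-processing the stored context at every step, one thing worth double-checking (this is just a guess, not something I have verified) is that the KV cache is enabled during generation, e.g. with the model and inputs from the sketch above:

```python
# Hypothetical check: reuse past key/values instead of re-encoding the whole
# context (prompt + thought tokens) at every decoding step.
output = model.generate(**inputs, max_new_tokens=100, use_cache=True)
```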
Hi @ostix360, thank you so much for the contribution! I've run your inference code, but the output doesn't make much sense to me... Can you explain it a bit more?
This is the whole output; it looks even weirder...