SparkJiao opened this issue 11 months ago
Hi, since you are using a different LLM framework, could you kindly share the inference speed of your LLM (e.g., tokens/second) as a reference? Also, I noticed that you are using a chat model. Did you check the input and output of the LLM? Our prompts are designed for base LLMs (without instruction fine-tuning), and I am not sure whether they would cause unexpected behavior with chat LLMs.
Hi, thanks for your wonderful work!
May I know the running time of your method, especially on ProntoQA? Currently I'm running Llama-2-70b-chat-hf via vLLM (the model is deployed on 4 × A100-40G GPUs, and the main process sends requests to the model to get outputs). The program has been running for more than 10 hours, but I only see 11 log files in the output directory.
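For context, here is a rough sketch of how I time each request to measure tokens/second against a vLLM OpenAI-compatible completions endpoint; the server URL, model name, and the `measure_completion` helper are my own assumptions for illustration, not part of either codebase.

```python
import time


def tokens_per_second(num_tokens: int, elapsed_s: float) -> float:
    """Throughput in generated tokens per second."""
    return num_tokens / elapsed_s


def measure_completion(prompt: str,
                       url: str = "http://localhost:8000/v1/completions") -> float:
    """Send one completion request to a (hypothetical) local vLLM server
    and return the measured generation throughput in tokens/second.

    Assumes the server was launched with vLLM's OpenAI-compatible
    entrypoint and reports token counts in the `usage` field.
    """
    import json
    import urllib.request

    payload = json.dumps({
        "model": "meta-llama/Llama-2-70b-chat-hf",  # illustrative model name
        "prompt": prompt,
        "max_tokens": 256,
    }).encode()

    start = time.perf_counter()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        out = json.load(resp)
    elapsed = time.perf_counter() - start

    # OpenAI-style responses include the number of generated tokens here.
    n_tokens = out["usage"]["completion_tokens"]
    return tokens_per_second(n_tokens, elapsed)
```

For example, 512 generated tokens in 16 seconds would be 32 tokens/second, which is the kind of number I was hoping to compare against yours.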
So I'm opening this issue to ask about your running time and to check whether there are any potential problems.
Best, Fangkai