SparkJiao opened this issue 11 months ago
Hi, since you are using a different LLM framework, could you kindly share the inference speed of your LLM (e.g., tokens/second) as a reference? Also, I noticed that you are using a chat model. Did you check the input and output of the LLM? Our prompts are designed for base LLMs (without instruction fine-tuning), and I am not sure whether they would cause unexpected behavior with chat LLMs.
Hi, thanks for your wonderful work!
May I know the running time of your method, especially on ProntoQA? Currently I'm running Llama-2-70b-chat-hf via vLLM (the model is deployed on 4 × A100-40G GPUs, and the main process sends requests to the model to get outputs). The program has been running for more than 10 hours, but I only see 11 log files in the output directory.
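For context, here is a rough sketch of how I time each request to measure tokens/second against a vLLM OpenAI-compatible completions endpoint; the server URL, model name, and the `measure_completion` helper are my own assumptions for illustration, not part of either codebase.

```python
import time


def tokens_per_second(num_tokens: int, elapsed_s: float) -> float:
    """Throughput in generated tokens per second."""
    return num_tokens / elapsed_s


def measure_completion(prompt: str,
                       url: str = "http://localhost:8000/v1/completions") -> float:
    """Send one completion request to a (hypothetical) local vLLM server
    and return the measured generation throughput in tokens/second.

    Assumes the server was launched with vLLM's OpenAI-compatible
    entrypoint and reports token counts in the `usage` field.
    """
    import json
    import urllib.request

    payload = json.dumps({
        "model": "meta-llama/Llama-2-70b-chat-hf",  # illustrative model name
        "prompt": prompt,
        "max_tokens": 256,
    }).encode()

    start = time.perf_counter()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        out = json.load(resp)
    elapsed = time.perf_counter() - start

    # OpenAI-style responses include the number of generated tokens here.
    n_tokens = out["usage"]["completion_tokens"]
    return tokens_per_second(n_tokens, elapsed)
```

For example, 512 generated tokens in 16 seconds would be 32 tokens/second, which is the kind of number I was hoping to compare against yours.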
So I'm opening this issue to ask about your running time and to check whether there are any potential problems.
Best, Fangkai