Closed: dwadden closed this issue 2 months ago
My understanding is that vLLM integration of OLMo was at some point being looked at by @AkshitaB, although I'm not sure of the current status. As for the second point, feel free to add a subset flag for this, since it might be useful for debugging anyway. Just reducing the prompt set should work with the existing code as-is (I do this myself for debugging).
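A subset flag along those lines could be a small argparse addition. A sketch below; `--subset-size`, `maybe_subset`, and the stand-in `prompts` list are hypothetical names, not the eval script's actual interface:

```python
import argparse


def parse_args(argv=None):
    # --subset-size is a hypothetical flag name; adapt to the eval script's CLI.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--subset-size", type=int, default=None,
        help="If set, evaluate only the first N prompts (useful for debugging).",
    )
    return parser.parse_args(argv)


def maybe_subset(prompts, subset_size):
    """Reduce the prompt set when a subset size is given; otherwise keep it all."""
    if subset_size is not None:
        return prompts[:subset_size]
    return prompts


if __name__ == "__main__":
    args = parse_args()
    prompts = [f"prompt {i}" for i in range(805)]  # stand-in for the real eval set
    prompts = maybe_subset(prompts, args.subset_size)
    print(len(prompts))
```

Since the existing code just iterates over whatever prompt list it's handed, truncating the list up front should be the only change needed.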
@dwadden vLLM already supports OLMo in their latest version, so you should be able to use it directly. You'll first need to convert the OLMo checkpoint to HF format using the conversion script.
Also, make sure to use the latest vLLM: they fixed a bug in the tensor-parallel case in this commit, which landed after their last pip release.
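Once the checkpoint is converted to HF format, generation through vLLM is roughly the following. This is a sketch, not a tested recipe: the model path is a placeholder, and whether `trust_remote_code` is needed depends on your transformers/vLLM versions.

```python
# Placeholder path to the HF-format OLMo checkpoint produced by the
# conversion script; point this at wherever the converted weights live.
MODEL_PATH = "./olmo-7b-hf"


def generate(prompts, model_path=MODEL_PATH):
    # Import kept local so this helper can be defined on a machine
    # without vLLM/GPUs; the heavy work happens only when called.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_path, trust_remote_code=True)
    params = SamplingParams(temperature=0.0, max_tokens=512)
    outputs = llm.generate(prompts, params)
    # Each RequestOutput carries the prompt plus a list of completions;
    # take the first completion's text for each prompt.
    return [out.outputs[0].text for out in outputs]


if __name__ == "__main__":
    print(generate(["Who are you?"])[0])
```

Batching all 800 eval prompts into a single `llm.generate` call is the main speedup: vLLM handles the continuous batching internally, instead of generating one example at a time as the HF loop does.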
thanks akshita!!!
Alpaca-eval on OLMo models is very slow -- possibly just because OLMo can't use vLLM, and HuggingFace generation is slow in general? Here's an example Beaker job; based on the tqdm log, it will take 100 hours (~4 days) to evaluate 800 examples. That isn't really workable. Potential options:
@hamishivi @yizhongw any thoughts?
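For reference, a back-of-envelope check on the numbers from the log above ("seconds per example" is my framing of the tqdm rate):

```python
# Sanity-check the tqdm projection: 800 examples in ~100 hours.
EXAMPLES = 800
HOURS = 100

seconds_per_example = HOURS * 3600 / EXAMPLES
examples_per_hour = EXAMPLES / HOURS

print(seconds_per_example)  # 450.0 seconds per example
print(examples_per_hour)    # 8.0 examples per hour
```

450 seconds for one generation is far beyond what forward-pass latency alone would explain, which is why batched serving via vLLM looks like the right fix rather than tuning the HF loop.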