Open daspartho opened 11 months ago
close #370
Adds support for parallel sampling using the `vllm` library when `num_return_sequences` in the generation kwargs is greater than 1 and the model is supported by vllm (currently all HF models in llm-vm).
TODO: handle dependencies
Made the suggested changes. `vllm_support` is set to true by default and must be set to false explicitly for models that vllm does not support.
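The routing condition described in this PR can be sketched roughly as follows. This is a minimal illustration and not the code in the diff: the helper name `should_use_vllm` and its signature are hypothetical, and only `num_return_sequences` and `vllm_support` come from the description.

```python
from typing import Any, Dict


def should_use_vllm(generation_kwargs: Dict[str, Any], vllm_support: bool = True) -> bool:
    """Decide whether a generation request should go through vllm.

    Hypothetical helper for illustration only. It mirrors the two
    conditions stated in the PR description:
      1. num_return_sequences in the generation kwargs is > 1
      2. the model is marked as supported (vllm_support, true by default)
    """
    # Treat a missing num_return_sequences as a single-sequence request (assumption).
    n = generation_kwargs.get("num_return_sequences", 1)
    return bool(vllm_support) and n > 1


# Multiple sequences requested and the model is supported: use vllm.
print(should_use_vllm({"num_return_sequences": 4}))
# A single sequence: stay on the regular generation path.
print(should_use_vllm({"num_return_sequences": 1}))
# Unsupported model: the caller opts out explicitly.
print(should_use_vllm({"num_return_sequences": 4}, vllm_support=False))
```

On the vllm side, the number of samples per prompt is set through the `n` argument of `SamplingParams`, so a caller taking the vllm path would presumably map `num_return_sequences` onto `n`. That mapping is an assumption about how the PR wires things together.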