IlyaGusev / rulm

Language modeling and instruction tuning for Russian
Apache License 2.0
455 stars, 50 forks

Is there any way to increase speed? #18

Closed. NikolayTV closed this issue 1 year ago

NikolayTV commented 1 year ago

Hi. I'm wondering whether there is any parameter or hack to increase speed. What does it depend on? Maybe max_token_size or something else?

Andrew-MK commented 1 year ago

Hi! Are we talking about CPU, GPU, or Metal inference speed, and on Windows, Linux, or macOS? It depends on many things: PC memory speed, the number of threads, OpenBLAS or GPU offload, CUDA driver versions, etc. Write a test script, try the various options available on your specific system, and compare the execution times.
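
A minimal sketch of such a test script, in case it helps. It only shows the timing harness: `fake_generate` is a hypothetical stand-in for the real inference call, and the `max_new_tokens` option is just an example of a setting to vary, so both are assumptions rather than anything from this repo. Swap in your own model call and the options you actually want to compare (threads, GPU offload, and so on).

```python
import statistics
import time
from typing import Callable, Dict, List


def time_call(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the median wall-clock time (seconds) over `repeats` calls of fn."""
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare_options(
    make_run: Callable[[dict], Callable[[], object]],
    options: Dict[str, dict],
    repeats: int = 3,
) -> Dict[str, float]:
    """Time the same workload under several named option sets."""
    results: Dict[str, float] = {}
    for name, opts in options.items():
        run = make_run(opts)  # build the callable outside the timed region
        run()                 # warm-up call, not included in the timings
        results[name] = time_call(run, repeats)
    return results


def fake_generate(opts: dict) -> Callable[[], object]:
    """Hypothetical stand-in for model inference; replace with a real call."""
    n = opts["max_new_tokens"]
    return lambda: sum(i * i for i in range(n * 1000))


if __name__ == "__main__":
    options = {
        "short": {"max_new_tokens": 64},
        "long": {"max_new_tokens": 512},
    }
    timings = compare_options(fake_generate, options)
    # Print the fastest configuration first.
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds:.4f} s")
```

The warm-up call and the median are there because the first run often includes one-off costs (loading, caching) and single timings can be noisy.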

IlyaGusev commented 1 year ago

Also, there are several inference frameworks, such as https://github.com/vllm-project/vllm or https://github.com/ggerganov/llama.cpp. They should be more efficient than plain HF.