sundaraa-deshaw opened this issue 11 months ago
Configure a lower temperature, higher number of GPU layers, and top_p/top_k for use with our custom fine-tuned model
Hey - could you please share the actual use case you aim to achieve by setting these parameters?
Please describe the feature you want
We have a use case of running a fine-tuned model and would like to serve it from the Tabby server. Since Tabby embeds llama.cpp and does not support HTTP bindings to other endpoints (besides fastchat and vertex-ai), we would want to be able to configure some of the model parameters documented in https://github.com/ggerganov/llama.cpp/blob/019ba1dcd0c7775a5ac0f7442634a330eb0173cc/common/common.cpp#L1344
For example, we would want to set a lower temperature, a higher number of GPU layers, and top_p/top_k for use with our custom fine-tuned model; a rough sketch of the knobs we mean is below.
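For reference, these are the same sampling and offload knobs that llama.cpp already exposes through its bindings. A minimal sketch using llama-cpp-python (this is not Tabby's actual configuration surface; the model path and values are placeholders, just to illustrate the parameters we'd like to control):

```python
from llama_cpp import Llama

# Illustrative values only - model path and numbers are placeholders.
llm = Llama(
    model_path="./our-fine-tuned-model.gguf",
    n_gpu_layers=35,   # offload more layers to the GPU
    n_ctx=2048,
)

out = llm(
    "def fibonacci(n):",
    max_tokens=64,
    temperature=0.2,   # lower temperature for more deterministic completions
    top_p=0.9,
    top_k=40,
)
print(out["choices"][0]["text"])
```

Being able to pass the equivalent settings through Tabby's server configuration is what we are asking for here.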
Can we add this support please?
Please reply with a 👍 if you want this feature.