Is your feature request related to a problem? If so, please describe.
There are various parameters a user can tune when deploying a model on the runtime, e.g., TRANSFORMERS_CACHE, FLASH_ATTENTION (true/false), DEPLOYMENT_FRAMEWORK (tgis_native, hf_accelerate, etc.), NUM_GPU, and so on. Is there a place where the list of supported parameters and their usage is documented?
Describe your proposed solution
It would be nice to have a way to list all the supported parameters, with a description of their accepted values and usage.
Maybe a README file in the repo could address this.
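
To illustrate the kind of listing I have in mind, here is a rough sketch that keeps the parameters in one small registry and renders them as a README table. The parameter names and the values in parentheses are the ones mentioned above; the `Param` class, the `render_table` helper, and every description marked "assumed" are hypothetical placeholders, not the runtime's actual behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    """One tunable deployment parameter (hypothetical structure)."""
    name: str
    accepted: str
    description: str


# Names come from this issue; accepted values and descriptions marked
# "assumed" are guesses that the maintainers would need to correct.
PARAMS = [
    Param("TRANSFORMERS_CACHE", "path (assumed)", "Model cache directory (assumed)"),
    Param("FLASH_ATTENTION", "true, false", "Toggle flash attention (assumed)"),
    Param("DEPLOYMENT_FRAMEWORK", "tgis_native, hf_accelerate, etc.", "Deployment backend (assumed)"),
    Param("NUM_GPU", "integer (assumed)", "Number of GPUs to use (assumed)"),
]


def render_table(params):
    """Render the registry as a Markdown table suitable for a README."""
    lines = [
        "| Parameter | Accepted values | Description |",
        "| --- | --- | --- |",
    ]
    for p in params:
        lines.append(f"| `{p.name}` | {p.accepted} | {p.description} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(PARAMS))
```

Keeping the list in a single place like this would let the README section be regenerated whenever a parameter is added or changed, but a hand-written table would serve just as well.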
Describe alternatives you have considered
Additional context