/kind feature
In another issue, https://github.com/InftyAI/llmaz/pull/175#issuecomment-2372716947, there is a new project that shares model weights across the cluster, which may change the model-loading code.
So I want to ask: is it still necessary to develop this feature? This project fetches models with Python, but the new project fetches models with Go.
Yes, we need this, because Manta may leverage the code as well; we don't want to rewrite the client code in other languages anymore.
What I'm concerned about is how to make this a more general approach. Maybe we can add two fields to the ModelHub, allow_patterns and ignore_patterns, which will be passed to the lib directly (see the sketch below for illustration). You can refer to the huggingface snapshot_download func for details; modelScope has similar parameters as well.
I also have two other suggestions:
- Remove the ThreadPoolExecutor for modelScope, because there's only one thread.
- When downloading one file with the huggingface lib, let's use hf_hub_download.
- When downloading the whole repo with the huggingface lib, let's use snapshot_download, which downloads files concurrently, so we can remove the ThreadPoolExecutor as well.
WDYT?
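For illustration, a minimal sketch of what that pass-through could look like, assuming a Python loader. The download_model helper and its signature are assumptions for this sketch, not the actual llmaz code; only the allow_patterns / ignore_patterns arguments of snapshot_download come from the huggingface_hub API:

```python
from huggingface_hub import snapshot_download


def download_model(model_id: str,
                   revision: str | None = None,
                   allow_patterns: list[str] | None = None,
                   ignore_patterns: list[str] | None = None) -> str:
    """Download a repo snapshot, forwarding the ModelHub filter fields as-is.

    snapshot_download already fetches files concurrently, so no extra
    ThreadPoolExecutor is needed around it.
    """
    return snapshot_download(
        repo_id=model_id,
        revision=revision,
        allow_patterns=allow_patterns,
        ignore_patterns=ignore_patterns,
    )
```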
I agree with you, I will implement this feature soon.
While developing this, I found that we can also download a single file with snapshot_download by passing its name in allow_patterns, so one code path can download one or more files.
I pushed a pull request here: https://github.com/InftyAI/llmaz/pull/178#issue-2553977136. PTAL.
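As an illustration of that point, a hedged sketch; the repo id and file name below are placeholders, not tied to llmaz or any specific model:

```python
from huggingface_hub import snapshot_download

# Download just one file from a repo by naming it in allow_patterns;
# the same call with multiple patterns (or none) also covers multi-file
# and whole-repo downloads, so hf_hub_download is not strictly required.
local_dir = snapshot_download(
    repo_id="org/model",               # placeholder repo id
    allow_patterns=["config.json"],    # fetch only this file
)
print(local_dir)  # local snapshot path containing only the allowed file
```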
Could we close this issue now? @kerthcet
Absolutely, fixed by https://github.com/InftyAI/llmaz/pull/178
/close
What would you like to be added:
Take Mistral for example: its repositories contain not only the chunked (sharded) model weights but also consolidated model weights. When downloading models from huggingface, we should pay attention to this, or we will download two copies of the model weights.
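A minimal sketch of how the proposed ignore_patterns field could avoid that duplicate download; the repo id and glob pattern are illustrative assumptions, not a tested configuration:

```python
from huggingface_hub import snapshot_download

# Skip the consolidated copy of the weights so only the sharded
# *.safetensors files (plus config/tokenizer files) are downloaded once.
snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.3",  # illustrative repo id
    ignore_patterns=["consolidated*"],             # illustrative glob
)
```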
Why is this needed:
Fast model loading.
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.