Closed: rhatdan closed this pull request 4 days ago
This PR implements a fallback mechanism for pulling models from HuggingFace. When pulling via direct URL fails, it attempts to use the huggingface-cli. The changes also include fixes for VLLM model serving and error handling improvements.
```mermaid
sequenceDiagram
    participant User
    participant System
    participant HuggingFace
    User->>System: Request to pull model
    System->>HuggingFace: Attempt URL pull
    alt URL pull fails
        System->>System: Log error
        System->>HuggingFace: Attempt CLI pull
        alt CLI not available
            System->>User: Raise NotImplementedError
        else CLI available
            System->>HuggingFace: Execute CLI pull
        end
    else URL pull succeeds
        System->>User: Return model path
    end
```
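The fallback flow above can be sketched roughly as follows. This is a minimal illustration, not the actual ramalama code: the `url_pull`/`hf_pull` names mirror the methods added in this PR, but the signatures, the download mechanics, and the logging are assumptions.

```python
import shutil
import subprocess
import urllib.error
import urllib.request


def url_pull(url: str, dest: str) -> str:
    """Attempt a direct download from huggingface.co (raises on HTTP errors)."""
    urllib.request.urlretrieve(url, dest)
    return dest


def hf_pull(repo_id: str, directory: str) -> str:
    """Fall back to huggingface-cli when the direct URL pull fails."""
    if shutil.which("huggingface-cli") is None:
        raise NotImplementedError(
            "URL pull failed and huggingface-cli not available"
        )
    subprocess.run(
        ["huggingface-cli", "download", "--local-dir", directory, repo_id],
        check=True,
    )
    return directory


def pull(repo_id: str, url: str, directory: str) -> str:
    try:
        return url_pull(url, directory + "/model")
    except urllib.error.HTTPError as e:
        # Log the URL failure, then try the CLI as a fallback.
        print(f"URL pull failed: {e}")
        return hf_pull(repo_id, directory)
```

The key design point is that the URL path stays the fast default and the CLI is only invoked after an HTTP failure, so users without `huggingface-cli` installed see a clear `NotImplementedError` rather than a confusing subprocess error.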
```mermaid
classDiagram
    class HuggingFaceModel {
        +login(args)
        +logout(args)
        +pull(args)
        +hf_pull(args, model_path, directory_path)
        +url_pull(args, model_path, directory_path)
    }
    note for HuggingFaceModel "Added hf_pull and url_pull methods for fallback mechanism"
    class Model {
        +remove(args)
        +_image(args)
        +serve(args)
    }
    note for Model "Modified serve method for VLLM model serving"
```
| Change | Details | Files |
|---|---|---|
| Implement fallback mechanism for HuggingFace model downloads | | `ramalama/huggingface.py` |
| Update VLLM model serving configuration | | `ramalama/model.py`, `test/system/040-serve.bats` |
| Improve error handling in model removal | | `ramalama/model.py` |
@bentito PTAL
When I run the latest, I am seeing:

```shell
$ bin/ramalama --debug pull huggingface://ibm-granite/granite-3.0-8b-instruct
URL pull failed and huggingface-cli not available
Error: "Failed to pull model: 'failed to pull https://huggingface.co/ibm-granite/raw/main/granite-3.0-8b-instruct: HTTP Error 401: Unauthorized'"
```

```shell
ramalama fork/rhatdan/huggingface $ huggingface-cli version
usage: huggingface-cli <command> [<args>]
huggingface-cli: error: argument {env,login,whoami,logout,repo,upload,download,lfs-enable-largefiles,lfs-multipart-upload,scan-cache,delete-cache}: invalid choice: 'version' (choose from 'env', 'login', 'whoami', 'logout', 'repo', 'upload', 'download', 'lfs-enable-largefiles', 'lfs-multipart-upload', 'scan-cache', 'delete-cache')
```
Did the usage of huggingface-cli change from before? With my suggested mods it was working. See my line comment.
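As the error above shows, `huggingface-cli` has no `version` subcommand, so availability checks should not shell out to a version command. One portable alternative is to probe `PATH` directly. This is a sketch; `huggingface_cli_available` is a hypothetical helper name, not something from the PR:

```python
import shutil


def huggingface_cli_available() -> bool:
    # huggingface-cli rejects `version` as an invalid subcommand, so we
    # check for the executable on PATH instead of running it.
    return shutil.which("huggingface-cli") is not None
```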
@bentito PTANL
/lgtm
Works for me now!
Handle non-GGUF files as well.
Summary by Sourcery
Implement a fallback mechanism to use 'huggingface-cli' for model pulling when URL-based pulling fails, and fix the argument path for the 'vllm serve' command. Update system tests to reflect these changes.
Bug Fixes:
- Fix the argument path for the 'vllm serve' command.

Enhancements:
- Implement a fallback to 'huggingface-cli' for model pulling when URL-based pulling fails.

Tests:
- Update system tests to reflect these changes.