containers / ramalama

The goal of RamaLama is to make working with AI boring.

Fall back to huggingface-cli when pulling via URL fails #475

Closed · rhatdan closed this 4 days ago

rhatdan commented 5 days ago

Handle non-GGUF files as well.

Summary by Sourcery

Implement a fallback mechanism to use `huggingface-cli` for model pulling when URL-based pulling fails, and fix the argument path for the `vllm serve` command. Update system tests to reflect these changes.

Bug Fixes:

• Fix the argument path passed to the `vllm serve` command.

Enhancements:

• Fall back to `huggingface-cli` when pulling a model via URL fails, covering non-GGUF repositories as well.

Tests:

• Update system tests to reflect the new pull fallback and serve arguments.

sourcery-ai[bot] commented 5 days ago

Reviewer's Guide by Sourcery

This PR implements a fallback mechanism for pulling models from HuggingFace: when pulling via a direct URL fails, it attempts to use `huggingface-cli` instead. The changes also include a fix for vLLM model serving and improved error handling.
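A minimal Python sketch of that flow (the method names mirror the class diagram below, but the bodies, URL handling, and helper signatures here are assumptions, not the PR's actual code):

```python
import shutil
import subprocess
import urllib.error
import urllib.request


def url_pull(url, target_path):
    # Direct HTTPS download; raises urllib.error.HTTPError on 401/404 etc.
    urllib.request.urlretrieve(url, target_path)
    return target_path


def hf_pull(repo_id, directory_path):
    # Let huggingface-cli handle auth tokens and multi-file (non-GGUF) repos.
    subprocess.run(
        ["huggingface-cli", "download", "--local-dir", directory_path, repo_id],
        check=True,
    )
    return directory_path


def pull(url, repo_id, target_path, directory_path):
    try:
        return url_pull(url, target_path)
    except urllib.error.HTTPError as e:
        if shutil.which("huggingface-cli") is None:
            # Error string taken from the debug transcript later in this thread.
            raise NotImplementedError(
                "URL pull failed and huggingface-cli not available"
            ) from e
        return hf_pull(repo_id, directory_path)
```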

Sequence diagram for model pulling with fallback

```mermaid
sequenceDiagram
    participant User
    participant System
    participant HuggingFace
    User->>System: Request to pull model
    System->>HuggingFace: Attempt URL pull
    alt URL pull fails
        System->>System: Log error
        System->>HuggingFace: Attempt CLI pull
        alt CLI not available
            System->>User: Raise NotImplementedError
        else CLI available
            System->>HuggingFace: Execute CLI pull
        end
    else URL pull succeeds
        System->>User: Return model path
    end
```

Updated class diagram for HuggingFace model handling

```mermaid
classDiagram
    class HuggingFaceModel {
        +login(args)
        +logout(args)
        +pull(args)
        +hf_pull(args, model_path, directory_path)
        +url_pull(args, model_path, directory_path)
    }
    note for HuggingFaceModel "Added hf_pull and url_pull methods for fallback mechanism"
    class Model {
        +remove(args)
        +_image(args)
        +serve(args)
    }
    note for Model "Modified serve method for VLLM model serving"
```
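The same structure as a rough Python skeleton (only the class and method names come from the diagram; the bodies are elided):

```python
class HuggingFaceModel:
    def login(self, args): ...
    def logout(self, args): ...
    # New in this PR: pull() tries url_pull first, then falls back to hf_pull.
    def pull(self, args): ...
    def hf_pull(self, args, model_path, directory_path): ...
    def url_pull(self, args, model_path, directory_path): ...


class Model:
    def remove(self, args): ...
    def _image(self, args): ...
    def serve(self, args): ...  # modified for vLLM model serving
```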

File-Level Changes

| Change | Details | Files |
|--------|---------|-------|
| Implement fallback mechanism for HuggingFace model downloads | • Split model pulling into two methods: `url_pull` and `hf_pull`<br>• Added a try/except block to attempt the URL download first, then fall back to `huggingface-cli`<br>• Changed error handling to raise `NotImplementedError` when `huggingface-cli` is not available | `ramalama/huggingface.py` |
| Update vLLM model serving configuration | • Modified the `vllm serve` command to use the model directory path instead of a specific model file | `ramalama/model.py`<br>`test/system/040-serve.bats` |
| Improve error handling in model removal | • Added the missing `return` statement after handling `OSError` in ignore mode | `ramalama/model.py` |
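The last row's fix is small but easy to picture; a sketch with hypothetical names (the real change lives in `ramalama/model.py`):

```python
import os


class Model:
    def remove(self, args):
        try:
            os.remove(self.model_path)  # hypothetical attribute name
        except OSError:
            if args.ignore:
                # Before this PR, execution fell through after swallowing the
                # error; the added return stops removal here in ignore mode.
                return
            raise
```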

rhatdan commented 5 days ago

@bentito PTAL

bentito commented 4 days ago

> @bentito PTAL

When I run latest, I am seeing:

```console
$ bin/ramalama --debug pull huggingface://ibm-granite/granite-3.0-8b-instruct
URL pull failed and huggingface-cli not available
Error: "Failed to pull model: 'failed to pull https://huggingface.co/ibm-granite/raw/main/granite-3.0-8b-instruct: HTTP Error 401: Unauthorized'"
ramalama fork/rhatdan/huggingface $ huggingface-cli version
usage: huggingface-cli <command> [<args>]
huggingface-cli: error: argument {env,login,whoami,logout,repo,upload,download,lfs-enable-largefiles,lfs-multipart-upload,scan-cache,delete-cache}: invalid choice: 'version' (choose from 'env', 'login', 'whoami', 'logout', 'repo', 'upload', 'download', 'lfs-enable-largefiles', 'lfs-multipart-upload', 'scan-cache', 'delete-cache')
```

Did the usage of hf-cli change from before? With my suggested mods it was working. See my line comment.
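(As the transcript shows, this `huggingface_hub` build has no `version` subcommand, so probing for the CLI is safer done without invoking one. A hedged sketch of such a check, not necessarily what the PR does:)

```python
import shutil

# Detect the CLI by PATH lookup instead of running a subcommand such as
# `huggingface-cli version`, which older huggingface_hub releases lack.
if shutil.which("huggingface-cli") is None:
    raise NotImplementedError("URL pull failed and huggingface-cli not available")
```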

rhatdan commented 4 days ago

@bentito PTANL

bentito commented 4 days ago

/lgtm

Works for me now!