InftyAI / llmaz

☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
Apache License 2.0

feat: update model loader #178

Closed · qinguoyi closed this 1 month ago

qinguoyi commented 1 month ago

What this PR does / why we need it

https://github.com/InftyAI/llmaz/issues/163#issue-2526060924

Which issue(s) this PR fixes

None

Special notes for your reviewer

This PR mainly includes the following changes:

  1. Add allow_patterns and ignore_patterns, and adapt them to work with filename.
  2. Clean up the Python code and move some hard-coded values into a constants file.
  3. When downloading models, use only snapshot_download; with allow_patterns we can download one or more files (see the sketch after this list).
  4. Chunked model weights versus complete weights are not considered here; that will be pushed in another PR.
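
For reference, a minimal sketch (not the PR's actual loader code) of how huggingface_hub's snapshot_download filters repository files with allow_patterns and ignore_patterns; the repo id and patterns below are illustrative:

from huggingface_hub import snapshot_download

# Illustrative values; the real model-loader derives these from its configuration.
snapshot_download(
    repo_id="Qwen/Qwen2-0.5B-Instruct-GGUF",
    allow_patterns=["*.gguf"],     # only download files matching these globs
    ignore_patterns=["*q8*"],      # skip files matching these globs
    local_dir="/workspace/models",
)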

Does this PR introduce a user-facing change?

Support specifying or ignoring multiple files when downloading models
kerthcet commented 1 month ago

Will review this ASAP. Thanks for the work.

kerthcet commented 1 month ago

/kind feature

qinguoyi commented 1 month ago

We may also need an e2e test to make sure the patterns work as expected, but that can be a follow-up until I publish an official image tag for model-builder.

Thanks for your review. Please hold this for now; I will retest the complete process, probably after the National Day holiday.

kerthcet commented 1 month ago

Kind ping @qinguoyi, we would like to publish a new release and would be happy to include this feature. Are you still working on this?

qinguoyi commented 1 month ago

Kind ping @qinguoyi, we would like to publish a new release and would be happy to include this feature. Are you still working on this?

Yes, I will finish this work by 10.20.

qinguoyi commented 1 month ago

Hi, I have finished this work. PTAL @kerthcet

------- What changes: ---------

  1. The allowPatterns and ignorePatterns fields are changed to slices. The pattern fields configured in YAML are joined into a single string with ',' and stored in an environment variable; the model-loader then splits it on ',' to recover the list (see the sketch after this list).

  2. Add a validation in the webhook: once the filename field is set, the allowPatterns and ignorePatterns fields must be nil.

  3. If the filename is set, the model-loader uses it as the allow_patterns.

  4. Add some unit tests covering the pattern fields.
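
Below is a hedged sketch of the loader-side handling described in items 1-3; the environment variable names (MODEL_ALLOW_PATTERNS, MODEL_IGNORE_PATTERNS, MODEL_FILENAME) and the helper are illustrative, not necessarily the PR's actual identifiers:

import os

def parse_patterns(env_name: str) -> list[str] | None:
    # Item 1: the controller joins each list with ',' into one env var;
    # split it back into a list here.
    raw = os.getenv(env_name, "")
    return raw.split(",") if raw else None

allow_patterns = parse_patterns("MODEL_ALLOW_PATTERNS")
ignore_patterns = parse_patterns("MODEL_IGNORE_PATTERNS")
filename = os.getenv("MODEL_FILENAME")

# Item 2: mirrors the webhook rule (sketch only; the real check lives in the Go webhook).
if filename and (allow_patterns or ignore_patterns):
    raise ValueError("filename is mutually exclusive with allow/ignore patterns")

# Item 3: a concrete filename narrows the download to exactly that file.
if filename:
    allow_patterns = [filename]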

------- What tests: -----------

llama.cpp inference with a Hugging Face model

  1. Specifying filename: this scenario works as expected.

    (screenshot)
  2. llama.cpp inference requires the model file path, not the file directory, so we test by specifying allowPatterns and ignorePatterns and checking which files get downloaded; this scenario works as expected:

    apiVersion: llmaz.io/v1alpha1
    kind: OpenModel
    metadata:
      name: qwen2-0--5b-gguf
    spec:
      familyName: qwen2
      source:
        modelHub:
          modelID: Qwen/Qwen2-0.5B-Instruct-GGUF
          allowPatterns:
            - "*"
          ignorePatterns:
            - "*.gguf"
    (screenshot)
  3. Specifying filename together with allowPatterns or ignorePatterns is rejected as invalid; this scenario works as expected.

    (screenshot: specify-filename-allowpattern)
kerthcet commented 1 month ago

I will take a look today.

kerthcet commented 1 month ago

Review on the way now; sorry for the late response, I was busy with other things yesterday.

kerthcet commented 1 month ago

I just tested with the model yaml:

apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b-gguf
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct-GGUF
      # filename: qwen2-0_5b-instruct-q5_k_m.gguf
      allowPatterns:
      - qwen2-0_5b-instruct-q5_k_m.gguf

It failed just because we didn't specify the model file name; do you see the same? I'm OK with that, because we suggest people use filename rather than patterns when only one file is needed.

The error looks like:

llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/workspace/models/models--Qwen--Qwen2-0.5B-Instruct-GGUF'
INFO [                    main] HTTP server is listening | tid="139916821146176" timestamp=1729659912 n_threads_http="7" port="8080" hostname="0.0.0.0"
INFO [                    main] loading model | tid="139916821146176" timestamp=1729659912 n_threads_http="7" port="8080" hostname="0.0.0.0"
 ERR [              load_model] unable to load model | tid="139916821146176" timestamp=1729659912 model="/workspace/models/models--Qwen--Qwen2-0.5B-Instruct-GGUF"
 ERR [                    main] exiting due to model loading error | tid="139916821146176" timestamp=1729659912
qinguoyi commented 1 month ago

I just tested with the model yaml:

apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b-gguf
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct-GGUF
      # filename: qwen2-0_5b-instruct-q5_k_m.gguf
      allowPatterns:
      - qwen2-0_5b-instruct-q5_k_m.gguf

It failed just because we didn't specify the model file name; do you see the same? I'm OK with that, because we suggest people use filename rather than patterns when only one file is needed.

The error looks like:

llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/workspace/models/models--Qwen--Qwen2-0.5B-Instruct-GGUF'
INFO [                    main] HTTP server is listening | tid="139916821146176" timestamp=1729659912 n_threads_http="7" port="8080" hostname="0.0.0.0"
INFO [                    main] loading model | tid="139916821146176" timestamp=1729659912 n_threads_http="7" port="8080" hostname="0.0.0.0"
 ERR [              load_model] unable to load model | tid="139916821146176" timestamp=1729659912 model="/workspace/models/models--Qwen--Qwen2-0.5B-Instruct-GGUF"
 ERR [                    main] exiting due to model loading error | tid="139916821146176" timestamp=1729659912

Yes, I see the same.

To infer with llama.cpp, we need to specify the model file path, not the directory path, so specifying the directory path fails.

Also, as we can see, when filename is not set, the model path will be the directory. I already found this during development, but I have no good idea for handling it.

What do you think about it?

https://github.com/InftyAI/llmaz/blob/da979a1818e0a9fbe5c0b68865cbdcc78cb93200/pkg/controller_helper/model_source/modelhub.go#L43-L57
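
For context, a rough Python sketch of the path behavior under discussion, inferred from the error log above and the linked Go helper; the cache layout and the helper name are assumptions, not the actual implementation:

import os

def model_path(model_id: str, filename: str | None) -> str:
    # Hugging Face cache layout seen in the error log: models--<org>--<name>.
    base = os.path.join("/workspace/models", "models--" + model_id.replace("/", "--"))
    # With filename set, the runtime gets a loadable file path; with only
    # patterns set, it gets the snapshot directory, which llama.cpp rejects.
    return os.path.join(base, filename) if filename else base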

kerthcet commented 1 month ago

Let's be simple at first and we can evolve in the future, so the idea is:

  • If people want to deploy with one file, use the filename and leave the patterns empty; we'll combine the repoName + fileName
  • If people want to use the patterns, we'll load the whole repo in the runtime

Just as we do today.

kerthcet commented 1 month ago

A kind tip: let's address review comments with new commits, so the reviewers can find the appended changes and understand what's going on; we can squash the commits at the last minute.

qinguoyi commented 1 month ago

A kind tip: let's address review comments with new commits, so the reviewers can find the appended changes and understand what's going on; we can squash the commits at the last minute.

Thanks for your kind tips. I have reverted the squashed commits. PTAL once more.

qinguoyi commented 1 month ago

Let's be simple at first and we can evolve in the future, so the idea is:

  • If people want to deploy with one file, use the filename and leave the patterns empty; we'll combine the repoName + fileName
  • If people want to use the patterns, we'll load the whole repo in the runtime

Just as we do today.

So, the current code does not need to be changed.

In addition, we need to give user-friendly tips in the documentation on how to use the pattern fields.

kerthcet commented 1 month ago

Squash as well; LGTM on the other parts.

kerthcet commented 1 month ago

/lgtm
/approve

Great work!