Closed: qinguoyi closed this 1 month ago
Will review this ASAP. Thanks for the work.
/kind feature
We may also need an e2e test to make sure the pattern works as expected, but it can be a follow-up until I publish an official image tag for model-builder.
Thanks for your review. Please hold on; I will retest the complete process, probably after the National Day of the People's Republic of China holiday.
Kind ping @qinguoyi, I'd like to publish a new release and would be happy to include this feature. Are you still working on this?
Yes, I will finish this work by 10.20.
Hi, I have finished this work. PTAL @kerthcet.
------- What changes: ---------
The allowPatterns and ignorePatterns fields are changed to slices. The patterns configured in the YAML are joined into a single string with ',' and put into an environment variable; the model-loader reads it and splits on ',' to recover the list (see the sketch after this list).
Add a validation in the webhook: once the filename field is set, the allowPatterns and ignorePatterns fields must be nil.
If the filename is set, the model-loader uses it as the allow_patterns.
Add some unit tests for the pattern fields.
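For context, here is a minimal sketch of the loader-side handling described above. The environment variable names (MODEL_ALLOW_PATTERNS, MODEL_IGNORE_PATTERNS, MODEL_FILENAME, MODEL_ID) are illustrative, not the actual names used by the model-loader:

import os
from huggingface_hub import snapshot_download

def parse_patterns(env_name):
    # The controller joins the YAML slice with ',' before injecting it;
    # split it back into a list, dropping empty entries.
    raw = os.getenv(env_name, "")
    return [p for p in raw.split(",") if p] or None

allow_patterns = parse_patterns("MODEL_ALLOW_PATTERNS")
ignore_patterns = parse_patterns("MODEL_IGNORE_PATTERNS")
filename = os.getenv("MODEL_FILENAME")

# When filename is set, the webhook guarantees both pattern fields are
# nil, and the loader downloads only that single file.
if filename:
    allow_patterns = [filename]

snapshot_download(
    repo_id=os.environ["MODEL_ID"],
    local_dir="/workspace/models",
    allow_patterns=allow_patterns,
    ignore_patterns=ignore_patterns,
)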
------- What tests: -----------
Tested llama.cpp inference with a Hugging Face model:
Specifying filename: this scenario works as expected.
llama.cpp inference needs the file path, not the file directory, so we also tested with allowPatterns and ignorePatterns and only checked which files get downloaded; this scenario works as expected too (spec and equivalent call below):
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b-gguf
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct-GGUF
      allowPatterns:
        - "*"
      ignorePatterns:
        - "*.gguf"
Specifying filename together with allowPatterns or ignorePatterns is rejected: this scenario also behaves as expected.
I will take a look today.
On the way now, sorry for the late response, I was busy with other things yesterday.
I just tested with the model yaml:
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b-gguf
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct-GGUF
      # filename: qwen2-0_5b-instruct-q5_k_m.gguf
      allowPatterns:
        - qwen2-0_5b-instruct-q5_k_m.gguf
It failed just because we didn't specify the model file name; did you see the same? I'm OK with that because we suggest people use the filename rather than the patterns when only one file is needed.
The error looks like:
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/workspace/models/models--Qwen--Qwen2-0.5B-Instruct-GGUF'
INFO [ main] HTTP server is listening | tid="139916821146176" timestamp=1729659912 n_threads_http="7" port="8080" hostname="0.0.0.0"
INFO [ main] loading model | tid="139916821146176" timestamp=1729659912 n_threads_http="7" port="8080" hostname="0.0.0.0"
ERR [ load_model] unable to load model | tid="139916821146176" timestamp=1729659912 model="/workspace/models/models--Qwen--Qwen2-0.5B-Instruct-GGUF"
ERR [ main] exiting due to model loading error | tid="139916821146176" timestamp=1729659912
Yes, I see the same.
Inference with llama.cpp requires specifying the file path, not the directory path, so if we pass the directory path it fails.
Also note that when filename is not set, the model path is the directory. I already found this during development, but I have no good idea how to fix it.
What do you think?
Some kind tips: let's address the comments with new commits, so the reviewers can find the appended changes and understand what's going on; we can squash the commits at the last minute.
Thanks for your kind tips, I have reverted the squashed commits. PTAL once more.
Let's be simple at first and we can evolve in the future, so the idea is:
- If people want to deploy with one file, use the filename and leave the patterns empty; we'll combine the repoName + fileName
- If people want to use the patterns, let's load the whole repo in the runtime
Just as we do today.
So, the current code does not need to be changed.
In addition, we need to give user-friendly tips in the documentation on using the patterns fields.
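To make the rule above concrete, a sketch of the resulting path resolution (model_path is a hypothetical helper, not the actual llmaz code):

import os
from typing import Optional

def model_path(base_dir: str, filename: Optional[str]) -> str:
    # filename set: point the runtime (e.g. llama.cpp) at the exact file.
    # patterns only: return the snapshot directory and let runtimes that
    # accept a directory resolve the weights themselves.
    return os.path.join(base_dir, filename) if filename else base_dir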
Please squash as well; I'm LGTM with the other parts.
/lgtm
/approve
Great work!
What this PR does / why we need it
https://github.com/InftyAI/llmaz/issues/163#issue-2526060924
Which issue(s) this PR fixes
None
Special notes for your reviewer
This PR mainly includes the following:
Does this PR introduce a user-facing change?