MMMazart opened 1 month ago
It runs normally now; I don't know why.
Another question: stable-diffusion responds quickly every time, but flux takes a long time to load each time. Is that the model being loaded? Does every POST request reload the flux model?
All the time is spent loading the model.
> It runs normally now; I don't know why.
If no new log messages show on the screen, please check if the port you're using is in use or not.
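For example, assuming the server's default port 8080, either of these commands will show whether another process already holds the port:

```bash
# List any process currently bound to port 8080.
lsof -i :8080
# Alternative on Linux: list listening TCP sockets and their owning processes.
ss -ltnp | grep 8080
```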
> Another question: stable-diffusion responds quickly every time, but flux takes a long time to load each time. Is that the model being loaded? Does every POST request reload the flux model?
The model is loaded ONLY at the stage of context initialization, and ONLY loaded once. Could you please share with us the following environment info?
- Operating system
- CPU
- Memory
- GPU and vRAM if present
Thanks a lot!
Thank you for your reply!
It gets stuck here and reloads every time. Sorry, I closed this issue by mistake.
Thank you. This is very important to me.
Flux1-dev is like this every time as well. After each run, the GPU VRAM is released.
Thanks for the feedback. We'll check the issue ASAP. Thanks!
Thanks a lot.
After checking the design of wasmedge-stablediffusion, the context should remain after loading. @apepkuss Will the sd-api-server or llama-core try to init and drop the context per request?
The models of the stable-diffusion series will not be dropped, while those of the flux series will be dropped.
@hydai According to the investigation, llama-core creates the text_to_image or image_to_image context once per request. The improvement in the design will come in the next release. Thanks!
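As a rough way to observe this behavior (a sketch assuming the default localhost:8080 endpoint and the flux1-schnell request used elsewhere in this thread), timing two identical requests shows that the second one pays the full model-loading cost again:

```bash
# With per-request context creation, both requests take roughly the same
# (long) time, because the model is reloaded each time.
for i in 1 2; do
  time curl -s -X POST 'http://localhost:8080/v1/images/generations' \
    --header 'Content-Type: application/json' \
    --data '{"model": "flux1-schnell", "prompt": "a cat", "cfg_scale": 1.0, "sample_method": "euler", "steps": 8}' \
    > /dev/null
done
```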
Thank you! I'm very much looking forward to it!
@MMMazart We released v0.1.5. Please try it. Thanks!
Thank you for your effort, but it seems there are a few bugs at the moment:
Additionally, I have a question: How do I load the flux1-merged model? Is it the same as flux1-dev and others?
@MMMazart Thanks for your quick feedback!
> - It can only read relative paths (./), but cannot read absolute paths.
You have to do directory mapping because the running environment is a wasm sandbox. That's why you see --dir .:. in the command, which maps a guest dir to a host dir. The following example maps the local directory /Users/sam/workspace/demo/sd/dev to the root directory of the wasm sandbox environment:
```bash
wasmedge --dir .:/Users/sam/workspace/demo/sd/dev sd-api-server.wasm \
  --model-name flux1-dev \
  --diffusion-model flux1-dev-Q4_0.gguf \
  --vae ae.safetensors \
  --clip-l clip_l.safetensors \
  --t5xxl t5xxl-Q8_0.gguf
```
> - When initially loading the context, both the text-to-image and image-to-image models are loaded simultaneously, which consumes a large amount of VRAM. In actual use, I only want to use the text-to-image model. Could you add an option to load only one model?
Yeah, the major target of v0.1.5 is to solve the issue of context creation. In the next release, we will add a CLI option to control which context (or both) is created.
> - The program crashes when sending the second request.
Could you please provide more details about the issue, such as the request you used, CPU/GPU, memory/VRAM, etc.? That would help us reproduce the issue.
In addition, our wasmedge_stablediffusion plugin is based on stable-diffusion.cpp (master-e71ddce). According to our test with flux.1-dev, stable-diffusion.cpp (master-e71ddce) causes segfault issues with some prompts. In our plan, we will upgrade the wasmedge_stablediffusion plugin to stable-diffusion.cpp (master-14206fd), which has some fixes.
> How do I load the flux1-merged model? Is it the same as flux1-dev and others?
I have no idea about flux1-merged, so I cannot tell whether they are the same or not. If it is an open-source model, you can share the link to the model with us. We'll check it.
My environment information is the same as mentioned before and has not been changed. This problem occurs every time.
@MMMazart Do you mind sharing with us the prompt you're using? BTW, the issue is triggered while using flux.1-dev, right? Thanks!
@MMMazart For issue 2 mentioned before, please try 0.1.6. This version adds a --context-type CLI option with possible values text-to-image, image-to-image, and full. The default setting is full, meaning both the text-to-image and image-to-image contexts are created.
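For example, to create only the text-to-image context for flux1-schnell (a sketch reusing the server command shown elsewhere in this thread):

```bash
# Only the text-to-image context is created, which avoids the extra
# VRAM cost of also loading the image-to-image context.
wasmedge --dir .:. sd-api-server.wasm \
  --model-name flux1-schnell \
  --diffusion-model flux1-schnell-Q4_0.gguf \
  --vae ae.safetensors \
  --clip-l clip_l.safetensors \
  --t5xxl t5xxl-Q8_0.gguf \
  --context-type text-to-image
```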
prompt:"a lovely cat holding a sign says 'flux.cpp'". Yes, both flux.1-dev and flux.1-schnell will trigger this problem.
"a cat" will trigger it, too. This seems to have nothing to do with the prompt.
@MMMazart Could you share with us the request? For example, steps.
text-to-image

```python
import time

import requests

# Server endpoint, as used elsewhere in this thread.
url = "http://localhost:8080/v1/images/generations"

headers = {"Content-Type": "application/json"}
data = {
    "model": "flux1-schnell",
    "prompt": "a cat",
    "cfg_scale": 1.0,
    "sample_method": "euler",
    "steps": 8,
}

time_start = time.time()
response = requests.post(url, headers=headers, json=data)
```

This is my request, which is the same as the example.
After the first inference completes, it can be seen that the memory is released. So the second request directly results in an error. @apepkuss
@MMMazart Which version of CUDA are you using?
11.5 @apepkuss
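For reference, a standard NVIDIA setup reports the installed versions like this (a small sketch, not specific to this project):

```bash
# CUDA toolkit version.
nvcc --version
# Driver version and the highest CUDA version it supports.
nvidia-smi
```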
@MMMazart We don't have an A100, so we tried to reproduce the issue in an environment with a 3080 + cuda 11.3 + ubuntu 20.04. The entire process works correctly, no crash. Please refer to the following snapshot. Thanks!
I see that your snapshot shows only one request being sent. It will crash on the second request. Can you send multiple requests? In my environment, the context is deleted after the first request. Thanks!
I changed the CUDA version to 12.2; after the first request, the context is still cleared. My Ubuntu version is 22.04, but it seems the biggest difference is the GPU.
I don't think it has anything to do with the machine, GPU, etc. I am getting the very same behaviour on a MacBook M3 Pro with 48GB of shared RAM.
At the second request the server crashes:
```
segmentation fault  wasmedge --dir .:. sd-api-server.wasm --model-name flux1-schnell --vae
```
I followed the steps for the FLUX example.
Server runs with:
```bash
wasmedge --dir .:. sd-api-server.wasm \
  --model-name flux1-schnell \
  --diffusion-model flux1-schnell-Q4_0.gguf \
  --vae ae.safetensors \
  --clip-l clip_l.safetensors \
  --t5xxl t5xxl-Q8_0.gguf \
  --context-type text-to-image
```
The client request:
```bash
curl -X POST 'http://localhost:8080/v1/images/generations' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "flux1-schnell",
    "prompt": "a lovely cat",
    "cfg_scale": 1.0,
    "sample_method": "euler",
    "steps": 10
  }'
```
The second time I execute this request the server crashes.
Thanks for reporting, @fabiopolimeni and @MMMazart. We will release a new version to solve this problem. See the upstream issue: https://github.com/WasmEdge/WasmEdge/issues/3803
Hi @fabiopolimeni and @MMMazart, we updated the plugin to fix this problem. Please update the plugin and try again.
I encounter this error during initialization after the update.
My CUDA version is 11.5, but it seems to be unsupported. I switched to version 12.2, which works.
It's weird. This error shows the address it tried to bind to is in use, and it's not related to the CUDA version. When you run the cuda-11 version, could you check that no other applications are using the same address/port?
This is indeed strange, but I was using the same port before and after. It worked after changing the CUDA version.
I executed this command:

```bash
./wasmedge --dir .:. sd-api-server.wasm --model-name sd-v1.4 --model /mnt/data/zhangmingyang/t2i/models/stable-diffusion-v-1-4-GGUF/stable-diffusion-v1-4-Q8_0.gguf
```

The result is as follows, but when I send the request

```bash
curl -X POST 'http://localhost:8080/v1/images/generations' \
  --header 'Content-Type: application/json' \
  --data '{"model": "sd-v1.4", "prompt": "A cute baby sea otter"}'
```

there is no response. What is going on?
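If the startup log looks fine but the request hangs, a verbose request can at least show whether the connection is being accepted (a diagnostic sketch using the same endpoint):

```bash
# -v prints the connection setup and HTTP exchange, which shows whether
# the server accepts the request or the client is stuck connecting.
curl -v -X POST 'http://localhost:8080/v1/images/generations' \
  --header 'Content-Type: application/json' \
  --data '{"model": "sd-v1.4", "prompt": "A cute baby sea otter"}'
```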