LlamaEdge / sd-api-server

The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge
https://llamaedge.com/

Why is it not working? #2

Open MMMazart opened 1 month ago

MMMazart commented 1 month ago

I executed this command:

./wasmedge --dir .:. sd-api-server.wasm \
  --model-name sd-v1.4 \
  --model /mnt/data/zhangmingyang/t2i/models/stable-diffusion-v-1-4-GGUF/stable-diffusion-v1-4-Q8_0.gguf

The result is shown in the screenshot below. But when I send the request

curl -X POST 'http://localhost:8080/v1/images/generations' \
  --header 'Content-Type: application/json' \
  --data '{"model": "sd-v1.4", "prompt": "A cute baby sea otter"}'

there is no response. What is going on?

[screenshot]

MMMazart commented 1 month ago

It works fine now; I don't know why.

MMMazart commented 1 month ago

Another question: stable-diffusion responds quickly every time, but flux takes a long time on every request. Is this because the model is being loaded? Does the flux model have to be reloaded on every POST request? All the time is spent loading the model.

apepkuss commented 1 month ago

> It works fine now; I don't know why.

If no new log messages show on the screen, please check whether the port you're using is already in use.
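
If it helps, a quick way to check for a port conflict (a generic sketch; 8080 is the port used in this thread):

# list any process listening on port 8080 (Linux)
ss -ltnp | grep ':8080'
# or, where lsof is available (Linux/macOS)
lsof -i :8080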

> Does the flux model have to be reloaded on every POST request?

The model is loaded ONLY at the stage of context initialization, and ONLY loaded once. Could you please share the following environment info with us?

  • Operating system
  • CPU
  • Memory
  • GPU and vRAM if present

Thanks a lot!

MMMazart commented 1 month ago

> Could you please share the following environment info with us?

Thank you for your reply!

MMMazart commented 1 month ago

[screenshot] It gets stuck here and reloads the model every time. Sorry, I closed this issue by mistake.

Thank you. This is very important to me. Flux1-dev is like this every time as well: after each run, the GPU VRAM is released.

apepkuss commented 1 month ago

Thanks for the feedback. We'll check the issue ASAP. Thanks!

MMMazart commented 1 month ago

> Thanks for the feedback. We'll check the issue ASAP.

Thanks a lot.

hydai commented 1 month ago

After checking the design of wasmedge-stablediffusion, the context should remain after loading. @apepkuss Will the sd-api-server or llama-core try to init and drop the context per request?

MMMazart commented 1 month ago

The models of the stable-diffusion series will not be dropped, while those of the flux series will be dropped.

apepkuss commented 1 month ago

@hydai According to the investigation, llama-core creates the text_to_image or image_to_image context once per request. The design improvement will come in the next release. Thanks!

MMMazart commented 1 month ago

> The design improvement will come in the next release.

Thank you! I'm looking forward to it very much!

apepkuss commented 1 month ago

@MMMazart We released v0.1.5. Please try it. Thanks!

MMMazart commented 1 month ago

> We released v0.1.5. Please try it.

Thank you for your effort, but it seems there are a few bugs at the moment:

  1. It can only read relative paths (./); it cannot read absolute paths.
  2. When initially loading the context, both the text-to-image and image-to-image models are loaded simultaneously, which consumes a large amount of VRAM. In actual use, I only want to use the text-to-image model. Could you add an option to load only one of them?
  3. The program crashes when sending the second request. [screenshot]

Additionally, I have a question: How do I load the flux1-merged model? Is it the same as flux1-dev and others?

apepkuss commented 1 month ago

@MMMazart Thanks for your quick feedback!

> 1. It can only read relative paths (./); it cannot read absolute paths.

You have to set up directory mappings because the running environment is a wasm sandbox. That's why you see --dir .:. in the command: it maps a guest directory to a host directory. The following example maps the local directory /Users/sam/workspace/demo/sd/dev to the root directory of the wasm sandbox environment:

wasmedge --dir .:/Users/sam/workspace/demo/sd/dev sd-api-server.wasm \
  --model-name flux1-dev \
  --diffusion-model flux1-dev-Q4_0.gguf \
  --vae ae.safetensors \
  --clip-l clip_l.safetensors \
  --t5xxl t5xxl-Q8_0.gguf
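
For the absolute path from the original report, the same mechanism applies: map the host directory onto a guest directory and refer to the model by its guest path. A sketch (the guest-side name /models is an arbitrary choice, not something the server requires):

wasmedge --dir /models:/mnt/data/zhangmingyang/t2i/models/stable-diffusion-v-1-4-GGUF \
  sd-api-server.wasm \
  --model-name sd-v1.4 \
  --model /models/stable-diffusion-v1-4-Q8_0.gguf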

> 2. When initially loading the context, both the text-to-image and image-to-image models are loaded simultaneously, which consumes a large amount of VRAM. Could you add an option to load only one of them?

Yeah, the major target of v0.1.5 was to solve the context-creation issue. In the next release, we will add a CLI option to control which context (or both) is created.

> 3. The program crashes when sending the second request.

Could you please provide more details about the issue, such as the request you used, CPU/GPU, memory/VRAM, etc.? That would help us reproduce it.

In addition, our wasmedge_stablediffusion plugin is based on stable-diffusion.cpp (master-e71ddce). According to our tests with flux.1-dev, stable-diffusion.cpp (master-e71ddce) causes segfault issues with some prompts. We plan to upgrade the wasmedge_stablediffusion plugin to stable-diffusion.cpp (master-14206fd), which has some fixes.

> How do I load the flux1-merged model? Is it the same as flux1-dev and others?

I have no idea about flux1-merged, so I cannot tell whether they are the same or not. If it is an open-source model, you can share the link to the model with us. We'll check it.

MMMazart commented 1 month ago

> Could you please provide more details about the issue, such as the request you used, CPU/GPU, memory/VRAM, etc.?

My environment information is the same as mentioned before and has not changed. This problem occurs every time.

apepkuss commented 1 month ago

@MMMazart Do you mind sharing with us the prompt you're using? BTW, the issue is triggered while using flux.1-dev, right? Thanks!

apepkuss commented 1 month ago

@MMMazart For issue 2 mentioned before, please try v0.1.6. This version adds a --context-type CLI option with possible values text-to-image, image-to-image, and full. The default is full, meaning both the text-to-image and image-to-image contexts are created.
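
For example, to create only the text-to-image context (a sketch reusing the flux1-dev launch command shown earlier in this thread):

wasmedge --dir .:. sd-api-server.wasm \
  --model-name flux1-dev \
  --diffusion-model flux1-dev-Q4_0.gguf \
  --vae ae.safetensors \
  --clip-l clip_l.safetensors \
  --t5xxl t5xxl-Q8_0.gguf \
  --context-type text-to-image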

MMMazart commented 1 month ago

> Do you mind sharing with us the prompt you're using? BTW, the issue is triggered while using flux.1-dev, right?

prompt: "a lovely cat holding a sign says 'flux.cpp'". Yes, both flux.1-dev and flux.1-schnell will trigger this problem. "a cat" will trigger it, too; this seems to have nothing to do with the prompt.

apepkuss commented 1 month ago

> Yes, both flux.1-dev and flux.1-schnell will trigger this problem.

@MMMazart Could you share the request with us? For example, the steps value.

MMMazart commented 1 month ago

text-to-image:

import time
import requests

# server endpoint from the curl examples above
url = 'http://localhost:8080/v1/images/generations'
headers = {'Content-Type': 'application/json'}
data = {
    "model": "flux1-schnell",
    # "prompt": "a lovely cat holding a sign says 'flux.cpp'",
    "prompt": "a cat",
    "cfg_scale": 1.0,
    "sample_method": "euler",
    "steps": 8,
}

time_start = time.time()
response = requests.post(url, headers=headers, json=data)

This is my request, which is the same as the example. @apepkuss

MMMazart commented 1 month ago

[screenshot]

After the first inference completes, you can see that the memory has been released, so the second request immediately results in an error. @apepkuss

apepkuss commented 1 month ago

@MMMazart Which version of CUDA are you using?

MMMazart commented 1 month ago

> Which version of CUDA are you using?

11.5 @apepkuss [screenshots]

apepkuss commented 1 month ago

@MMMazart We don't have an A100, so we tried to reproduce the issue in an environment with a 3080 + CUDA 11.3 + Ubuntu 20.04. The entire process works correctly, with no crash. Please refer to the following snapshot. Thanks!

[snapshot]

MMMazart commented 1 month ago

> The entire process works correctly, with no crash.

I see from your snapshot that only one request was sent? It crashes on the second request. Can you send multiple requests? In my environment, the context is deleted after the first request. Thanks!

MMMazart commented 1 month ago

I changed the CUDA version to 12.2; after the first request, the context is still cleared. My Ubuntu version is 22.04, but it seems the biggest difference is the GPU.

fabiopolimeni commented 1 month ago

I don't think it has anything to do with machines, GPUs, etc. I am getting the very same behaviour on a MacBook M3 Pro with 48GB of shared RAM.

At the second request the server crashes:

segmentation fault  wasmedge --dir .:. sd-api-server.wasm --model-name flux1-schnell   --vae  

I followed the steps for the FLUX example.

Server runs with:

wasmedge --dir .:. sd-api-server.wasm \
  --model-name flux1-schnell \
  --diffusion-model flux1-schnell-Q4_0.gguf \
  --vae ae.safetensors \
  --clip-l clip_l.safetensors \
  --t5xxl t5xxl-Q8_0.gguf \
  --context-type text-to-image

The client request:

curl -X POST 'http://localhost:8080/v1/images/generations' \
  --header 'Content-Type: application/json' \
  --data '{
      "model": "flux1-schnell",
      "prompt": "a lovely cat",
      "cfg_scale": 1.0,
      "sample_method": "euler",
      "steps": 10
  }'

The second time I execute this request the server crashes.

alabulei1 commented 1 month ago

Thanks for reporting, @fabiopolimeni and @MMMazart. We will release a new version to solve this problem. See the upstream issue: https://github.com/WasmEdge/WasmEdge/issues/3803

hydai commented 1 month ago

Hi @fabiopolimeni and @MMMazart, we updated the plugin to fix this problem. Please update the plugin and try again.
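
One way to refresh the plugin is to re-run the WasmEdge installer. This is only a sketch; it assumes the install_v2.sh script's --plugins flag accepts wasmedge_stablediffusion, so check the release notes for the exact invocation:

# re-run the installer, requesting the stable-diffusion plugin (assumed flag/name)
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | \
  bash -s -- --plugins wasmedge_stablediffusion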

MMMazart commented 1 month ago

> We updated the plugin to fix this problem. Please update the plugin and try again.

[screenshot] I encounter this error during initialization after the update. My CUDA version is 11.5, but it seems to be unsupported. I switched to version 12.2, which works.

hydai commented 1 month ago

> I encounter this error during initialization after the update.

That's weird. The error shows that the address it tried to bind to is in use, and it's not related to the CUDA version. Could you check whether, when you run the cuda-11 version, any other application is using the same address/port?

MMMazart commented 1 month ago

> Could you check whether, when you run the cuda-11 version, any other application is using the same address/port?

This is indeed strange, but I was using the same port before and after. It worked after changing the CUDA version.