abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License
7.82k stars 933 forks

Images can not be decoded when using image_url with webp base64 from LibreChat #1140

Open BriceGUILLAUME opened 8 months ago

BriceGUILLAUME commented 8 months ago

Expected Behavior

I was trying to integrate llama-cpp-python with LibreChat, as both use the OpenAI API to communicate. The API implementation allows sending images via an image_url embedded in the request body. I was expecting the image to be analyzed by llava, but I got a decoding error instead.

Current Behavior

The request is correctly received by llama-cpp-python, but I see the following error in the logs:

clip_image_load_from_bytes: failed to decode image bytes
llava_image_embed_make_with_bytes: can't load image from bytes, is it a valid image?
Segmentation fault (core dumped)

Environment and Context

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  24
  On-line CPU(s) list:   0-23
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7V13 64-Core Processor
    CPU family:          25
    Model:               1
    Thread(s) per core:  1
    Core(s) per socket:  24
    Socket(s):           1
    Stepping:            1
    BogoMIPS:            4890.88
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable 
                         nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalig
                         nsse 3dnowprefetch osvw topoext perfctr_core invpcid_single vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero
                          xsaveerptr rdpru arat umip vaes vpclmulqdq rdpid fsrm
Virtualization features: 
  Hypervisor vendor:     Microsoft
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   768 KiB (24 instances)
  L1i:                   768 KiB (24 instances)
  L2:                    12 MiB (24 instances)
  L3:                    96 MiB (3 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-23
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Mitigation; safe RET, no microcode
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

Linux *** 6.2.0-1019-azure #19~22.04.1-Ubuntu SMP Wed Jan 10 22:57:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Python 3.10.12
GNU Make 4.3
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

Steps to Reproduce

I run a Docker instance of llama-cpp-python behind a reverse proxy (for SSL handling), and I tried to send a completion request from LibreChat. Here is the request intercepted by my nginx reverse proxy:

"POST /v1/chat/completions HTTP/1.1" 502 157 "-" "OpenAI/JS 4.20.1" "-" "{\x0A  \x22model\x22: \x22gpt-4-vision-preview\x22,\x0A  \x22temperature\x22: 1,\x0A  \x22top_p\x22: 1,\x0A  \x22stream\x22: true,\x0A  \x22messages\x22: [\x0A    {\x0A      \x22role\x22: \x22user\x22,\x0A      \x22content\x22: [\x0A        {\x0A          \x22type\x22: \x22text\x22,\x0A          \x22text\x22: \x22Que vois-tu ?\x22\x0A        },\x0A        {\x0A          \x22type\x22: \x22image_url\x22,\x0A          \x22image_url\x22: {\x0A            \x22url\x22: \x22data:image/webp;base64,UklGRpKLAABXRUJQVlA4IIaLAABQTgKdASq8AjwDPm00lkikIqKqItIaeUANiWdu3NFmr2+8qtTVSszHwIS+fn3O2WGxmO6TE8Lw+biWIPMrzbffzudZint883zSeN8gTm9eAZ6/vXIR8Xb1E8bN7pn/O9gX7CewZ/dP9V1wnoiftV6b3tQfuF+2GYB/P/Kf9B/oP9Z4c/l/1P+W/vP+e/4X+K+iH9p/0fJV9N4F/Yj9z/k/3a/xX0J/r/+r/ovHn9d/a/+X/iP83+2vyHfln9C/xv9w/cP+//vJ70O5G3j/Uf9D/H+wd7K/Tv9v/bf85+2vyP/Xf9T/Resf81/n/+V7g/63/73+5eef40f4z/h+wp/RP79/4f8t/qfiM/wf/n/u/za+LH1d/6/9R8H39B/wn/e9db2t/ur7TxB+YMYxnrb3BqtCoqqpmWofkX055xxZM9zhyY+msgAaC5Q+yQwFkXTmRnJ8rnBi1xCJggO0a+vpFfoeGiRVcdTDavfafMG1e+09e5bEUvK2+S08//z2C95lQ8e4+lHfTo1wk1UFdvS+7oODKqqPMG1e+0+YNq99p8wbQtG5cHP7/3HPKV/CjpUeZ2H8K07BlsagelFTMlXDjqYbV77T5g2r32mA/t/cUH2jPd7L1lgHf79TR5dAF0a4FvPDADcWryKmVCgXCkp7q2xzNC1W4omhEK3dHqGVvyZ/GfMG1e+0+YNq99p6wV28ySOerKwj1ekwoYrm4AHBudLZ0S3+LDIsCVhEU+HMFq0uAr+fvfD509flRidWEhM6HAf0oLASmhcd2MFmoFUcGxSBCX7IjHX+4LF3SAPoTvRJbZPykdZvaeQvWQr32nzBtXvtPmDatbd/LgoKDWcfcJEtVtU5bAXy4rkpDK2zUp65UMw0rHgs3NImLw1sg5IO9tk6rJbYXUcBp7q9Xx0WZDT5g2r32nzBtXvsWJaEYEWdNg++z18E86yzBibwq/MjJKPOkGsHufhX3h++XAP7NJ0+7RhuxE0n9nPXrrBYd03Gw7GAoCEKO3nYSyWbtSaIjBlTIGWyON6KlusL5Kr7ac99XfI4KMMQ6rMAUwkALJRhDV2Yq/bDaHA3e+nS7Q/cX64JKtGqOrKaoh/YmEWB+0sEgNuGHQby5LzEoMWMLoL3h0uOULfsIXIIFiIoVV0AjLwhAqbcw6LZIA0sXiRxxwVGJG9bm6eTsr5fvzWy62yzqaoiEiRMRN/+Rl8zu...

As you can see, the image_url is sent as base64-encoded WebP, but something fails when it is being decoded.

I have tried to find where in the code this decoding happens, but I would be glad if someone could help me track down where this is coming from.

Thank you for your support!
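One quick sanity check (not part of the original report): the payload in the intercepted request really is WebP, which can be confirmed by sniffing the magic bytes of the base64 prefix. A minimal sketch, using only the standard library:

```python
import base64

# Prefix of the base64 payload from the intercepted request above
payload_prefix = "UklGRpKLAABXRUJQ"
header = base64.b64decode(payload_prefix)

# WebP images are RIFF containers: "RIFF" at offset 0, "WEBP" at offset 8.
# PNG files start with a fixed 8-byte signature instead.
print(header[:4] == b"RIFF" and header[8:12] == b"WEBP")  # → True
print(header[:8] == b"\x89PNG\r\n\x1a\n")                 # → False
```

stb_image (which llama.cpp's clip loader uses under the hood) has no WebP decoder, which would explain the "failed to decode image bytes" error.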

abetlen commented 8 months ago

@BriceGUILLAUME I believe the clip model we're using expects png bytes only, is webp supported by the OpenAI api?

BriceGUILLAUME commented 8 months ago

@abetlen, thank you for your answer and for the awesome project as well.

I have managed to get answers from Azure OpenAI with webp, so I guess it is supported. However, I will try with png data to see whether the issue occurs there as well, and I will post the results here, probably tomorrow.

abetlen commented 8 months ago

@BriceGUILLAUME looks like you're right https://platform.openai.com/docs/guides/vision/what-type-of-files-can-i-upload

The fix would be to use something like Pillow to convert the image in the chat handler, which is doable, but I'll keep it as an optional dependency for the vision chat handlers.
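A minimal sketch of that conversion, assuming Pillow is installed (the helper names here are hypothetical, not the actual chat-handler API):

```python
import base64
import io

from PIL import Image  # optional dependency: pip install pillow


def image_bytes_to_png(image_bytes: bytes) -> bytes:
    """Re-encode arbitrary image bytes (WebP, JPEG, ...) as PNG."""
    with Image.open(io.BytesIO(image_bytes)) as img:
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        return buf.getvalue()


def normalize_data_uri(uri: str) -> str:
    """Hypothetical helper: rewrite a base64 data URI so its payload is PNG."""
    header, b64_payload = uri.split(",", 1)
    if "image/png" in header:
        return uri  # already PNG, nothing to do
    png = image_bytes_to_png(base64.b64decode(b64_payload))
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")
```

The chat handler could run each incoming image_url through something like normalize_data_uri before handing the bytes to the clip model, so stb_image only ever sees PNG.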

Baka-14 commented 5 months ago

Hey, has anyone been assigned to fix this issue?