abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License
7.82k stars 933 forks

Images can not be decoded when using image_url with webp base64 from LibreChat #1140

Open BriceGUILLAUME opened 8 months ago

BriceGUILLAUME commented 8 months ago

Expected Behavior

I was trying to integrate llama-cpp-python with LibreChat, as both use the OpenAI API to communicate. The API implementation allows sending images via an image_url embedded in the request body. I was expecting the image to be analyzed by llava, but I got a decoding error instead.

Current Behavior

The request is correctly received by llama-cpp-python, but I see the following error in the logs:

clip_image_load_from_bytes: failed to decode image bytes
llava_image_embed_make_with_bytes: can't load image from bytes, is it a valid image?
Segmentation fault (core dumped)

Environment and Context

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  24
  On-line CPU(s) list:   0-23
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7V13 64-Core Processor
    CPU family:          25
    Model:               1
    Thread(s) per core:  1
    Core(s) per socket:  24
    Socket(s):           1
    Stepping:            1
    BogoMIPS:            4890.88
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable 
                         nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalig
                         nsse 3dnowprefetch osvw topoext perfctr_core invpcid_single vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero
                          xsaveerptr rdpru arat umip vaes vpclmulqdq rdpid fsrm
Virtualization features: 
  Hypervisor vendor:     Microsoft
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   768 KiB (24 instances)
  L1i:                   768 KiB (24 instances)
  L2:                    12 MiB (24 instances)
  L3:                    96 MiB (3 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-23
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Mitigation; safe RET, no microcode
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

Linux *** 6.2.0-1019-azure #19~22.04.1-Ubuntu SMP Wed Jan 10 22:57:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Python 3.10.12
GNU Make 4.3
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

Steps to Reproduce

I run a Docker instance of llama-cpp-python behind a reverse proxy (for SSL handling), and I tried to send a completion request from LibreChat. Here is the request intercepted by my nginx reverse proxy:

"POST /v1/chat/completions HTTP/1.1" 502 157 "-" "OpenAI/JS 4.20.1" "-" "{\x0A  \x22model\x22: \x22gpt-4-vision-preview\x22,\x0A  \x22temperature\x22: 1,\x0A  \x22top_p\x22: 1,\x0A  \x22stream\x22: true,\x0A  \x22messages\x22: [\x0A    {\x0A      \x22role\x22: \x22user\x22,\x0A      \x22content\x22: [\x0A        {\x0A          \x22type\x22: \x22text\x22,\x0A          \x22text\x22: \x22Que vois-tu ?\x22\x0A        },\x0A        {\x0A          \x22type\x22: \x22image_url\x22,\x0A          \x22image_url\x22: {\x0A            \x22url\x22: \x22data:image/webp;base64,UklGRpKLAABXRUJQVlA4IIaLAABQTgKdASq8AjwDPm00lkikIqKqItIaeUANiWdu3NFmr2+8qtTVSszHwIS+fn3O2WGxmO6TE8Lw+biWIPMrzbffzudZint883zSeN8gTm9eAZ6/vXIR8Xb1E8bN7pn/O9gX7CewZ/dP9V1wnoiftV6b3tQfuF+2GYB/P/Kf9B/oP9Z4c/l/1P+W/vP+e/4X+K+iH9p/0fJV9N4F/Yj9z/k/3a/xX0J/r/+r/ovHn9d/a/+X/iP83+2vyHfln9C/xv9w/cP+//vJ70O5G3j/Uf9D/H+wd7K/Tv9v/bf85+2vyP/Xf9T/Resf81/n/+V7g/63/73+5eef40f4z/h+wp/RP79/4f8t/qfiM/wf/n/u/za+LH1d/6/9R8H39B/wn/e9db2t/ur7TxB+YMYxnrb3BqtCoqqpmWofkX055xxZM9zhyY+msgAaC5Q+yQwFkXTmRnJ8rnBi1xCJggO0a+vpFfoeGiRVcdTDavfafMG1e+09e5bEUvK2+S08//z2C95lQ8e4+lHfTo1wk1UFdvS+7oODKqqPMG1e+0+YNq99p8wbQtG5cHP7/3HPKV/CjpUeZ2H8K07BlsagelFTMlXDjqYbV77T5g2r32mA/t/cUH2jPd7L1lgHf79TR5dAF0a4FvPDADcWryKmVCgXCkp7q2xzNC1W4omhEK3dHqGVvyZ/GfMG1e+0+YNq99p6wV28ySOerKwj1ekwoYrm4AHBudLZ0S3+LDIsCVhEU+HMFq0uAr+fvfD509flRidWEhM6HAf0oLASmhcd2MFmoFUcGxSBCX7IjHX+4LF3SAPoTvRJbZPykdZvaeQvWQr32nzBtXvtPmDatbd/LgoKDWcfcJEtVtU5bAXy4rkpDK2zUp65UMw0rHgs3NImLw1sg5IO9tk6rJbYXUcBp7q9Xx0WZDT5g2r32nzBtXvsWJaEYEWdNg++z18E86yzBibwq/MjJKPOkGsHufhX3h++XAP7NJ0+7RhuxE0n9nPXrrBYd03Gw7GAoCEKO3nYSyWbtSaIjBlTIGWyON6KlusL5Kr7ac99XfI4KMMQ6rMAUwkALJRhDV2Yq/bDaHA3e+nS7Q/cX64JKtGqOrKaoh/YmEWB+0sEgNuGHQby5LzEoMWMLoL3h0uOULfsIXIIFiIoVV0AjLwhAqbcw6LZIA0sXiRxxwVGJG9bm6eTsr5fvzWy62yzqaoiEiRMRN/+Rl8zu...

As you can see, the image_url is sent as base64-encoded WebP, but something fails when it is being decoded.

I have tried to find where in the code this decoding happens, but I would be glad if someone could help me track down where this is coming from.

Thank you for your support!
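One quick sanity check (not part of the original report): the payload in the intercepted request really is WebP, which can be confirmed by sniffing the magic bytes of the base64 prefix. A minimal sketch, using only the standard library:

```python
import base64

# Prefix of the base64 payload from the intercepted request above
payload_prefix = "UklGRpKLAABXRUJQ"
header = base64.b64decode(payload_prefix)

# WebP images are RIFF containers: "RIFF" at offset 0, "WEBP" at offset 8.
# PNG files start with a fixed 8-byte signature instead.
print(header[:4] == b"RIFF" and header[8:12] == b"WEBP")  # → True
print(header[:8] == b"\x89PNG\r\n\x1a\n")                 # → False
```

stb_image (which llama.cpp's clip loader uses under the hood) has no WebP decoder, which would explain the "failed to decode image bytes" error.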

abetlen commented 8 months ago

@BriceGUILLAUME I believe the clip model we're using expects png bytes only, is webp supported by the OpenAI api?

BriceGUILLAUME commented 8 months ago

@abetlen, thank you for your answer and for the awesome project as well.

I have managed to get answers from Azure OpenAI with webp, so I guess it is supported. However, I will try with png data to see whether the issue occurs there as well, and I will post the results here, probably tomorrow.

abetlen commented 8 months ago

@BriceGUILLAUME looks like you're right https://platform.openai.com/docs/guides/vision/what-type-of-files-can-i-upload

The fix would be to use something like Pillow to convert the image in the chat handler, which is doable, but I'll keep it as an optional dependency for the vision chat handlers.
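A minimal sketch of that conversion, assuming Pillow is installed (the helper names here are hypothetical, not the actual chat-handler API):

```python
import base64
import io

from PIL import Image  # optional dependency: pip install pillow


def image_bytes_to_png(image_bytes: bytes) -> bytes:
    """Re-encode arbitrary image bytes (WebP, JPEG, ...) as PNG."""
    with Image.open(io.BytesIO(image_bytes)) as img:
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        return buf.getvalue()


def normalize_data_uri(uri: str) -> str:
    """Hypothetical helper: rewrite a base64 data URI so its payload is PNG."""
    header, b64_payload = uri.split(",", 1)
    if "image/png" in header:
        return uri  # already PNG, nothing to do
    png = image_bytes_to_png(base64.b64decode(b64_payload))
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")
```

The chat handler could run each incoming image_url through something like normalize_data_uri before handing the bytes to the clip model, so stb_image only ever sees PNG.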

Baka-14 commented 5 months ago

Hey, has anyone been assigned to fix this issue?