Closed iori0758 closed 1 year ago
Hey! Thank you for reporting. Can you share which client you used, or whether you were using the CLI? The embedding in the query seems to be missing. Did you omit it on purpose, or did you paste the crash report as it was?
@GuyAv46 thanks for the reply. The client is chatgpt-retrieval-plugin (https://github.com/openai/chatgpt-retrieval-plugin). I just followed the guide and posted the query, and the redis-stack-server Docker container crashed. Details are at https://github.com/openai/chatgpt-retrieval-plugin/issues/265
I think I'm seeing something related as I'm doing vector stuff with openai, which can't be a coincidence.
To reproduce:
search_version:2.6.9
FT.CREATE embeddings SCHEMA vector_field VECTOR FLAT 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE
I insert a few rows with a vector_field field via the C# client StackExchange.Redis.
Then I run
var result = await db.ExecuteAsync("FT.SEARCH", "embeddings", "*=>[KNN 5 @vector_field $BLOB AS score]", "PARAMS", "2", "BLOB", embedding, "SORTBY", "score", "DIALECT", "2");
embedding is a byte[1536] built from float32's that have been taken as byte[4] and then concatenated.
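For reference, here is a minimal sketch of the blob layout described above, written in Python rather than C#. It assumes little-endian float32 values (the native layout on x86-64), and `pack_embedding` / `unpack_embedding` are hypothetical helper names, not part of any client library. Note that a DIM 1536 FLOAT32 vector packs to 1536 × 4 bytes, so the blob length is worth checking before it is sent as the `BLOB` parameter.

```python
import struct

FLOAT32_SIZE = 4  # bytes per float32 value


def pack_embedding(values):
    """Concatenate the values as little-endian float32, 4 bytes each."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_embedding(blob):
    """Inverse of pack_embedding: decode a blob back into a list of floats."""
    count = len(blob) // FLOAT32_SIZE
    return list(struct.unpack(f"<{count}f", blob))


if __name__ == "__main__":
    dim = 1536  # matches DIM in the FT.CREATE command above
    blob = pack_embedding([0.5] * dim)
    print(len(blob))  # dim * FLOAT32_SIZE bytes
```

Round-tripping a blob through `unpack_embedding` is a quick way to confirm that the bytes sent to the server decode to the floats you expect.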
As long as at least one item is in the index, the server crashes with:
17: sendChunk │
│ 18: AREQ_Execute │
│ 19: RSSearchCommand │
│ 20: RedisModuleCommandDispatcher │
│ at /__w/redis-stack/redis-stack/redis/src/module.c:695:5 │
│ 21: call │
│ at /__w/redis-stack/redis-stack/redis/src/server.c:3750:5 │
│ 22: processCommand │
│ at /__w/redis-stack/redis-stack/redis/src/server.c:4297:9 │
│ 23: processCommandAndResetClient │
│ at /__w/redis-stack/redis-stack/redis/src/networking.c:2105:9 │
│ 24: processInputBuffer │
│ at /__w/redis-stack/redis-stack/redis/src/networking.c:2206:17 │
│ 25: callHandler │
│ at /__w/redis-stack/redis-stack/redis/src/connhelpers.h:79:18 │
│ connSocketEventHandler │
│ at /__w/redis-stack/redis-stack/redis/src/connection.c:295:14 │
│ 26: aeProcessEvents │
│ at /__w/redis-stack/redis-stack/redis/src/ae.c:427:17 │
│ 27: aeMain │
│ at /__w/redis-stack/redis-stack/redis/src/ae.c:487:9 │
│ 28: main │
│ at /__w/redis-stack/redis-stack/redis/src/server.c:6474:5 │
│ 29: __libc_start_main │
│ 30: _start
Possibly helpful:
------ DUMPING CODE AROUND EIP ------ │
│ Symbol: _Z35FP32_InnerProductSIMD16Ext_SSE_implPKvS0_m (base: 0x7f03cc23ec50) │
│ Module: /opt/redis-stack/lib/redisearch.so (base 0x7f03cc0aa000) │
│ $ xxd -r -p /tmp/dump.hex /tmp/dump.bin │
│ $ objdump --adjust-vma=0x7f03cc23ec50 -D -b binary -m i386:x86-64 /tmp/dump.bin │
│ ------ │
│ 8:M 19 May 2023 19:20:14.583 # dump of function (hexdump of 141 bytes): │
│ f30f1efa488d04974839c77373c5f057c90f1f8000000000c5f8101fc5e059064883c7404883c640c5f81067d0c5f8106fe0c5f81056f0c5f858c1c5d8594ed0c5f858c1c5d0594ee0c5f858c1c5f8104ff0c5f059cac5f058c84839f877b9c5 │
│
@rba100 thanks for the information. I am new to RediSearch. Did you find a way to solve it?
No, this information is for the maintainer. Hopefully useful to them in some way when they look at this issue.
Thanks for the input @rba100 !
Can you please confirm whether the embedded blob is present in the crashing command section (the argv dump)?
Also, are you both using Docker images? If so, what is your host machine type? It might be related to running an Intel/AMD build on an ARM machine.
@GuyAv46 I'm running redis/redis-stack-server:latest on AMD64. VM inception perhaps, but it's not ARM64. The host is an Intel desktop machine running Proxmox, running VMs which are K8s nodes, which finally run the container on containerd.
I just tested an ARM64 M1 Mac running Docker with the exact same code and RediSearch does not crash. This bug only occurs when running on my AMD64 setup, and it is 100% reproducible.
Unfortunately, I don't know what an argv dump is. I exec'd into the container and redis is not logging to anything other than standard out, I think.
Other than the details above, the only clue is the message 'illegal instruction' at the end of the log which might suggest some low level stuff the VM doesn't like.
Happy to try things if it will help.
Oh... I also tried using a very small vector of DIM 2, manually entering random bytes into redis-cli, and this did not crash the AMD64 setup. So the bug might relate to the size of the vectors. I note that the random bytes I entered resulted in a distance score of inf, so maybe the dodgy float values are detected and take a different code path. Could be a red herring.
We have hardware optimizations that depend on the dimension and the host's hardware. Dimensions smaller than 4 will not get any optimization, so that code is safe for all architectures. The ...SSE... in the symbol name suggests that we chose 4-float vectorization.
SSE operations should be available on almost any Intel/AMD 64-bit machine nowadays. I suspect this issue is related to some Docker/k8s setup that manages to "fool" our optimization-choosing logic into choosing one with unsupported operations.
Can you please share your setups? @rba100 maybe even CPU model or verify if it should support SSE?
Host:
8 x Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (1 Socket)
Linux 5.15.30-2-pve #1 SMP PVE 5.15.30-3
Distribution: Proxmox 7.2-3
1st level virtual environment: Ubuntu 22.04
VMs running direct on host (not LXC containers)
2nd level virtual environment: containerd-1.5.13-linux-amd64 installed on these VMs
containerd-config.toml
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
On both the VMs and inside the redis container (via exec), the CPU info output is the same (sse and sse2 are listed):
$ cat /proc/cpuinfo | grep sse
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl xtopology cpuid tsc_known_freq pni cx16 x2apic hypervisor lahf_lm cpuid_fault pti
Just for fun I got an LLM in on the action and it got me to try this, which did not crash on the redis container itself:
#include <stdio.h>
#include <xmmintrin.h> // for SSE intrinsics

int main() {
    __m128 a = _mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f);
    __m128 b = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    __m128 result;

    // Inline assembly to execute the 'addps' SSE instruction
    __asm__(
        "addps %[a], %[b]\n\t"
        "movaps %[b], %[result]"
        : [result] "=x"(result)
        : [a] "x"(a), [b] "0"(b)
    );

    float output[4];
    _mm_storeu_ps(output, result);
    printf("Result: %f, %f, %f, %f\n", output[0], output[1], output[2], output[3]);
    return 0;
}
gcc -o sse-addps sse-addps.c -msse
./sse-addps
Result: 5.000000, 5.000000, 5.000000, 5.000000
@GuyAv46 hmm... looking at https://github.com/RedisAI/VectorSimilarity I see you list other instruction types as well. I am assuming this library is the one you use in RediSearch.
I notice AVX is listed as an instruction type that's used.
On both the container and the VM, /proc/cpuinfo does not list avx stuff (and indeed blows up with a similar test program), however it is listed and works on the bare metal machine.
Is it possible this is the cause? I.e. the code checks for SSE and assumes the other families are present?
Apologies if I'm well off the mark here.
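The comparison above (flags visible inside the VM versus on bare metal) can be scripted. This is a rough sketch in Python, assuming the Linux `/proc/cpuinfo` format shown earlier in the thread; `cpu_flags` and `missing_features` are hypothetical helper names, and the default `wanted` list is only an illustrative guess at what a SIMD-optimized library might look for, not RediSearch's actual detection logic.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, wanted=("sse", "avx", "avx2", "avx512f")):
    """List the wanted instruction-set flags that the CPU does not advertise.

    The default `wanted` tuple is an assumption made for illustration.
    """
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]


if __name__ == "__main__":
    # Shortened version of the flags line reported from inside the VM.
    sample = "flags : fpu vme de pse tsc msr sse sse2 ht syscall nx lm pni cx16"
    print(missing_features(sample))
```

On a Linux host or guest, passing the contents of `/proc/cpuinfo` to `missing_features` gives the same check for the real machine. Running it at each virtualization level shows where a flag gets dropped.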
Solved. Changed VM CPU type to 'host' and AVX etc is now available and RediSearch no longer crashes.
I guess this isn't your bug if it's reasonable to assume modern instructions are available. I hope this isn't too common. The original issue reporter might have a different scenario.
Awesome! The function that caused your crash can be viewed here, and @iori0758's function is located here, and it should be safe even if you don't have any AVX instructions available. With your high dimension, which is divisible by 16, choosing SSE and not AVX means something was not available to the VM. I would still expect SSE instructions not to crash, but maybe it happened because of the VM's permissions regardless. @rba100, can you share how you set the CPU type, or link to some instructions on how to do so, for future reference?
The default value for CPU emulation is kvm64
On the web UI for Proxmox you can change this with:
Via the command line it would be qemu-system-x86_64 [...] -cpu host [...] or something like that.
I'm sure there may be security issues relating to Spectre and whatnot by using host, however as a development lab this is fine for me. Production users may wish to consider going to the trouble of working out the minimum they need to emulate, e.g. -cpu SandyBridge or whatever they can get away with.
redis: redis-stack-version:latest, running on Docker.
Below is the crash report:
``` -- | --
Thu, May 18 2023 9:12:58 pm | 9:C 18 May 2023 13:12:58.787 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
Thu, May 18 2023 9:12:58 pm | 9:C 18 May 2023 13:12:58.787 # Redis version=6.2.12, bits=64, commit=00000000, modified=0, pid=9, just started
Thu, May 18 2023 9:12:58 pm | 9:C 18 May 2023 13:12:58.787 # Configuration loaded
Thu, May 18 2023 9:12:58 pm | 9:M 18 May 2023 13:12:58.788 * monotonic clock: POSIX clock_gettime
Thu, May 18 2023 9:12:58 pm | 9:M 18 May 2023 13:12:58.788 * Running mode=standalone, port=6379.
Thu, May 18 2023 9:12:58 pm | 9:M 18 May 2023 13:12:58.788 # Server initialized
Thu, May 18 2023 9:12:58 pm | 9:M 18 May 2023 13:12:58.790 *