facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

Access violation while training an OPQ...,IVF..._HNSW...,PQ... index #1967

Closed: gropoli closed this issue 2 years ago

gropoli commented 3 years ago

Summary

Hi,

Following the Guidelines to choose an index, I am trying this index: OPQ8_8,IVF65536_HNSW32,PQ8x4fsr

But during the training phase I am running into an Access violation reading location 0x00000000000002D0, which looks very much like a memory issue or a bug somewhere.

The call stack of my main thread:

ANNIndicesTests.exe!mkl_lapack_sgeqrf()    Unknown
ANNIndicesTests.exe!mkl_lapack_thread_team_ctxt_get_task() Unknown
ANNIndicesTests.exe!mkl_lapack_sgeqrf()    Unknown
libiomp5md.dll!__kmp_invoke_microtask()    Unknown
libiomp5md.dll!__kmp_invoke_task_func(int gtid) Line 7515   C++
libiomp5md.dll!__kmp_fork_call(ident * loc, int gtid, fork_context_e call_context, int argc, void(*)(int *, int *) microtask, int(*)(int) invoker, char * ap) Line 2424 C++
libiomp5md.dll!__kmpc_fork_call(ident * loc, int argc, void(*)(int *, int *) microtask) Line 373    C++
ANNIndicesTests.exe!mkl_lapack_sgeqrf()    Unknown
ANNIndicesTests.exe!sgeqrf()   Unknown
ANNIndicesTests.exe!faiss::matrix_qr(int m, int n, float * a) Line 224  C++
ANNIndicesTests.exe!faiss::OPQMatrix::train(__int64 n, const float * x) Line 1054   C++
ANNIndicesTests.exe!faiss::IndexPreTransform::train(__int64 n, const float * x) Line 90 C++

and of the faulting thread:

ANNIndicesTests.exe!mkl_lapack_sgeqr2rft_team()    Unknown
ANNIndicesTests.exe!mkl_lapack_slaqrf_team()   Unknown
ANNIndicesTests.exe!mkl_lapack_sgeqrf()    Unknown
libiomp5md.dll!__kmp_invoke_microtask()    Unknown
libiomp5md.dll!__kmp_invoke_task_func(int gtid) Line 7515   C++
libiomp5md.dll!__kmp_launch_thread(kmp_info * this_thr) Line 6110   C++
libiomp5md.dll!__kmp_launch_worker(void * arg) Line 1072    C++
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown

I am training the index with a set of 1,768,364 vectors of dimension 256 (20% of the MS MARCO set for passage ranking). MS MARCO makes an 8.8 GB index when all its vectors are added to a Flat index, and I have 288 GB RAM installed, so I am definitely not running out of memory.
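For reference, the size of the raw training set can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes the vectors are stored as 4-byte float32 values, and the helper name `training_set_bytes` is hypothetical and not part of Faiss.

```python
def training_set_bytes(n_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Raw size in bytes of n_vectors dense vectors of dimension dim.

    bytes_per_component defaults to 4, i.e. float32 (an assumption here).
    """
    return n_vectors * dim * bytes_per_component


if __name__ == "__main__":
    # Figures quoted in the report: 1,768,364 vectors of dimension 256.
    size = training_set_bytes(1_768_364, 256)
    print(f"{size / 1e9:.2f} GB")  # prints "1.81 GB"
```

Under that assumption the training data occupies well under 2 GB, which is consistent with the statement that the machine is nowhere near its 288 GB of RAM.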

I wouldn't assume MKL LAPACK code is buggy but who knows... or maybe there is something I am doing with Faiss at a higher level that is wrong...?

When verbose is on, all I get before it crashes is:

IndexPreTransform::train: training chain 0 to 1
Training chain component 0/1
Input training set too big (max size is 65536), sampling 65536 / 2652546 vectors
OPQMatrix::train: training an OPQ rotation matrix for M=8 from 65536 vectors in 256D -> 8D
OPQMatrix::train: making random 256*256 rotation

What other information can I produce to help you help me? :)

Thanks!

Platform

OS: Windows

Faiss version: 1.7.0

Installed from: compiled by myself

Running on:

Interface:

mdouze commented 3 years ago

Could you try to repro in Python? This irons out many of the client-side errors that may occur. OPQ8_8 does not make much sense because it means there is a single dimension per sub-quantizer. Better to use OPQ8_32 or OPQ8_64.
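To make the point about OPQ8_8 concrete, here is a minimal sketch that reads an `OPQ<M>_<d_out>` factory token and reports how many dimensions each of the M sub-vectors receives. The helper `opq_dims_per_subquantizer` is hypothetical (it is not a Faiss function); it only assumes the factory-string convention that `OPQ<M>` without a suffix keeps the input dimension.

```python
import re


def opq_dims_per_subquantizer(token: str, d_in: int) -> int:
    """Return the number of dimensions per sub-vector for an OPQ factory token.

    token: e.g. "OPQ8_8", "OPQ8_32" or "OPQ8" (hypothetical parser, not Faiss code).
    d_in:  input dimension, used as the output dimension when no "_<d_out>" is given.
    """
    match = re.fullmatch(r"OPQ(\d+)(?:_(\d+))?", token)
    if match is None:
        raise ValueError(f"not an OPQ token: {token!r}")
    n_sub = int(match.group(1))
    d_out = int(match.group(2)) if match.group(2) else d_in
    if d_out % n_sub != 0:
        raise ValueError(f"output dimension {d_out} is not a multiple of M={n_sub}")
    return d_out // n_sub


if __name__ == "__main__":
    for tok in ("OPQ8_8", "OPQ8_32", "OPQ8_64"):
        print(tok, "->", opq_dims_per_subquantizer(tok, 256), "dims per sub-vector")
```

With a 256-dimensional input, OPQ8_8 leaves 1 dimension per sub-vector, while OPQ8_32 and OPQ8_64 leave 4 and 8 respectively.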

gropoli commented 3 years ago

Thanks @mdouze, I'll try doing that. I'm just having a hard time right now trying to build the faiss package for Python from the same source version I built the C++ lib from. I'll let you know when I can repro. You're right about OPQ8_8; for now I was just naively exploring the different parameters from a benchmark script I wrote (in Python, calling my C++ test program 🙃).