First I want to thank you for an incredible library. It seems to be consistently returning results in the milliseconds timeframe, where my previous logic had to apply a table scan, manually apply the dot product in C#, resulting in 3 to 5 orders of magnitudes faster results. Najs work! :)
Then my question which is about memory leaks. I run this in a Kubernetes environment, and I've noticed it keeps on eating memory, especially when you re-index, and/or query. I know C++ quite well, even though I haven't touched it for more than a decade, and I can see the following code. My comments are prefixed with QUESTION for brevity;
std::vector<float> * vectorFromBlobValue(sqlite3_value*value, const char ** pzErrMsg) {
int n = sqlite3_value_bytes(value);
const void * b;
char header;
char type;
if(n < (2)) {
*pzErrMsg = "Vector blob size less than header length";
return NULL;
}
b = sqlite3_value_blob(value);
memcpy(&header, ((char *) b + 0), sizeof(char));
memcpy(&type, ((char *) b + 1), sizeof(char));
if(header != VECTOR_BLOB_HEADER_BYTE) {
*pzErrMsg = "Blob not well-formatted vector blob";
return NULL;
}
if(type != VECTOR_BLOB_HEADER_TYPE) {
*pzErrMsg = "Blob type not right";
return NULL;
}
int numElements = (n - 2)/sizeof(float);
float * v = (float *) ((char *)b + 2);
// QUESTION; Creates new std vector as heap memory and returns to method below
return new std::vector<float>(v, v+numElements);
}
static std::vector<float>* valueAsVector(sqlite3_value*value) {
// Option 1: If the value is a "vectorf32v0" pointer, create vector from that
VectorFloat* v = (VectorFloat*) sqlite3_value_pointer(value, VECTOR_FLOAT_POINTER_NAME);
// QUESTION; Creates new std vector as heap memory and returns to method below
if (v!=NULL) return new std::vector<float>(v->data, v->data + v->size);
std::vector<float> * vec;
// Option 2: value is a blob in vector format
if(sqlite3_value_type(value) == SQLITE_BLOB) {
const char * pzErrMsg = 0;
if((vec = vectorFromBlobValue(value, &pzErrMsg)) != NULL) {
return vec;
}
if((vec = vectorFromRawBlobValue(value, &pzErrMsg)) != NULL) {
return vec;
}
}
// Option 3: if value is a JSON array coercible to float vector, use that
//if(sqlite3_value_subtype(value) == JSON_SUBTYPE) {
if(sqlite3_value_type(value) == SQLITE_TEXT) {
if((vec = vectorFromTextValue(value)) != NULL) {
return vec;
}else {
return NULL;
}
}
// else, value isn't a vector
return NULL;
}
static void vector_value_at(sqlite3_context *context, int argc, sqlite3_value **argv) {
// QUESTION: Invokes above method returning heaped memory
std::vector<float>*v = valueAsVector(argv[0]);
if(v == NULL) return;
int at = sqlite3_value_int(argv[1]);
try {
float result = v->at(at);
sqlite3_result_double(context, result);
}
catch (const std::out_of_range& oor) {
char * errmsg = sqlite3_mprintf("%d out of range: %s", at, oor.what());
if(errmsg != NULL){
sqlite3_result_error(context, errmsg, -1);
sqlite3_free(errmsg);
}
else sqlite3_result_error_nomem(context);
}
// QUESTION: v pointer is never deleted?
}
This is only one example of where I suspect the library might be losing memory. There are other examples, but I am not acquainted enough with SQLite plugins to be sure, and there might be some hidden internals in STLite freeing up the memory that I can't see. However, I see your library is using massive amounts of memory over time, and it seems to be increasing.
Suggestion
Use auto_ptr to move heaped memory around to avoid lost memory?
First I want to thank you for an incredible library. It seems to be consistently returning results in the milliseconds timeframe, where my previous logic had to apply a table scan, manually apply the dot product in C#, resulting in 3 to 5 orders of magnitudes faster results. Najs work! :)
Then my question which is about memory leaks. I run this in a Kubernetes environment, and I've noticed it keeps on eating memory, especially when you re-index, and/or query. I know C++ quite well, even though I haven't touched it for more than a decade, and I can see the following code. My comments are prefixed with QUESTION for brevity;
This is only one example of where I suspect the library might be losing memory. There are other examples, but I am not acquainted enough with SQLite plugins to be sure, and there might be some hidden internals in STLite freeing up the memory that I can't see. However, I see your library is using massive amounts of memory over time, and it seems to be increasing.
Suggestion
Use auto_ptr to move heaped memory around to avoid lost memory?