asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License
1.59k stars 59 forks source link

Potential memory leaks #54

Open polterguy opened 1 year ago

polterguy commented 1 year ago

First I want to thank you for an incredible library. It seems to be consistently returning results in the milliseconds timeframe, where my previous logic had to apply a table scan, manually apply the dot product in C#, resulting in 3 to 5 orders of magnitudes faster results. Najs work! :)

Then my question which is about memory leaks. I run this in a Kubernetes environment, and I've noticed it keeps on eating memory, especially when you re-index, and/or query. I know C++ quite well, even though I haven't touched it for more than a decade, and I can see the following code. My comments are prefixed with QUESTION for brevity;

std::vector<float> * vectorFromBlobValue(sqlite3_value*value, const char ** pzErrMsg) {
  int n = sqlite3_value_bytes(value);
  const void * b;
  char header;
  char type;

  if(n < (2)) {
    *pzErrMsg = "Vector blob size less than header length";
    return NULL;
  }
  b = sqlite3_value_blob(value);
  memcpy(&header, ((char *) b + 0), sizeof(char));
  memcpy(&type,   ((char *) b + 1), sizeof(char));

  if(header != VECTOR_BLOB_HEADER_BYTE) {
    *pzErrMsg = "Blob not well-formatted vector blob";
    return NULL;
  }
  if(type != VECTOR_BLOB_HEADER_TYPE) {
    *pzErrMsg = "Blob type not right";
    return NULL;
  }
  int numElements = (n - 2)/sizeof(float);
  float * v = (float *) ((char *)b + 2);

  // QUESTION; Creates new std vector as heap memory and returns to method below
  return new std::vector<float>(v, v+numElements);
}

static std::vector<float>* valueAsVector(sqlite3_value*value) {
  // Option 1: If the value is a "vectorf32v0" pointer, create vector from that
  VectorFloat* v = (VectorFloat*) sqlite3_value_pointer(value, VECTOR_FLOAT_POINTER_NAME);

  // QUESTION; Creates new std vector as heap memory and returns to method below
  if (v!=NULL) return new std::vector<float>(v->data, v->data + v->size);
  std::vector<float> * vec;

  // Option 2: value is a blob in vector format
  if(sqlite3_value_type(value) == SQLITE_BLOB) {
    const char * pzErrMsg = 0;
    if((vec = vectorFromBlobValue(value, &pzErrMsg)) != NULL) {
      return vec;
    }
    if((vec = vectorFromRawBlobValue(value, &pzErrMsg)) != NULL) {
      return vec;
    }
  }
  // Option 3: if value is a JSON array coercible to float vector, use that
  //if(sqlite3_value_subtype(value) == JSON_SUBTYPE) {
  if(sqlite3_value_type(value) == SQLITE_TEXT) {
    if((vec = vectorFromTextValue(value)) != NULL) {
      return vec;
    }else {
      return NULL;
    }
  }

  // else, value isn't a vector
  return NULL;
}

static void vector_value_at(sqlite3_context *context, int argc, sqlite3_value **argv) {

  // QUESTION: Invokes above method returning heaped memory
  std::vector<float>*v = valueAsVector(argv[0]);
  if(v == NULL) return;
  int at = sqlite3_value_int(argv[1]);
  try {
    float result = v->at(at);
    sqlite3_result_double(context, result);
  }
   catch (const std::out_of_range& oor) {
    char * errmsg = sqlite3_mprintf("%d out of range: %s", at, oor.what());
    if(errmsg != NULL){
      sqlite3_result_error(context, errmsg, -1);
      sqlite3_free(errmsg);
    }
    else sqlite3_result_error_nomem(context);
  }

  // QUESTION: v pointer is never deleted?
}

This is only one example of where I suspect the library might be losing memory. There are other examples, but I am not acquainted enough with SQLite plugins to be sure, and there might be some hidden internals in STLite freeing up the memory that I can't see. However, I see your library is using massive amounts of memory over time, and it seems to be increasing.

Suggestion

Use auto_ptr to move heaped memory around to avoid lost memory?

polterguy commented 1 year ago

Created PR

romiras commented 2 months ago

Where is mentioned PR? Could you please share?