datastax / astra-db-ts

TypeScript client for Astra DB Vector
https://npmjs.com/@datastax/astra-db-ts
Apache License 2.0

Error: Command "deleteMany" failed with the following error: More records found to be deleted even after deleting 20 records #23

Closed: Gr33nLight closed this issue 4 months ago

Gr33nLight commented 6 months ago

Hello, we are using a metadata key to partition vectors in a collection for multi-tenancy. For the delete operation we run the following code:


```ts
import { AstraDB, Collection } from "@datastax/astra-db-ts";

const astraClient = new AstraDB(
  process.env.ASTRA_DB_APPLICATION_TOKEN,
  process.env.ASTRA_DB_ENDPOINT
);
const collection: Collection = await astraClient.collection(collectionName);
// Delete every record whose tenant metadata field matches this namespace
return collection.deleteMany({ tenant: namespace });
```

Here namespace is the string stored in the tenant field. When this runs, the package throws the following error:


```
error: Error: Command "deleteMany" failed with the following error: More records found to be deleted even after deleting 20 records
```

What do I have to do when there are more than 20 records to delete in a single collection? Is the approach of separating vectors by metadata key not optimal? If so, what should I be doing?

Thanks

toptobes commented 6 months ago

Thanks for bringing this to our attention. What happened here is that the deletion went through just fine, but the Data API only deletes up to 20 records per call, so the error was attempting to tell you to rerun the deletion for the remaining matching records. I agree that it's quite unintuitive though, and it'll be made cleaner in the upcoming release.
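
In the meantime, a minimal sketch of that rerun loop against the pre-1.0 client: it assumes the client surfaces this condition as a thrown Error whose message contains "More records found" (inferred from the message quoted above, not a documented contract).

```ts
import { Collection } from "@datastax/astra-db-ts";

// Sketch only: keep calling deleteMany until it completes without the
// "More records found" error, i.e. until the filter matches nothing.
async function deleteAllMatching(
  collection: Collection,
  filter: Record<string, unknown>,
): Promise<void> {
  for (;;) {
    try {
      await collection.deleteMany(filter);
      return; // no error: nothing left to delete
    } catch (e) {
      if (e instanceof Error && e.message.includes("More records found")) {
        continue; // up to 20 records were deleted; more remain
      }
      throw e; // unrelated failure: surface it
    }
  }
}
```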

Gr33nLight commented 6 months ago

Ok, gotcha. In my use case it can (rarely) happen that I need to delete a large number of records, say 1k. What should I do in that case? I store PDF embeddings, and a PDF with more than 500 pages, for example, produces a lot of records; deleting 20 at a time is not ideal... Suggestions?

toptobes commented 6 months ago

Sorry for the late response (long weekend), but, for the time being, there's unfortunately no way around deleting your records 20 at a time.

If speed is a must, you could potentially try firing off many deleteManys concurrently. In testing, firing off a request every 50ms until there was no more to delete was ~33% faster (of course, YMMV by a lot), but there would be quite a few wasted network calls.
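
A rough sketch of that concurrent approach, hedged: the 50ms interval comes from the comment above, while the error-message check and the function shape are assumptions, not part of the library's API.

```ts
import { Collection } from "@datastax/astra-db-ts";

// Sketch only: fire a new deleteMany every 50ms until one call completes
// cleanly, meaning the filter no longer matches any records.
async function deleteAllConcurrently(
  collection: Collection,
  filter: Record<string, unknown>,
): Promise<void> {
  let done = false;
  let firstError: unknown;
  const pending: Promise<void>[] = [];

  while (!done) {
    pending.push(
      collection.deleteMany(filter).then(
        () => { done = true; }, // completed cleanly: nothing left to delete
        (e) => {
          const moreRemain =
            e instanceof Error && e.message.includes("More records found");
          if (!moreRemain) {    // unrelated failure: record it and stop
            firstError = firstError ?? e;
            done = true;
          }                     // otherwise ~20 were deleted; keep firing
        },
      ),
    );
    await new Promise((resolve) => setTimeout(resolve, 50));
  }

  await Promise.all(pending); // all handlers above swallow rejections
  if (firstError !== undefined) throw firstError;
}
```

The wasted calls mentioned above show up here as in-flight requests that race each other and find nothing left to delete.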

For reference, deleting 1k records 20 at a time (albeit with a really simple schema; your filters may or may not have some adverse effect) didn't take too obscenely long: ~4.3 seconds for all 1k to be deleted.

toptobes commented 4 months ago

deleteMany is now implicitly paginated in the v1.0.0 release.
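
For anyone finding this later, a minimal sketch against the v1.0.0 client's DataAPIClient entry point (the collection name and tenant value below are hypothetical placeholders):

```ts
import { DataAPIClient } from "@datastax/astra-db-ts";

// v1.x: a single deleteMany call now deletes *all* matching documents,
// paginating internally instead of stopping after 20.
const client = new DataAPIClient(process.env.ASTRA_DB_APPLICATION_TOKEN!);
const db = client.db(process.env.ASTRA_DB_ENDPOINT!);
const collection = db.collection("pdf_embeddings"); // hypothetical name

const namespace = "tenant-123"; // hypothetical tenant value
await collection.deleteMany({ tenant: namespace });
```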