Zipstack / unstract

No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
https://unstract.com
GNU Affero General Public License v3.0
354 stars 30 forks source link

fix/Prompt Studio cascade delete #362

Closed athul-rs closed 3 weeks ago

athul-rs commented 1 month ago

What

Why

How

  1. The tool file folder is deleted on destroy case from REST for the tool, this is handled in CustomTool model by overriding perform_destroy.

  2. For Docuemnt delete the related txt files from extract and summarize is deleted.

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

Database Migrations

Env Config

Relevant Docs

-

Related Issues or PRs

-

Dependencies Versions

-

Notes on Testing

Screenshots

Checklist

I have read and understood the [Contribution Guidelines]().

chandrasekharan-zipstack commented 1 month ago

@athul-rs I'm not sure if I understand the scope of this cascade delete task - however there's a pending item of clearing things in the vector DB when an index is invalidated / document is deleted. Do we plan to handle that through a separate PR?

athul-rs commented 1 month ago

@athul-rs I'm not sure if I understand the scope of this cascade delete task - however there's a pending item of clearing things in the vector DB when an index is invalidated / document is deleted. Do we plan to handle that through a separate PR?

No @chandrasekharan-zipstack that is included in the same PR. Adding that now as a separate commit.

tahierhussain commented 1 month ago

@athul-rs You've added logic to delete documents inside Prompt Studio Core, which triggers only when the user deletes the Prompt Studio project itself. However, we need to delete documents when the user removes them from the project. Therefore, a better approach would be to listen for deletion signals within the document manager. When a particular row is deleted from the document manager table, its corresponding documents stored in the file system will also be deleted. This approach should work even if the user deletes the Prompt Studio project itself, as we've already structured the models to handle cascade deletion in Django.

A similar approach can be used for deleting vector db records. We'll need to add the method for deletion logic to the index manager, so that whenever a row is deleted, that method will be triggered.

athul-rs commented 1 month ago

@athul-rs You've added logic to delete documents inside Prompt Studio Core, which triggers only when the user deletes the Prompt Studio project itself. However, we need to delete documents when the user removes them from the project. Therefore, a better approach would be to listen for deletion signals within the document manager. When a particular row is deleted from the document manager table, its corresponding documents stored in the file system will also be deleted. This approach should work even if the user deletes the Prompt Studio project itself, as we've already structured the models to handle cascade deletion in Django.

A similar approach can be used for deleting vector db records. We'll need to add the method for deletion logic to the index manager, so that whenever a row is deleted, that method will be triggered.

Reason to move away from signal like post_delete to a approach like this is to avoid async calls.

  1. In django signals work as async, thereby even when a call fails or there is a breakage the delete will still continue.
  2. For the file/folder path variable to be determined there is a requirement of org_id which is unavailable in signal calls as request won't come in. The calling model instance is only available in this case.
sonarcloud[bot] commented 3 weeks ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud