Zipstack / unstract

No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
https://unstract.com
GNU Affero General Public License v3.0

Indexing/Summarization fails with Pinecone inconsistently #358

Closed gaya3-zipstack closed 4 months ago

gaya3-zipstack commented 4 months ago

What

1) Pinecone intermittently fails to reflect documents that were just inserted during indexing. When summarization runs next and tries to retrieve the node(s) that indexing just wrote, retrieval returns an empty list. This surfaces as the error "Couldn't fetch context from vector db", especially when chunk_size=0. The issue is not easy to reproduce.

2) After a prompt is edited and re-run, the prompt edit view does not display the result for the edited prompt.

Why

1) Suspected delay in database writes on the Pinecone side.

2) The updated answer after a prompt edit is not stored in the prompt_outputs table, so the old answer is always shown.

How

1) Added a short sleep followed by one more retrieval attempt.

Note: This does not fix the underlying issue. Since the issue is intermittent and hard to reproduce, the retry is only a safety net.

2) Fixed the update query so that the updated results are stored.
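
The sleep-and-retry safety net from item 1 can be sketched roughly as below. This is an illustration only, not the code in this PR: `retrieve_with_retry`, `fetch_nodes`, and the delay value are hypothetical names and settings, and the vector DB is simulated with a plain iterator rather than a real Pinecone client.

```python
import time
from typing import Callable, List, Sequence


def retrieve_with_retry(
    fetch_nodes: Callable[[], Sequence[str]],
    retries: int = 1,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call fetch_nodes; if it returns nothing, wait and try again.

    fetch_nodes is a hypothetical stand-in for the retrieval call made
    right after indexing. An empty result is treated as "the write is
    not visible yet", not as a hard failure.
    """
    nodes = list(fetch_nodes())
    attempts_left = retries
    while not nodes and attempts_left > 0:
        sleep(delay_seconds)  # give the vector DB time to reflect the write
        nodes = list(fetch_nodes())
        attempts_left -= 1
    return nodes


# Simulated vector DB that only shows the new node on the second read.
responses = iter([[], ["chunk-0"]])
result = retrieve_with_retry(lambda: next(responses), delay_seconds=0.0)
print(result)
```

If the result is still empty after the retry, the caller would raise the existing "Couldn't fetch context from vector db" error as before, which is why this is only a safety net.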

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

1) The retry is only invoked when just-indexed documents do not show up on retrieval. No other flows are expected to break.

2) Only affects the prompt outputs flow.

Database Migrations

Env Config

Relevant Docs

-

Related Issues or PRs

1) https://zipstack.atlassian.net/browse/UN-1288

2) https://zipstack.atlassian.net/browse/UN-1294

Dependencies Versions

-

Notes on Testing

1) Very difficult to reproduce. Tested by manually intervening; documented in JIRA UN-1288.

2) Add a prompt. Make sure to run it and see the answer. Edit the prompt question, re-run the prompt, and make sure a new, relevant answer is shown. Checked the prompt_output_manager table to confirm that the new answer is stored.
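
The check in item 2 can be mimicked in isolation with an in-memory SQLite table. This is a hedged sketch, not the project's model or query: the `prompt_outputs` schema, its column names, and the `store_output` / `stored_output` helpers are all assumptions made for illustration. It only demonstrates the behavior being tested, that re-running an edited prompt overwrites the stored answer instead of leaving the old one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real table has different columns.
conn.execute(
    "CREATE TABLE prompt_outputs ("
    "prompt_id TEXT PRIMARY KEY, prompt TEXT, output TEXT)"
)


def store_output(conn, prompt_id, prompt, output):
    """Update the row for prompt_id, inserting it if it does not exist."""
    cur = conn.execute(
        "UPDATE prompt_outputs SET prompt = ?, output = ? WHERE prompt_id = ?",
        (prompt, output, prompt_id),
    )
    if cur.rowcount == 0:  # nothing to update yet, so this is the first run
        conn.execute(
            "INSERT INTO prompt_outputs (prompt_id, prompt, output) "
            "VALUES (?, ?, ?)",
            (prompt_id, prompt, output),
        )
    conn.commit()


def stored_output(conn, prompt_id):
    """Return the stored answer for prompt_id, or None if absent."""
    row = conn.execute(
        "SELECT output FROM prompt_outputs WHERE prompt_id = ?", (prompt_id,)
    ).fetchone()
    return row[0] if row else None


# First run, then an edited prompt that is re-run.
store_output(conn, "p1", "What is the invoice date?", "2024-01-05")
store_output(conn, "p1", "What is the invoice total?", "1,250.00")
print(stored_output(conn, "p1"))
```

After the second call the table should still hold a single row for `p1`, now carrying the answer to the edited prompt.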

Screenshots

Prompt 1

image

DB values

image

Edit and run a new prompt

image

DB values

image

Checklist

I have read and understood the [Contribution Guidelines]().

sonarcloud[bot] commented 4 months ago

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud