Closed ckurze closed 10 months ago
Hi Christian,
thanks for reporting. I've added a self-contained example program at ^1, but I haven't been able to reproduce the "Collection not found" problem. I tried it with a CrateDB instance already running, and I also tried once more with a recycled one, without any existing tables.
Can I ask you to try again? Maybe the situation has improved in the meantime, and the flaw was resolved by some other fix added recently?
On the other hand, maybe my example program is still incomplete, and you would be able to complete it, in order to reproduce the problem?
With kind regards, Andreas.
Indeed, I am also observing problems in the "Overwriting a vector store" section of vector_search.ipynb ^1.
____ notebook: nbregression(vector_search) ____
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
docs_with_score[0]
------------------
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[20], line 1
----> 1 docs_with_score[0]
IndexError: list index out of range
### Overwriting a vector store
If you have an existing collection, you can overwrite it by using `from_documents`,
and setting `pre_delete_collection = True`.
#%%
db = CrateDBVectorSearch.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
    pre_delete_collection=True,
)
#%%
docs_with_score = db.similarity_search_with_score("foo")
#%%
docs_with_score[0]
#%% md
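The `IndexError` in the traceback above means that `similarity_search_with_score` returned an empty list, so `docs_with_score[0]` had nothing to index. A minimal sketch of a guard that makes this failure mode explicit follows. Note that `first_hit` is a hypothetical helper, not part of LangChain, and the sample tuples are made up for illustration.

```python
from typing import List, Optional, Tuple, TypeVar

T = TypeVar("T")


def first_hit(results: List[Tuple[T, float]]) -> Optional[Tuple[T, float]]:
    """Return the first (document, score) pair, or None when the search came back empty."""
    return results[0] if results else None


# Made-up stand-ins for the value returned by `db.similarity_search_with_score("foo")`.
empty_results: list = []
some_results = [("doc-a", 0.12), ("doc-b", 0.48)]

print(first_hit(empty_results))  # an empty result no longer raises IndexError
print(first_hit(some_results))
```

In a notebook regression test, an explicit check like this turns a bare `IndexError` into a clearer signal that the collection was empty at query time.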
We may have been able to reproduce the flaw by adding corresponding software tests for the accompanying Jupyter Notebooks.
pytest -k "notebook and vector"
When using `pre_delete_collection=True`, there is only an error stating "Collection not found".
Indeed, this is the only occurrence of `logger.warning` within pgvector. In that sense, it feels a bit like a stray log item, but c'est la vie.
$ ag "warning.*collection not found"
libs/langchain/langchain/vectorstores/pgembedding.py
219: self.logger.warning("Collection not found")
libs/langchain/langchain/vectorstores/pgvector.py
189: self.logger.warning("Collection not found")
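To illustrate why a warning-only code path is easy to overlook, here is a toy sketch. It is not the pgvector implementation: `ToyStore`, `ListHandler`, and the logger name `toy_store` are hypothetical, and only the log message text is taken from the search output above. Deleting a missing collection logs a warning and returns, so the caller sees no exception.

```python
import logging
from typing import Dict, List


class ListHandler(logging.Handler):
    """Collect log messages in a list so they can be inspected afterwards."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


class ToyStore:
    """Hypothetical in-memory stand-in for a vector store adapter."""

    def __init__(self) -> None:
        self.collections: Dict[str, List[str]] = {}
        self.logger = logging.getLogger("toy_store")

    def delete_collection(self, name: str) -> None:
        if name not in self.collections:
            # Warning only: the caller gets no exception.
            self.logger.warning("Collection not found")
            return
        del self.collections[name]


handler = ListHandler()
logging.getLogger("toy_store").addHandler(handler)

store = ToyStore()
store.delete_collection("missing")  # returns normally
print(handler.messages)
```

Unless a handler like this is attached, or the test suite inspects captured logs, the message is the only trace that the deletion step found nothing to delete.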
[...] the actual collection is not deleted / emptied.
Will have to be investigated. Can you check again?
[...] the actual collection is not deleted / emptied.
Will have to be investigated.
By using the standalone example program cratedb-langchain-pre-delete-collection.py, you can verify that the `Result count` output differs when the `pre_delete_collection=True` line is disabled.
You may need to invoke the program a few times with and without the line to see the difference. I guess this demonstrates it works well?
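The comparison described above can be mimicked without a database. The following is a toy model under stated assumptions: `ToyVectorStore` and its `count` method are hypothetical, the class only borrows the `from_documents` and `pre_delete_collection` names from the notebook, and it counts stored documents instead of running a similarity search.

```python
from typing import Dict, List

# Module-level dict standing in for tables that persist between program runs.
_PERSISTED: Dict[str, List[str]] = {}


class ToyVectorStore:
    """Hypothetical in-memory model of a collection-backed vector store."""

    def __init__(self, collection_name: str) -> None:
        self.collection_name = collection_name

    @classmethod
    def from_documents(cls, documents, collection_name, pre_delete_collection=False):
        if pre_delete_collection:
            # Drop whatever earlier runs left behind.
            _PERSISTED.pop(collection_name, None)
        _PERSISTED.setdefault(collection_name, []).extend(documents)
        return cls(collection_name)

    def count(self) -> int:
        return len(_PERSISTED[self.collection_name])


docs = ["foo", "bar", "baz"]

# Two "runs" without pre-deletion: documents accumulate.
ToyVectorStore.from_documents(docs, "demo")
accumulated = ToyVectorStore.from_documents(docs, "demo").count()

# A "run" with pre-deletion: the collection starts empty again.
fresh = ToyVectorStore.from_documents(docs, "demo", pre_delete_collection=True).count()

print("Result count without pre-delete:", accumulated)
print("Result count with pre-delete:", fresh)
```

If the real store behaves like this model, repeated runs without the flag should show a growing count, while runs with the flag should keep returning the size of a single document set.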
@andnig just reported GH-11, which may be related to this one?
Hi again. Unless there are any objections, let's consider this fixed?
Problem
When using `pre_delete_collection=True`, there is only an error stating "Collection not found", the actual collection is not deleted / emptied.
Details
Example: vector_search.ipynb