Open bwbw723 opened 6 days ago
The rationale behind this field is explained here: https://github.com/deepset-ai/haystack-core-integrations/blob/67e08d0b7e5a7f51f52bb0d40fe40b0ff2caf43a/integrations/weaviate/src/haystack_integrations/document_stores/weaviate/document_store.py#L276-L278
This is done to provide a robust default to users who don't need serious customization.
For simplicity, you can include this field in your collection configuration: https://github.com/deepset-ai/haystack-core-integrations/blob/67e08d0b7e5a7f51f52bb0d40fe40b0ff2caf43a/integrations/weaviate/src/haystack_integrations/document_stores/weaviate/document_store.py#L40
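As a rough illustration of that suggestion, here is a minimal sketch that adds the `_original_id` property to a custom collection settings dictionary when it is missing. The helper name `with_original_id`, the class name `MyDocuments`, and the `text` data type are assumptions made for this example; check the linked default configuration for the exact definition.

```python
from copy import deepcopy
from typing import Any, Dict

# Assumed shape of the property; verify against the integration's defaults.
ORIGINAL_ID_PROPERTY = {"name": "_original_id", "dataType": ["text"]}


def with_original_id(collection_settings: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the settings that contains the _original_id property."""
    settings = deepcopy(collection_settings)
    properties = settings.setdefault("properties", [])
    # Only append the property if the custom schema does not define it yet.
    if not any(prop.get("name") == "_original_id" for prop in properties):
        properties.append(dict(ORIGINAL_ID_PROPERTY))
    return settings


# Hypothetical custom schema without the field.
custom_settings = {
    "class": "MyDocuments",
    "properties": [{"name": "content", "dataType": ["text"]}],
}
patched_settings = with_original_id(custom_settings)
```

The patched dictionary can then be used as the collection configuration for the document store. Objects already stored without the property would still lack a value for it.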
Does this create problems?
I am using the WeaviateEmbeddingRetriever to query the data. It works fine with the default class in Weaviate. Once I switch to a class I created myself with a customized schema, I get the issue below:
I checked the code and found that the conversion function reads `_original_id` from the stored data and uses it as the Document ID. I updated the code in document_store.py to set `document_data["id"]` to the generated UUID when the object does not have an `_original_id`. With that change, I get the expected results. I do not think data in Weaviate should be required to have an `_original_id` property, but the current code raises an error when it is missing. I would prefer an `if` statement that handles both cases. Please correct me if I have misunderstood anything.
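The fallback described above could look roughly like the sketch below. It is not the integration's actual code: the helper name `to_document_data` and its arguments are hypothetical, and it only shows the `if` branch on plain dictionaries.

```python
import uuid
from typing import Any, Dict


def to_document_data(properties: Dict[str, Any], weaviate_uuid: uuid.UUID) -> Dict[str, Any]:
    """Build document data, falling back to the Weaviate UUID as the id."""
    document_data = dict(properties)  # copy so the caller's dict is not mutated
    original_id = document_data.pop("_original_id", None)
    if original_id is not None:
        # Object written through the document store: keep its original id.
        document_data["id"] = original_id
    else:
        # Object from a custom schema: use the UUID generated by Weaviate.
        document_data["id"] = str(weaviate_uuid)
    return document_data


generated = uuid.UUID("12345678-1234-5678-1234-567812345678")
with_field = to_document_data({"_original_id": "doc-1", "content": "a"}, generated)
without_field = to_document_data({"content": "b"}, generated)
```

In the first case the id stays `doc-1`; in the second it becomes the string form of the generated UUID, so retrieval no longer fails on objects that lack `_original_id`.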
The packages I am using are:
haystack-ai = "2.6.1"
fastembed-haystack = "1.3.0"
weaviate-client = "^4.9.0"
weaviate-haystack = "^4.0.0"