deepset-ai / hayhooks

Deploy Haystack pipelines behind a REST Api.
https://haystack.deepset.ai
Apache License 2.0
39 stars 11 forks source link

database collection crud endpoints #16

Closed Rusteam closed 4 months ago

Rusteam commented 4 months ago

We're using hayhooks to serve haystack pipelines and we need a way to find out which collections are already present in the vector db (qdrant). We've decided to build a custom component and deploy it, however I wonder if it's better to create these methods for fastapi by default?

masci commented 4 months ago

That's a good one - I would have done the same, custom component wrapped in a pipeline that's just there for the purpose of invoking that component.

But the line between a clever use of hayooks' design and a hack is blurry here, I wonder if we should support explicitly the customization of the internal FastAPI router...

Rusteam commented 4 months ago

the component turned out to be quite simple:


@component
class QdrantCollectionManagement(DocumentWriter):
    def __init__(
        self,
        document_store: QdrantDocumentStoreExt,
        policy: DuplicatePolicy = DuplicatePolicy.NONE,
    ):
        self.document_store = document_store
        self.client = document_store.client
        self.index = document_store.index
        self.policy = policy

    @component.output_types(collections=list[str])
    def run(self, action: str):
        action = QdrantCollectionAction(action)
        if action == QdrantCollectionAction.delete:
            self.client.delete_collection(self.index)
        collections = list(
            map(lambda x: x.name, self.client.get_collections().collections)
        )
        return {"collections": collections}

alternatively, I could commit it to qdrant integrations for re-usage

masci commented 4 months ago

That's great @Rusteam! Please tag me if you contribute the component to the integration, I can review!