Add Neo4j integration

prosto commented 7 months ago

The integration includes the following components for Haystack 2.0:

Neo4jDocumentStore - implementation of the standard DocumentStore protocol (e.g. write_documents)
Neo4jEmbeddingRetriever - component that retrieves documents matching a given query embedding using the Neo4jDocumentStore
Neo4jDynamicDocumentRetriever - retriever component that can run an arbitrary Cypher query to retrieve documents directly from Neo4j, without involving the Document Store
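For readers new to Haystack 2.0: a Document Store is any class that structurally satisfies a small protocol, so a store does not need to inherit from a base class. The sketch below is a simplified, stdlib-only approximation of that idea. It assumes a reduced method set (count_documents, write_documents, filter_documents), and the names ToyDocument, ToyDocumentStore and DictDocumentStore are hypothetical; it is not Haystack's actual protocol definition or how Neo4jDocumentStore is implemented.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@dataclass
class ToyDocument:
    """Hypothetical stand-in for haystack.Document (id, content, meta, embedding)."""
    id: str
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)
    embedding: Optional[List[float]] = None


@runtime_checkable
class ToyDocumentStore(Protocol):
    """Simplified protocol: any class providing these methods qualifies structurally."""

    def count_documents(self) -> int: ...

    def write_documents(self, documents: List[ToyDocument]) -> int: ...

    def filter_documents(self, filters: Optional[Dict[str, Any]] = None) -> List[ToyDocument]: ...


class DictDocumentStore:
    """Minimal in-memory store; note that it does not inherit from ToyDocumentStore."""

    def __init__(self) -> None:
        self._docs: Dict[str, ToyDocument] = {}

    def count_documents(self) -> int:
        return len(self._docs)

    def write_documents(self, documents: List[ToyDocument]) -> int:
        # A document with an existing id is overwritten (roughly an "overwrite" policy).
        for doc in documents:
            self._docs[doc.id] = doc
        return len(documents)

    def filter_documents(self, filters: Optional[Dict[str, Any]] = None) -> List[ToyDocument]:
        if not filters:
            return list(self._docs.values())
        # This sketch only handles a single "==" condition on a meta field.
        if filters.get("operator") != "==":
            raise NotImplementedError("this sketch only supports the '==' operator")
        return [d for d in self._docs.values() if d.meta.get(filters["field"]) == filters["value"]]


if __name__ == "__main__":
    store = DictDocumentStore()
    print(isinstance(store, ToyDocumentStore))  # True: only method presence is checked
    store.write_documents([ToyDocument("1", "hello", {"lang": "en"})])
    print(store.count_documents())  # 1
```

Because the check is structural, a pipeline component can accept any object that provides these methods, which is what lets an external package plug in its own store.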

Full documentation is here: https://prosto.github.io/neo4j-haystack/

annthurium commented 7 months ago

very cool! We'll take a look and review this week 👋🏻

annthurium commented 7 months ago

hi @prosto! Thanks so much for the detailed instructions on how to run this code; they were very helpful. In general, this extension is very cool, and the team is excited about it.

One issue I ran into: Haystack 2.0-beta5, released just this week, has a few breaking changes, most of which are documented in the release notes. I've listed them here to make things easier.

Hopefully these are quick fixes, and they may not all apply to you. Sorry for the trouble; let us know if you want help or have questions.

Change from haystack.document_stores import DocumentStoreError to from haystack.document_stores.errors import DocumentStoreError (I definitely ran into this one).
If you're using DuplicatePolicy anywhere, that import is now from haystack.document_stores.types import DuplicatePolicy.
Change any occurrence of from haystack.components.routers.document_joiner import DocumentJoiner to from haystack.components.joiners.document_joiner import DocumentJoiner.
Change the imports for the in-memory document store and retrievers from:
    from haystack.document_stores import InMemoryDocumentStore
    from haystack.components.retrievers import InMemoryEmbeddingRetriever
to:
    from haystack.document_stores.in_memory import InMemoryDocumentStore
    from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
Rename the transcriber parameters model_name and model_name_or_path to model. This affects both LocalWhisperTranscriber and RemoteWhisperTranscriber.
Rename the embedder parameters model_name and model_name_or_path to model. This affects all Embedder classes.
Rename model_name_or_path to model in NamedEntityExtractor.
Rename model_name_or_path to model in TransformersSimilarityRanker.
Rename model_name_or_path to model in ExtractiveReader.
Rename the generator parameters model_name and model_name_or_path to model. This affects all Generator classes.
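Most of these changes are mechanical, so they can be applied with a small search-and-replace pass. The sketch below is a hypothetical helper (migrate_to_beta5 is not part of Haystack) that rewrites plain source text without parsing it. It assumes each old import sits on its own exact line as listed above; DuplicatePolicy is left out because its old import path is not listed. The keyword-argument pattern renames model_name= and model_name_or_path= only directly after "(" or ",", so a variable assignment such as model_name = "..." is left alone, but it also renames those keywords in calls to your own functions, so review the diff before committing.

```python
import re

# Old -> new import statements from the beta5 list above.
IMPORT_RENAMES = {
    "from haystack.document_stores import DocumentStoreError":
        "from haystack.document_stores.errors import DocumentStoreError",
    "from haystack.components.routers.document_joiner import DocumentJoiner":
        "from haystack.components.joiners.document_joiner import DocumentJoiner",
    "from haystack.document_stores import InMemoryDocumentStore":
        "from haystack.document_stores.in_memory import InMemoryDocumentStore",
    "from haystack.components.retrievers import InMemoryEmbeddingRetriever":
        "from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever",
}

# A keyword argument model_name= or model_name_or_path= (but not a "==" comparison)
# that directly follows "(" or "," (whitespace and newlines allowed in between).
KWARG_PATTERN = re.compile(r"([(,]\s*)model_name(?:_or_path)?(\s*)=(?!=)")


def migrate_to_beta5(source: str) -> str:
    """Hypothetical helper: rewrite pre-beta5 imports and model kwargs in source text."""
    for old, new in IMPORT_RENAMES.items():
        source = source.replace(old, new)
    return KWARG_PATTERN.sub(r"\1model\2=", source)


if __name__ == "__main__":
    before = (
        "from haystack.document_stores import DocumentStoreError\n"
        "embedder = SentenceTransformersTextEmbedder(model_name_or_path=model_name)\n"
    )
    print(migrate_to_beta5(before))
```

The rewrite is idempotent: running it a second time on already-migrated code changes nothing.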

prosto commented 7 months ago

hi @annthurium ,

thank you for your feedback!

I addressed the recent Haystack 2.0-beta5 changes in version v2.0.2 of the package (see the changelog), namely:

Renamed the embedder parameters model_name and model_name_or_path to model (applied in both code and documentation)
Changed imports for DocumentStoreError and DuplicatePolicy

This PR has also been updated to reflect the changes (e.g. model_name_or_path -> model).

Looks like I need to periodically check the latest 2.0 updates, as it is still in beta. I expected that anyway :)

Thank you for bringing the release notes in here. Please let me know if anything else can be improved.

TuanaCelik commented 7 months ago

Small stylistic note for you two: the tiles on the page are dark purple, so a black logo might not look great. If you have a logo meant for dark backgrounds, that would be ideal 👍

annthurium commented 7 months ago

thank you so much!! Will take another look today 👀

prosto commented 7 months ago

hi @TuanaCelik, that's a good point.

I will update the logo so it looks neat on a dark background (I'll take one from the list).

Below is a comparison:

[Logo comparison: old (light) vs. new (dark)]

annthurium commented 7 months ago

the Haystack 2.0-beta5 changes were perfect, no issues there. Thanks for taking care of that so quickly!

One problem I'm having: I tried adding some documents to the DocumentStore, but I can't get the retriever to return documents that seem like they should be relevant to the query.

This is probably user error since I've never used neo4j before and am not that savvy on graph databases in general. Let me know if I'm doing something incorrectly here, or if there might be a bug in the implementation. Thanks so much.

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from neo4j_haystack import Neo4jEmbeddingRetriever, Neo4jDocumentStore

model_name = "sentence-transformers/all-MiniLM-L6-v2"

document_store = Neo4jDocumentStore(
    url="bolt://localhost:7687",
    username="neo4j",
    password="passw0rd",
    database="neo4j",
    embedding_dim=384,
    index="document-embeddings",
)

document_store.write_documents([
    Document(content="My name is Tilde and I live in San Francisco.", meta={"release_date": "2018-12-09"})])

print(document_store.count_documents())

pipeline = Pipeline()
pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model=model_name))
pipeline.add_component("retriever", Neo4jEmbeddingRetriever(document_store=document_store))
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = pipeline.run(
    data={
        "text_embedder": {"text": "What cities do people live in?"},
        "retriever": {
            "top_k": 3,
            "filters": {"field": "release_date", "operator": "==", "value": "2018-12-09"},
        },
    }
)

documents: list[Document] = result["retriever"]["documents"]
print(documents)

Output:

1
[]

prosto commented 7 months ago

hi @annthurium, at first glance it looks like document_store.write_documents writes your documents without embeddings: computing embeddings is the job of an indexing pipeline (SentenceTransformersDocumentEmbedder + DocumentWriter). In your case, write_documents writes only content and meta, so the embedding retriever has nothing to match the query embedding against.
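To illustrate why the query returns [], here is a toy, stdlib-only model of embedding retrieval. It assumes, as described above, that write_documents stores documents as-is and that only documents carrying an embedding can be matched. The names ToyDoc, cosine and retrieve, and the 3-dimensional vectors, are made up for illustration; they stand in for the 384-dimensional all-MiniLM-L6-v2 embeddings and the Neo4j vector index, and this is not how neo4j-haystack implements retrieval.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToyDoc:
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)
    embedding: Optional[List[float]] = None  # None = written without an embedder


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(docs: List[ToyDoc], query_embedding: List[float], top_k: int,
             filters: Optional[Dict[str, Any]] = None) -> List[ToyDoc]:
    """Rank docs by cosine similarity; docs without an embedding are never candidates."""
    candidates = [d for d in docs if d.embedding is not None]
    if filters:
        # Only a single "==" condition on a meta field is handled in this toy.
        if filters["operator"] != "==":
            raise NotImplementedError("toy retriever only supports '=='")
        candidates = [d for d in candidates if d.meta.get(filters["field"]) == filters["value"]]
    candidates.sort(key=lambda d: cosine(query_embedding, d.embedding), reverse=True)
    return candidates[:top_k]


query = [0.9, 0.1, 0.0]  # pretend embedding of "What cities do people live in?"
flt = {"field": "release_date", "operator": "==", "value": "2018-12-09"}
meta = {"release_date": "2018-12-09"}
text = "My name is Tilde and I live in San Francisco."

# write_documents alone: content and meta are stored, but no embedding.
without = [ToyDoc(text, meta)]
print(len(without), retrieve(without, query, top_k=3, filters=flt))  # 1 []

# After an embedder has filled in the embedding, the same document is found.
with_emb = [ToyDoc(text, meta, [0.8, 0.2, 0.1])]
print([d.content for d in retrieve(with_emb, query, top_k=3, filters=flt)])
```

In the real setup, the embedding would be computed by SentenceTransformersDocumentEmbedder (using the same model as the text embedder) in an indexing pipeline that feeds a DocumentWriter.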

If that is not the issue, I will double-check what's wrong.

btw, if you have the Docker container running, you can open Neo4j Browser at http://localhost:7474 to check what has been written to the db.

annthurium commented 6 months ago

hi @prosto - if you don't have any concerns about the suggested updates to the example code here, I'd like to merge this tomorrow. Let me know, and thanks again for building this!

prosto commented 6 months ago

hi @annthurium - I have applied your suggestions and, in my latest commit, also added pip install sentence-transformers.

I also added a section called "More examples" with links to some examples from the repo, which might be useful for anyone who wants to start using Neo4j with Haystack.

Thank you for your suggestions! Please let me know if I can improve anything else.

Have a nice day.

annthurium commented 6 months ago

thank you so much! Excellent work 🎉

TuanaCelik commented 6 months ago

Hey @prosto - thank you so much for your contribution and also, thanks so much for including API docs in your integration. I wanted to ask you whether you'd be willing to tell us more about neo4j, graph databases, vector storage in graph databases and using them for LLM apps etc. We often host office hours in our Discord server in a voice channel, maybe we could dedicate one to this topic? Let me know what you think :) here's the server btw: https://discord.com/invite/VBpFzsgRVF

prosto commented 6 months ago

hi @TuanaCelik, I could talk about the Neo4j integration in detail. This week I am pretty much booked with personal matters, but next week I could join the voice channel. Shall I ping you on Discord for more details?

deepset-ai / haystack-integrations

Add Neo4j integration #124