Closed rg2609 closed 7 months ago
Sure, we can create a helm chart for it and keep it up to date in the repos moving forward.
Also note that you can use the neo4j community image and just add the dozerdb-plugin to it as described on the dozerdb.org page.
I will update this ticket once everything is ready.
Thank you for your update. Could you please provide an estimate for the delivery time of the helm chart? This is the only issue left for us to decide to move forward. Your assistance is greatly appreciated. Looking forward to your response.
Hi @jmsuhy,
In addition to your suggestion above, we tried several other scenarios to see if we could get the Helm chart working. While we were able to bring up the Dozerdb server with the Helm chart, in all three scenarios outlined below, we failed to achieve clustering.
The Helm chart provided by Neo4j for their Enterprise edition works well. We were able to successfully establish clustering. See the link for reference: Neo4j Kubernetes Quickstart
Scenario 1: Running the Helm Chart Using Neo4j Community Image with Dozerdb Plugin Image
We deployed the Neo4j Community image and added the graphstack/dozerdb:5.16.0.0-alpha.1 image to the Helm chart as a “CustomImage”. We received an error stating that clustering only works in the Neo4j Enterprise edition.
Scenario 2: Using StatefulSet Replicas for Clustering
We followed the suggestion in Neo4j Helm Charts Issue #207.
In this scenario, we added the graphstack/dozerdb:5.16.0.0-alpha.1 image to the Helm chart as a “CustomImage” and set the following variables:
edition: "enterprise"
acceptLicenseAgreement: "yes"
minimumClusterSize: 3
We created a database named “try1” and created three replicas of the StatefulSet as “neo4j-helm1-0”, “neo4j-helm1-1”, and “neo4j-helm1-2”.
abcd@abcd:~/helm-charts$ kubectl exec -it neo4j-helm1-0 -- bash
neo4j@neo4j-helm1-0:~$ cd data/databases/
neo4j@neo4j-helm1-0:~/data/databases$ ls
neo4j store_lock system try1
neo4j@neo4j-helm1-0:~/data/databases$ exit
exit
abcd@abcd:~/helm-charts$ kubectl exec -it neo4j-helm1-1 -- bash
neo4j@neo4j-helm1-1:~$ cd data/databases/
neo4j@neo4j-helm1-1:~/data/databases$ ls
neo4j store_lock system
neo4j@neo4j-helm1-1:~/data/databases$
exit
abcd@abcd:~/helm-charts$ kubectl exec -it neo4j-helm1-2 -- bash
neo4j@neo4j-helm1-2:~$ cd data/databases/
neo4j@neo4j-helm1-2:~/data/databases$ ls
neo4j store_lock system
neo4j@neo4j-helm1-2:~/data/databases$
exit
With the above configuration, we created three replicas, but they were unable to communicate with each other. The log results below demonstrate that data created in one replica server is not being reflected in the other two replica servers.
abcd@abcd:~/helm-charts$ kubectl exec -it neo4j-helm-0 -- bash
neo4j@neo4j-helm-0:~$ cypher-shell -u neo4j -p xxxx
Connected to Neo4j using Bolt protocol version 5.4 at neo4j://localhost:7687 as user neo4j.
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.
neo4j@neo4j> MATCH (n) RETURN count(n);
+----------+
| count(n) |
+----------+
| 171 |
+----------+
1 row
ready to start consuming query after 16 ms, results consumed after another 1 ms
neo4j@neo4j>
Bye!
neo4j@neo4j-helm-0:~$
exit
abcd@abcd:~/helm-charts$ kubectl exec -it neo4j-helm-1 -- bash
neo4j@neo4j-helm-1:~$ cypher-shell -u neo4j -p xxxx
Connected to Neo4j using Bolt protocol version 5.4 at neo4j://localhost:7687 as user neo4j.
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.
neo4j@neo4j> MATCH (n) RETURN count(n);
+----------+
| count(n) |
+----------+
| 0 |
+----------+
1 row
ready to start consuming query after 34 ms, results consumed after another 0 ms
neo4j@neo4j>
Bye!
neo4j@neo4j-helm-1:~$
exit
abcd@abcd:~/helm-charts$ kubectl exec -it neo4j-helm-2 -- bash
neo4j@neo4j-helm-2:~$ cypher-shell -u neo4j -p xxxx
Connected to Neo4j using Bolt protocol version 5.4 at neo4j://localhost:7687 as user neo4j.
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.
neo4j@neo4j> MATCH (n) RETURN count(n);
+----------+
| count(n) |
+----------+
| 0 |
+----------+
1 row
ready to start consuming query after 25 ms, results consumed after another 1 ms
neo4j@neo4j>
Bye!
neo4j@neo4j-helm-2:~$
exit
abcd@abcd:~/helm-charts$
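The manual per-pod check above can be scripted. The following is a minimal sketch, assuming the pod names neo4j-helm-0 through neo4j-helm-2 from the session above and that cypher-shell is available in each pod. The helper names node_count and counts_agree are hypothetical, and the NEO4J_PASSWORD variable is introduced here for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: compare node counts across replica pods.

# Print the node count reported by one pod.
# With --format plain, cypher-shell prints a header line followed by the value,
# so the last line of output is the count.
node_count() {
  local pod="$1"
  kubectl exec "$pod" -- cypher-shell -u neo4j -p "$NEO4J_PASSWORD" \
    --format plain "MATCH (n) RETURN count(n);" | tail -n 1
}

# Return 0 if all arguments are equal, 1 otherwise.
counts_agree() {
  local first="$1" c
  for c in "$@"; do
    [ "$c" = "$first" ] || return 1
  done
  return 0
}

# Usage (requires access to the cluster):
#   counts=()
#   for i in 0 1 2; do counts+=("$(node_count "neo4j-helm-$i")"); done
#   if counts_agree "${counts[@]}"; then
#     echo "replicas agree"
#   else
#     echo "replicas diverge: ${counts[*]}"
#   fi
```

With the counts shown above (171, 0, 0), counts_agree would return a non-zero status, which matches the observation that the replicas are not sharing data.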
Scenario 3: Installation of 3 Helm Charts with Dozerdb-Plugin Image
We deployed 3 Helm charts following the instructions in the Neo4j documentation. In this scenario, we did not use the Neo4j Enterprise or Community images. Instead, we only added the graphstack/dozerdb:5.16.0.0-alpha.1 image to the Helm chart as a "CustomImage" and set the following variables:
edition: "enterprise"
acceptLicenseAgreement: "yes"
minimumClusterSize: 3
We encountered the same result as in scenario 2 above.
We experimented with other approaches, such as enabling certain clustering-related settings in neo4j-enterprise.conf, but none of them made a difference.
We are seeking guidance and are willing to assist in creating a Helm chart for the graphstack/dozerdb:5.16.0.0-alpha.1 image. Thanks in advance for your help.
DozerDB doesn't include built-in clustering, and clustering is not on our roadmap.
Here is some of the reasoning behind that, though if enough people really wanted it we could reconsider.
Built-in clustering for high availability (HA) is not on our roadmap because most HA needs can be met with modern cloud platforms and open source tools, which offer something similar to Neo4j Enterprise's "high availability" clustering. Neo4j's clustering mechanism is focused on high availability, not on sharding data the way Elasticsearch or TigerGraph do.
For large-scale streaming data that requires consistency across clusters, Kafka can be used, though it may not match the speed of Neo4j Enterprise's Akka-based system for consistency. We leverage Kafka because we do not use many of Neo4j's non-core features, such as search or vector indexes. Our ingestion pipelines usually need to sync data in parallel across the tools we use (Elasticsearch or OpenSearch, LanceDB, etc.).
If you have ever tried to ingest huge amounts of complex data into Neo4j (Community or Enterprise) any way other than the admin tool, you have probably seen why clustering does not really overcome Neo4j's well-known ingest performance issues.
I hope that helps.
Closing this ticket and creating a separate ticket in the new helm charts repo we are setting up.
Thank you so much! We anxiously await DozerDB auto-database clustering via Helm chart, as Kafka is not a practical solution for us. Could you please provide us with the link to the ticket? I was unable to find it. Please let me know how I can help with this issue.
Have you already created a Helm chart for graphstack/dozerdb:5.16.0.0-alpha.1? We need the Helm chart in YML format to be uploaded to our local repository for security reasons. Also, we doubt that running the command "helm install my-neo4j-admin neo4j-helm-charts/neo4j-admin --version 5.16.0" will work correctly, because the YML file may reference Neo4j image repositories rather than DozerDB image repositories. We are therefore requesting a Helm chart for graphstack/dozerdb:5.16.0.0-alpha.1.
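One way to check which image repositories a chart references before installing it is to render the chart locally and list the container images in the output. The following is a minimal sketch, assuming the neo4j-helm-charts repo alias and chart name from the command quoted above. The helper name list_images is hypothetical, and the chart may need additional values supplied before it renders.

```shell
# Extract the distinct image references from rendered Kubernetes manifests on stdin.
# Matches "image: ..." and "- image: ..." lines, strips optional double quotes,
# and ignores keys such as imagePullPolicy.
list_images() {
  grep -E '^[[:space:]]*(-[[:space:]]+)?image:[[:space:]]' \
    | sed -E 's/^[[:space:]]*(-[[:space:]]+)?image:[[:space:]]*//; s/^"//; s/"[[:space:]]*$//' \
    | sort -u
}

# Usage (requires helm and the chart repo to be configured):
#   helm template my-neo4j-admin neo4j-helm-charts/neo4j-admin --version 5.16.0 | list_images
#
# Download the chart as local YAML files, e.g. for upload to a private repository:
#   helm pull neo4j-helm-charts/neo4j-admin --version 5.16.0 --untar
```

If the listed images point at Neo4j repositories, the chart's values would need to be overridden to use the DozerDB image; whether a given chart supports that is something to confirm in its values file.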