DozerDB / dozerdb-core

DozerDB Plugin Core Project
GNU General Public License v3.0

Helm Chart Repo for "graphstack/dozerdb:5.16.0.0-alpha.1" #8

Closed rg2609 closed 3 months ago

rg2609 commented 4 months ago

Have you already created a Helm chart for graphstack/dozerdb:5.16.0.0-alpha.1? We need the Helm chart in YML format to be uploaded to our local repository for security reasons.

We also doubt that running the command "helm install my-neo4j-admin neo4j-helm-charts/neo4j-admin --version 5.16.0" will work correctly, because the chart's YML files may reference the Neo4j image repositories rather than the DozerDB image repositories. We therefore request a Helm chart for graphstack/dozerdb:5.16.0.0-alpha.1.
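
One way to check this concern before installing is to look at the image-related defaults a chart ships with. The snippet below is only a sketch: the function name `chart_image_settings` is hypothetical, and it assumes the `neo4j-helm-charts` repo alias from the command above is already configured locally.

```shell
# Print the image-related default values of a Helm chart, so you can see
# which image repository it points at before running `helm install`.
# Usage: chart_image_settings <chart> <version>
# NOTE: chart_image_settings is a hypothetical helper name, not part of Helm.
chart_image_settings() {
  chart="$1"
  version="$2"
  # `helm show values` prints the chart's default values.yaml without
  # installing anything; grep keeps lines mentioning "image" plus two
  # lines of context after each match.
  helm show values "$chart" --version "$version" | grep -i -A2 'image'
}
```

For example, `chart_image_settings neo4j-helm-charts/neo4j-admin 5.16.0` lists whatever image settings that chart version defines; whether they can be overridden to point at a DozerDB image depends on the chart itself.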

jmsuhy commented 4 months ago

Sure, we can create a helm chart for it and keep it up to date in the repos moving forward.

Also note that you can use the neo4j community image and just add the dozerdb-plugin to it as described on the dozerdb.org page.

I will update this ticket once everything is ready.

rg2609 commented 4 months ago

Thank you for your update. Could you please provide an estimate for the delivery time of the helm chart? This is the only issue left for us to decide to move forward. Your assistance is greatly appreciated. Looking forward to your response.

rg2609 commented 4 months ago

Hi @jmsuhy,

In addition to your suggestion above, we tried several other scenarios to see if we could get the Helm chart working. While we were able to bring up the Dozerdb server with the Helm chart, in all three scenarios outlined below, we failed to achieve clustering.

Where Clustering Works

The Helm chart provided by Neo4j for their Enterprise edition works well. We were able to successfully establish clustering. See the link for reference: Neo4j Kubernetes Quickstart

Scenarios Where Clustering Did Not Work

Scenario 1: Running the Helm Chart Using Neo4j Community Image with Dozerdb Plugin Image

We deployed the Neo4j Community image and added the graphstack/dozerdb:5.16.0.0-alpha.1 image to the Helm chart as a “CustomImage”. We received an error stating that clustering only works in the Neo4j Enterprise edition.

Scenario 2: Using StatefulSet Replicas for Clustering

We followed the suggestion mentioned in the link below: Neo4j Helm Charts Issue #207

In this scenario, we added the graphstack/dozerdb:5.16.0.0-alpha.1 image to the Helm chart as a “CustomImage” and set the following variables:

edition: "enterprise" 
acceptLicenseAgreement: "yes" 
minimumClusterSize: 3 
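
For anyone reproducing this scenario, the settings above can be passed on the command line roughly as follows. This is a sketch under assumptions, not the exact command used in this report: the chart reference `neo4j/neo4j`, the default release name `neo4j-helm1`, and the value paths (`image.customImage`, `neo4j.edition`, `neo4j.acceptLicenseAgreement`, `neo4j.minimumClusterSize`) are assumed to match the public Neo4j Helm chart, and the function name is hypothetical. The chart may also require further values (for example storage settings) that are omitted here.

```shell
# Install the Neo4j Helm chart with the DozerDB image and the three
# settings listed above.
# ASSUMPTIONS: chart reference "neo4j/neo4j", the default release name,
# and the value paths are not taken from this thread; adjust to your chart.
# install_dozerdb_custom_image is a hypothetical helper name.
install_dozerdb_custom_image() {
  release="${1:-neo4j-helm1}"
  # --set-string keeps "yes" as a string rather than letting it be
  # reinterpreted as another type.
  helm install "$release" neo4j/neo4j \
    --set image.customImage=graphstack/dozerdb:5.16.0.0-alpha.1 \
    --set neo4j.edition=enterprise \
    --set-string neo4j.acceptLicenseAgreement=yes \
    --set neo4j.minimumClusterSize=3
}
```

This only reproduces the configuration that was tried; as the results in this scenario show, it did not produce a working cluster.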

We created a database named “try1” and created three replicas of the StatefulSet as “neo4j-helm1-0”, “neo4j-helm1-1”, and “neo4j-helm1-2”.

abcd@abcd:~/helm-charts$ kubectl exec -it neo4j-helm1-0 -- bash  
neo4j@neo4j-helm1-0:~$ cd data/databases/ 
neo4j@neo4j-helm1-0:~/data/databases$ ls 
neo4j  store_lock  system  try1 
neo4j@neo4j-helm1-0:~/data/databases$ exit 
exit 
abcd@abcd:~/helm-charts$ kubectl exec -it neo4j-helm1-1 -- bash  
neo4j@neo4j-helm1-1:~$ cd data/databases/ 
neo4j@neo4j-helm1-1:~/data/databases$ ls 
neo4j  store_lock  system 
neo4j@neo4j-helm1-1:~/data/databases$  
exit 
abcd@abcd:~/helm-charts$ kubectl exec -it neo4j-helm1-2 -- bash  
neo4j@neo4j-helm1-2:~$ cd data/databases/ 
neo4j@neo4j-helm1-2:~/data/databases$ ls 
neo4j  store_lock  system 
neo4j@neo4j-helm1-2:~/data/databases$  
exit

With the above configuration, we created three replicas, but they were unable to communicate with each other. The log results below demonstrate that data created in one replica server is not being reflected in the other two replica servers.

abcd@abcd:~/helm-charts$ kubectl exec -it neo4j-helm-0 -- bash 
neo4j@neo4j-helm-0:~$ cypher-shell -u neo4j -p xxxx 
Connected to Neo4j using Bolt protocol version 5.4 at neo4j://localhost:7687 as user neo4j. 
Type :help for a list of available commands or :exit to exit the shell. 
Note that Cypher queries must end with a semicolon. 
neo4j@neo4j> MATCH (n) RETURN count(n); 
+----------+ 
| count(n) | 
+----------+ 
| 171      | 
+----------+ 

1 row 
ready to start consuming query after 16 ms, results consumed after another 1 ms 
neo4j@neo4j>  

Bye! 
neo4j@neo4j-helm-0:~$  
exit 
abcd@abcd:~/helm-charts$ kubectl exec -it neo4j-helm-1 -- bash 
neo4j@neo4j-helm-1:~$ cypher-shell -u neo4j -p xxxx 
Connected to Neo4j using Bolt protocol version 5.4 at neo4j://localhost:7687 as user neo4j. 
Type :help for a list of available commands or :exit to exit the shell. 
Note that Cypher queries must end with a semicolon. 
neo4j@neo4j> MATCH (n) RETURN count(n); 
+----------+ 
| count(n) | 
+----------+ 
| 0        | 
+----------+ 

1 row 
ready to start consuming query after 34 ms, results consumed after another 0 ms 
neo4j@neo4j>  

Bye! 
neo4j@neo4j-helm-1:~$  
exit 
abcd@abcd:~/helm-charts$ kubectl exec -it neo4j-helm-2 -- bash 
neo4j@neo4j-helm-2:~$ cypher-shell -u neo4j -p xxxx 
Connected to Neo4j using Bolt protocol version 5.4 at neo4j://localhost:7687 as user neo4j. 
Type :help for a list of available commands or :exit to exit the shell. 
Note that Cypher queries must end with a semicolon. 
neo4j@neo4j> MATCH (n) RETURN count(n); 
+----------+ 
| count(n) | 
+----------+ 
| 0        | 
+----------+ 

1 row 
ready to start consuming query after 25 ms, results consumed after another 1 ms 
neo4j@neo4j>  

Bye! 
neo4j@neo4j-helm-2:~$  
exit 
abcd@abcd:~/helm-charts$ 
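
The manual check above can be condensed into a loop that asks every pod for its node count. This is a minimal sketch: the function name `node_counts` and the `NEO4J_PASSWORD` environment variable are hypothetical, and pods are assumed to be named `<release>-<index>` as in the sessions above.

```shell
# Print the node count reported by each pod of a StatefulSet.
# Usage: NEO4J_PASSWORD=... ; node_counts <release> <replicas>
# NOTE: node_counts and NEO4J_PASSWORD are hypothetical names.
node_counts() {
  release="$1"
  replicas="$2"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    printf '%s-%d: ' "$release" "$i"
    # --format plain prints a header line followed by the value,
    # so the last line of output is the count itself.
    kubectl exec "${release}-${i}" -- \
      cypher-shell -u neo4j -p "$NEO4J_PASSWORD" --format plain \
      'MATCH (n) RETURN count(n);' | tail -n 1
    i=$((i + 1))
  done
}
```

Calling `node_counts neo4j-helm 3` prints one line per pod; counts that differ between pods indicate that the servers are not replicating data, which matches the result shown above.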

Scenario 3: Installation of 3 Helm Charts with Dozerdb-Plugin Image

We deployed 3 Helm charts following the instructions in the Neo4j documentation. In this scenario, we did not use "Neo4j Enterprise or Community" images. Instead, we only added the graphstack/dozerdb:5.16.0.0-alpha.1 image to the Helm chart as a "CustomImage" and set the following variables:

edition: "enterprise" 
acceptLicenseAgreement: "yes" 
minimumClusterSize: 3 

We encountered the same result as in scenario 2 above.


We also experimented with other approaches, such as enabling certain clustering-related settings in neo4j-enterprise.conf, but they did not yield any results.

We are seeking guidance and are willing to assist in creating a Helm chart for the graphstack/dozerdb:5.16.0.0-alpha.1 repository. Thanks in advance for your help.

jmsuhy commented 4 months ago

DozerDB doesn't include built-in clustering, and clustering is not on our roadmap.

Here is the reasoning behind that decision. If enough people really want it, we could reconsider.

We do not have built-in clustering for high availability (HA) on our roadmap because a majority of HA needs can be met by using modern cloud platforms and open-source tools, similar to what Neo4j Enterprise "high availability" clustering offers. Neo4j's clustering mechanism is focused on high availability, not on sharding data the way Elasticsearch or TigerGraph do.

For handling large-scale streaming data that requires consistency across clusters, Kafka can be used, though it may not match the speed of Neo4j Enterprise's Akka-based system for consistency. We leverage Kafka because we do not use many of Neo4j's non-core features, such as search or vector indexes. Our ingestion pipelines usually need to sync data in parallel across the tools we leverage (Elasticsearch or OpenSearch, LanceDB, etc.).

If you have ever tried to ingest huge amounts of complex data into Neo4j (Community or Enterprise) in any way other than the admin tool, you have probably seen why clustering does not really overcome Neo4j's well-known ingest performance issues.

I hope that helps.

jmsuhy commented 3 months ago

Closing this ticket and creating a separate ticket in the new helm charts repo we are setting up.

rg2609 commented 2 months ago

Thank you so much! We anxiously await DozerDB automatic database clustering via Helm chart, as Kafka is not a practical solution for us. Could you please provide the link to the new ticket? I was unable to find it. Please let me know how I can help with this issue.