DozerDB / dozerdb-core

DozerDB Plugin Core Project
GNU General Public License v3.0

Need guidance on creating Neo4j database replica/clone #5

Closed rg2609 closed 6 months ago

rg2609 commented 7 months ago

We would like to create a new database in Neo4j, which is a replica of our production database. While creating a new database is straightforward using 'CREATE DATABASE [Name]', we are struggling to create a clone of the production database. How can we do this? Is there a method to create a replica or clone database in a relatively short processing time?

jmsuhy commented 7 months ago

There are several ways to do this. You can export and import your database with apoc, for example, or use the neo4j-admin command.

Apoc (online)

Just look at the apoc documentation.

Neo4j Admin Example (offline)

Remember to backup the system database if you are backing up a database you created using CREATE DATABASE.

./bin/neo4j-admin database dump --to-path=/PATH/FOR/DUMP --verbose system
./bin/neo4j-admin database dump --to-path=/PATH/FOR/DUMP --verbose DBNAME-HERE
./bin/neo4j-admin database load system --from-path=/PATH/FOR/DUMP/
./bin/neo4j-admin database load DBNAME-HERE --from-path=/PATH/FOR/DUMP/
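
For anyone who wants to script the dump and load sequence above, here is a minimal sketch. It only wraps the four commands already shown. The names NEO4J_ADMIN, DRY_RUN, run, dump_databases and load_databases are made up for illustration and are not part of Neo4j or DozerDB. By default it only prints the commands, so you can check them before running anything for real.

```shell
#!/usr/bin/env bash
# Sketch only: NEO4J_ADMIN, DRY_RUN, run, dump_databases and load_databases
# are hypothetical names, not part of Neo4j or DozerDB.
NEO4J_ADMIN="${NEO4J_ADMIN:-./bin/neo4j-admin}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Dump the system database first, then each named database.
# Neo4j must be shut down before this is run with DRY_RUN=0.
dump_databases() {
  local dump_dir="$1"; shift
  local db
  for db in system "$@"; do
    run "$NEO4J_ADMIN" database dump --to-path="$dump_dir" --verbose "$db" || return 1
  done
}

# Load the system database first, then each named database.
load_databases() {
  local dump_dir="$1"; shift
  local db
  for db in system "$@"; do
    run "$NEO4J_ADMIN" database load "$db" --from-path="$dump_dir" || return 1
  done
}
```

Usage would look like `dump_databases /PATH/FOR/DUMP DBNAME-HERE` on the source server and `load_databases /PATH/FOR/DUMP DBNAME-HERE` on the target, with DRY_RUN=0 once the printed commands look right.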

You can also take live snapshots when using AWS or Azure.

I hope this helps.

jmsuhy commented 7 months ago

Note - if you added a database using CREATE DATABASE - then the metadata for that is in the system database. You could also just copy over the entire data directory when the server is off if you wanted.
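
The data directory copy mentioned above could look roughly like this. This is a sketch under assumptions: copy_data_dir is a hypothetical helper, the paths are whatever your install uses, and the function does not check that the server is stopped, so that part is on you.

```shell
#!/usr/bin/env bash
# Sketch only: copy_data_dir is a hypothetical helper. Run it only while
# the Neo4j server is shut down; this function does not check that for you.
copy_data_dir() {
  local src="$1"
  local dest="$2"
  # Refuse to run if the source is missing.
  [ -d "$src" ] || { echo "source directory not found: $src" >&2; return 1; }
  # Refuse to overwrite or merge into an existing destination.
  [ ! -e "$dest" ] || { echo "destination already exists: $dest" >&2; return 1; }
  # -a copies recursively and preserves permissions and timestamps.
  cp -a "$src" "$dest"
}
```

For example, `copy_data_dir /var/lib/neo4j/data /backups/neo4j-data-copy`, where both paths are placeholders. Refusing to write into an existing destination is a deliberate choice, so that an old copy is never silently merged with a new one.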

If you want to do load balancing like Enterprise's HA clustering, I'll be writing a how-to on using standard AWS or Azure services to handle this.

rg2609 commented 7 months ago

I want to clone the database without closing or stopping the server. How can we do this?

jmsuhy commented 7 months ago

We haven't added online backup functionality yet, which is one way you can do that with Enterprise. If you want to do it while the server is running, here is one approach using apoc.

Get the newest 5.x apoc plugin from https://github.com/neo4j/apoc/releases/ and put it into your plugins directory if you haven't already.

Create conf/apoc.conf and add the following lines:

apoc.import.file.enabled=true
apoc.export.file.enabled=true
dbms.security.procedures.unrestricted=apoc.*

For each database you want to back up, run the following, which exports to your NEO4J_HOME/import directory (unless you have configured access to external directories):

CALL apoc.export.cypher.all("DBNAME-export.cypher", {
  format: "cypher-shell",
  useOptimizations: {type: "UNWIND_BATCH", unwindBatchSize: 20}
})
YIELD file, nodes, relationships, properties, time
RETURN file, nodes, relationships, properties, time;

Now you can import with apoc if you want, or you can use the built-in cypher-shell. Here is an example that uses cypher-shell to import into a newly created database called TESTDB (leave --database off and it will import into the default neo4j database):

./bin/cypher-shell -a bolt://localhost:7687 --database TESTDB -u <username> -p <password>  < import/DBNAME-export.cypher
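
The create-then-import step above can be sketched as a small script. Treat it as a sketch under assumptions: CYPHER_SHELL, NEO4J_ADDR, DRY_RUN, create_target_db and import_export_file are hypothetical names. It also assumes cypher-shell picks up credentials from the NEO4J_USERNAME and NEO4J_PASSWORD environment variables; if that does not hold for your setup, add -u and -p as in the command above. By default it only prints what it would run.

```shell
#!/usr/bin/env bash
# Sketch only: CYPHER_SHELL, NEO4J_ADDR, DRY_RUN and both function names
# are hypothetical. Assumes NEO4J_USERNAME / NEO4J_PASSWORD are exported
# so that cypher-shell can authenticate without -u / -p.
CYPHER_SHELL="${CYPHER_SHELL:-./bin/cypher-shell}"
NEO4J_ADDR="${NEO4J_ADDR:-bolt://localhost:7687}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Create the empty target database by sending CREATE DATABASE
# to the system database.
create_target_db() {
  local target="$1"
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s -a %s --database system "CREATE DATABASE %s"\n' \
      "$CYPHER_SHELL" "$NEO4J_ADDR" "$target"
  else
    "$CYPHER_SHELL" -a "$NEO4J_ADDR" --database system "CREATE DATABASE $target"
  fi
}

# Feed the exported cypher-shell script into the target database on stdin.
import_export_file() {
  local target="$1"
  local export_file="$2"
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s -a %s --database %s < %s\n' \
      "$CYPHER_SHELL" "$NEO4J_ADDR" "$target" "$export_file"
  else
    "$CYPHER_SHELL" -a "$NEO4J_ADDR" --database "$target" < "$export_file"
  fi
}
```

Usage would be `create_target_db TESTDB` followed by `import_export_file TESTDB import/DBNAME-export.cypher`, then the same again with DRY_RUN=0 once the printed commands look right.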

rg2609 commented 7 months ago

Thank you, @jmsuhy. Apoc (online) worked for me.

However, Neo4j Admin Example (offline) did not work for me:

Neo4j Admin Example (offline)

Remember to backup the system database if you are backing up a database you created using CREATE DATABASE.

./bin/neo4j-admin database dump --to-path=/PATH/FOR/DUMP --verbose system
./bin/neo4j-admin database dump --to-path=/PATH/FOR/DUMP --verbose DBNAME-HERE
./bin/neo4j-admin database load system --from-path=/PATH/FOR/DUMP/
./bin/neo4j-admin database load DBNAME-HERE --from-path=/PATH/FOR/DUMP/

I tried the database dump command and it gave me the following error:

The database is in use. Stop database 'neo4j' and try again.

Then I ran the following commands and it kicked me out of the container:

root@db4e8158bdf3:/var/lib/neo4j# bin/neo4j-admin server unbind
Database is currently locked. Please shutdown database.
Run with '--verbose' for a more detailed error message.
root@db4e8158bdf3:/var/lib/neo4j# bin/neo4j-admin server stop  
Stopping Neo4j.......

While the Apoc command works, it would be nice to have the "Neo4j Admin Example (offline)" working.

jmsuhy commented 7 months ago

It sounds like your graph is still running. You have to shut down neo4j before attempting the database dump.

rg2609 commented 7 months ago

I encountered an issue while trying to stop the database: the error Neo.ClientError.Statement.UnsupportedAdministrationCommand left me perplexed. Could you please help me resolve it? Your expertise would be greatly appreciated.

jmsuhy commented 7 months ago

Stopping and starting a database has not been added yet; all databases start when the server is started. I am going to create a ticket to add those commands so we can get them in for the release.

We should have drop, stop, and start available at a minimum.

jmsuhy commented 7 months ago

I created a new feature ticket for adding the ability to start and stop databases for the release.

https://github.com/DozerDB/dozerdb-core/issues/10

jmsuhy commented 6 months ago

I am closing this ticket. You may reopen if needed. The start and stop ticket #10 will remain open until completed.

gumshoes commented 5 months ago

For those working in k8s: you can edit the deployment to override the entry point so that Neo4j does not start up, which lets you run the neo4j-admin commands to dump a DB. Dumping with neo4j-admin is much faster and results in 10x smaller files than the apoc approach, and there are probably other limitations/nuances. Either way it's great to have options.

image: graphstack/dozerdb:5.19.0.0-alpha.1
command: ["/bin/bash"]
args:
  - -c
  - >-
    time neo4j-admin database dump DBNAME-HERE --to-path=/var/lib/neo4j/import/;
    (while true; do sleep 1000; done);