cleishm / libneo4j-client

neo4j-client -- Neo4j Command Line Interface (CLI)
https://neo4j-client.net
Apache License 2.0
155 stars 38 forks source link

Support for bolt+routing protocol #26

Open AO-StreetArt opened 6 years ago

AO-StreetArt commented 6 years ago

Supporting a Neo4j Causal Cluster (https://neo4j.com/docs/operations-manual/3.3/clustering/causal-clustering/introduction/), support for retrieving routing tables using bolt+routing is needed (https://neo4j.com/docs/developer-manual/3.3/drivers/client-applications/#_routing_drivers_bolt_routing) (https://neo4j.com/docs/developer-manual/3.3/terminology/#term-bolt-routing-protocol).

I was going to implement cluster support as a part of the higher-level driver I built on top of this library (https://github.com/AO-StreetArt/NeoCpp). However, after some research, the architecture of Neo4j clusters makes the routing tables accessible from a Core Server, which should be utilized in order to bring cluster support in line with official drivers.

As far as I can tell, the best way to use them is retrieving those tables using bolt, and then using them with a load balancing strategy (example https://github.com/neo4j/neo4j-python-driver/blob/1.6/neo4j/v1/routing.py).

cleishm commented 6 years ago

Hi @AO-StreetArt,

As you've noted, the bolt+routing in neo4j is based on:

Thus far, neo4j-client has just focussed on implementing the bolt protocol and handing everything needed for establishing a connection to a server, sending it queries and obtaining results, leaving the higher level details, like determining server addresses, to the developer consuming this library.

However, there's no reason support for this "higher-layer" of interaction couldn't be developed in this codebase and made available as an additional API. I just haven't had the time or motivation to do so myself :) Happy to discuss and help guide anyone interested in implementing this.

Cheers, Chris

AO-StreetArt commented 6 years ago

Thanks for the reply @cleishm , I'd be happy to implement this, would love your thoughts.

I think it makes sense to allow the consumer to handle the higher-level details, with one caveat. The actual communication with the core server to retrieve the result table. To me, it makes sense conceptually that libneo4j abstracts away the BOLT protocol, with other implementation details left to the consumer.

With that in mind, basically the changes to libneo4j would entail supporting access to the return results from the query "CALL dbms.cluster.routing.getRoutingTable({context})", as well as figuring out how to access the routing context to pass in (still need to figure out exactly where that comes from).

I don't think that libneo4j should really do much beyond that. What do you think?

cleishm commented 6 years ago

I had a couple of more requests for this lately. Shall we discuss some more and consider some work on it?

AO-StreetArt commented 6 years ago

Sounds great. Are you aiming to enable full routing support within the command line client, or just provide for retrieving the routing table?

We can get on IRC/Slack to discuss at some point if you'd like.

cleishm commented 6 years ago

Hey @AO-StreetArt: Full support, whatever that means ;-) Retrieving the routing table can already be done: it's just a cypher query to a stored proc (which is what makes it Neo4j specific, rather than anything to do with the bolt protocol).

AO-StreetArt commented 6 years ago

Sorry for the delay, I took some time today to think about this and pulled together a preliminary design. A few notes before I dump what I'm thinking:




cleishm commented 6 years ago

Hi @AO-StreetArt,

Thanks for making this initial foray!

I think this sounds like a good initial scope. My only comment is that I'd prefer not to expose it using a name like "connection pool", as that feels like an implementation detail. I'd probably just call it a "driver", from when you can start a session (i.e. obtain a connection) with a server.

We can also avoid the neo4j_release_connection method by overloading neo4j_close so that it returns to a pool rather than actually closing. I use OO constructs in many parts of the codebase for doing this (e.g. neo4j_close_results), and it wouldn't be hard to do the same here.

By doing that, we'd make it such that it wouldn't matter if the connection was obtained via a driver interface, or using neo4j_connect(...) directly - usage after that point would be the same.

Cheers, Chris

AO-StreetArt commented 6 years ago

Sorry for the delay, I was on vacation and have just had the time to start playing with this. I think your suggestions sound good and I'll implement them both.

Thus far, I've got a Causal Cluster up and running, and I'm playing with the code required to pull down and parse the routing table. In the course of pulling this together, I've come across another subtle difference we'll need to account for: routing read vs write transactions.

In the Causal Cluster, only the Core servers can handle write queries. Read-replicas (likely to be many more of these in the cluster) are only capable of executing read queries. This means that, when we look at the routing table, we're not just looking at a list of addresses, but a list of addresses and their corresponding role. For example, here's the output of the getRoutingTable procedure in my Causal Cluster (from the Neo4j Browser):

[{ "addresses": [ "localhost:7687" ], "role": "WRITE" }, { "addresses": [ "localhost:7687" ], "role": "READ" }, { "addresses": [ "localhost:7687", "localhost:7687", "localhost:7687" ], "role": "ROUTE" }]

I think the correct design is to have two methods for retrieving connections, one to retrieve a read-only connection and another to retrieve a write connection. So, we'd replace neo4j_retrieve_connection() with neo4j_retrieve_write_connection() and neo4j_retrieve_read_connection(). What are your thoughts?

cleishm commented 6 years ago

Hey!

Before you get too far ahead, it'd be good to share your code and get some feedback. I'd hate to ask to rework a lot after you'd written a significant amount. Do you have a branch you're working to? Can you do some incremental steps?

As for the specific question, I agree you'll need to differentiate the type of connection you need. However, rather than embed that in the function name, I'd follow the previous statement: make it a "type" argument. In this case, I'd use an int and add a couple of #defines that bitmask together, so you can retrieve any connection with a type that matches that mask.

cleishm commented 6 years ago

Here's the API I'd suggest:

neo4j_driver_t *driver = neo4j_driver("bolt+routing://....", config, flags);
neo4j_connection_t *connection = neo4j_session(driver, NEO4J_WRITE_SESSION);
neo4j_result_stream_t *results =
            neo4j_run(connection, "RETURN 'hello world'", neo4j_null);

(obviously with error handling and cleanup in addition)

AO-StreetArt commented 6 years ago

I have a fork that I'm working on, but not much coding done there yet. I'm still trying to get a feel for the required steps by playing with an actual cluster.

You can find my docker-compose.yml file to start a Causal Cluster here: https://gist.github.com/AO-StreetArt/8dd744b9acdf3a9458006ed50abc899e

You can find a basic script which uses libneo4j-client to view the routing table here: https://gist.github.com/AO-StreetArt/047a2b254e5e1f0f3ca685fe1a03dbfc

I like the API you outlined, and I'd be happy to do some incremental work and let you review.