Closed kalamarski-marcin closed 3 years ago
hi @kalamarski-marcin, and thank you for the very detailed report! We work a bit differently than the js driver, that's for sure, but it's interesting to see how other fellows are coding ;) However, as far as identifying the server roles goes, we should yield the correct behavior.
I'll look into the host param story, as it looks like you might've found a bug, sorry for that.
One aspect that worries me though is related to this:
An error occurred a couple of times, randomly. Sometimes I was able to query successfully, sometimes not, but definitely more often with a failure. Probably just because the Kubernetes service internally works as a load balancer for the pods.
I don't have a k8s cluster available to test with hence appreciating any feedback we get, but are you suspecting that k8s lb might load balance the traffic, say: between pods hosting neo4j servers of different roles?
From Bolt.Sips's perspective, the role of a connection, i.e. :write, instructs the driver which server addresses it should use, from the config, and only the servers having the :write role will be used (in a round-robin fashion, if there is more than one) - this is not affected by the host param, unless I missed something. The addresses of the servers are refreshed after the route TTL, and they're configured dynamically upon the route refresh.
The server address you have in the driver config, will be that of the router, but you already know this.
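The flow described above can be sketched like this (a minimal sketch, reusing the URL and functions that appear later in this thread; not the driver's internals):

```elixir
# Minimal sketch of role-based routing in Bolt.Sips.
# The host in the URL is only used to reach the router; after the first
# route refresh, the driver keeps a routing table and picks servers by role.
{:ok, _neo} = Bolt.Sips.start_link(url: "bolt+routing://neo4j:test@localhost:7687", pool_size: 1)

write_conn = Bolt.Sips.conn(:write) # a server advertising the :write role
read_conn = Bolt.Sips.conn(:read)   # a :read server, round-robin if there are several

Bolt.Sips.query!(write_conn, "CREATE (t:Test)")
Bolt.Sips.query!(read_conn, "MATCH (t:Test) RETURN count(t) AS c")
```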
Hi,
Thx for fast reply :)
I don't have a k8s cluster available to test with hence appreciating any feedback we get, but are you suspecting that k8s lb might load balance the traffic, say: between pods hosting neo4j servers of different roles?
In short: pod = core server. Each pod has more or less the same configuration. What makes the difference is the advertised_address config option, set differently for each pod. All of them have the same ports open: 5000, 6000, 7000, 7473, 7474, 7687.
According to the Kubernetes docs (service): Kubernetes gives Pods their own IP addresses and a single DNS name for a set of Pods, and can load-balance across them.
My service has ports 7474, 7473 and 7687 open and forwards the traffic to ports 7474, 7473 and 7687 respectively. So if all pods have, i.a., port 7687 open, the theory that the service load-balances the traffic is the only one that seems reasonable to me. That is why I got the error randomly.
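A hypothetical sketch of a Service of the shape described above (the selector and names are assumptions, not the actual manifest); the key point is that such a Service load-balances TCP connections across all matching pods regardless of their Neo4j cluster role:

```yaml
# Hypothetical sketch, not the deployed manifest.
apiVersion: v1
kind: Service
metadata:
  name: sx-causal-cluster-neo4j
  namespace: neo4j
spec:
  selector:
    app: neo4j        # matches every core pod, LEADER and FOLLOWERs alike
  ports:
    - name: browser-https
      port: 7473
    - name: browser-http
      port: 7474
    - name: bolt
      port: 7687
```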
I've made one more test. I did everything locally. I used your docker-compose.yml and started Neo4j. Then I changed the socket.ex file:
def connect(host, port, opts, timeout) do
  # log every host:port pair the driver actually dials
  IO.puts "#{host}:#{port}"
  :gen_tcp.connect(host, port, opts, timeout)
end
And ran:
iex(1)> {:ok, neo} = Sips.start_link(url: "bolt+routing://neo4j:test@localhost:7687", pool_size: 1)
{:ok, #PID<0.280.0>}
localhost:7687
iex(2)> localhost:7687
localhost:7688
localhost:7689
localhost:7687
localhost:7688
localhost:7689
localhost:7687
As you can see, it will work perfectly - I mean writing operations - because localhost in that case doesn't matter. What matters is the port used for the particular connection. That is why the :write, :read and :route connections are always established to the right core server.
But in my case the ports are the same, and the most important thing is the host, which, I think, must be the address of the pod, not the service.
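One way to make the failure visible is to ask the server behind a connection for its role (a sketch; it assumes a Neo4j 3.x causal cluster, where the dbms.cluster.role() procedure is available):

```elixir
# Sketch: check which cluster role a :write connection actually reaches.
conn = Bolt.Sips.conn(:write)

response = Bolt.Sips.query!(conn, "CALL dbms.cluster.role() YIELD role")
[%{"role" => role}] = response.results

# With correct routing this should always be LEADER; if a load balancer
# redirects the TCP connection to an arbitrary pod, it can be FOLLOWER.
IO.puts(role)
```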
It seems that no one else can reproduce this - can this be closed? I'll close it, and we can reopen it in case we get more reports or receive a PR to address it.
Environment:
Kubernetes services
Service address: sx-causal-cluster-neo4j.neo4j.svc.cluster.local
The address is built according to: https://github.com/neo-technology/neo4j-google-k8s-marketplace/blob/3.5/user-guide/USER-GUIDE.md#service-address
Kubernetes pods
Pods addresses:
sx-causal-cluster-neo4j-core-0.sx-causal-cluster-neo4j.neo4j.svc.cluster.local
sx-causal-cluster-neo4j-core-1.sx-causal-cluster-neo4j.neo4j.svc.cluster.local
sx-causal-cluster-neo4j-core-2.sx-causal-cluster-neo4j.neo4j.svc.cluster.local
Routing table (Bolt.Sips.info()):
When I typed
conn = Bolt.Sips.conn(:write)
and tried to query the database with Bolt.Sips.query!(conn, "CREATE (t:Test)"), I got an error:
No write operations are allowed directly on this database. Writes must pass through the leader. The role of this server is: FOLLOWER
An error occurred a couple of times, randomly. Sometimes I was able to query successfully, sometimes not, but definitely more often with a failure. Probably just because the Kubernetes service internally works as a load balancer for the pods.
I was curious why.
To test what was going on, I deployed the elixir docker image, entered the pod, cloned the Bolt.Sips repository and changed some parts of the code.
After a while I discovered an interesting thing.
I inspected the opts passed to the Bolt.Sips.Protocol.connect function. Please have a look at the key :host and its value. It seems it looks OK. Actually, it is. But the problem is that it is never used anywhere.
In the Bolt.Sips.Protocol.connect function I put IO.inspect(host) just before
with {:ok, sock} <- socket.connect(host, port, socket_opts, timeout)
and ran it: all connections were established directly to the Kubernetes service, none of them to the aforementioned host.
So I did a nasty hack in the Bolt.Sips.Utils.default_config(opts) function, and ran it again:
So, in the end, :gen_tcp.connect received the proper host each time it was invoked. That way I can query the database (write operations) without any problems. Always.
I've tested it locally, with Neo4j launched via docker-compose. Here is the docker-compose.yml:
To make it work, I had to find out the IP address of each running container and edit the /etc/hosts file. In my case:
Without it, the host couldn't be properly resolved.
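For illustration, the additions were of this shape (the hostnames and IPs below are hypothetical; use the advertised hostnames and the actual container IPs from your own setup):

```
# hypothetical /etc/hosts additions - real container IPs will differ
172.19.0.2  core1
172.19.0.3  core2
172.19.0.4  core3
```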
Maybe I misunderstood the driver concept and Bolt.Sips is OK, but my Neo4j configuration is bad (deployed via the Google Cloud Marketplace; I didn't change anything in the config).
@florinpatrascu What do you think?
Edited:
I've checked how it works under the hood in the nodejs neo4j-driver. There is a function (src/internal/node/node-channel.js):
So I added console.log(config.address.resolvedHost()) just before net.connect. And here is the result: the first is the IP of the Kubernetes service, the second the leader address.
test.js file: