JanusGraph / janusgraph

JanusGraph: an open-source, distributed graph database
https://janusgraph.org

SSL integration tests are failing for ScyllaDB #3595

Open porunov opened 1 year ago

porunov commented 1 year ago

Stack Trace (if you have one)

Caused by: org.testcontainers.containers.ContainerLaunchException: Timed out waiting for container port to open (localhost ports: [32769] should be listening)
        at org.testcontainers.containers.wait.strategy.HostPortWaitStrategy.waitUntilReady(HostPortWaitStrategy.java:102)
        at org.testcontainers.containers.wait.strategy.AbstractWaitStrategy.waitUntilReady(AbstractWaitStrategy.java:52)
        at org.testcontainers.containers.GenericContainer.waitUntilContainerStarted(GenericContainer.java:953)
        at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:485)
        ... 60 more

I think there is a problem either with the port mapping when certificates are used or with something else. It looks like the Docker container can be started manually via:

docker run --rm --name test-scylla --volume cert/node.crt:/etc/ssl/node.crt --volume cert/node.key:/etc/ssl/node.key --volume cert/node.keystore:/etc/ssl/node.keystore --volume cqlshrc:/root/.cassandra/cqlshrc --volume scylla-murmur-ssl.yaml:/etc/scylla/scylla.yaml -it -p 9042:9042 scylladb/scylla --developer-mode=1 --memory 2G --smp 1 --skip-wait-for-gossip-to-settle 0

That said, Testcontainers either can't connect to the mapped port, or something else prevents it from probing the port.

FlorianHockmann commented 3 months ago

I've debugged this. The problem seems to be that Testcontainers checks whether the mapped port is open to verify that the container (Scylla) started successfully, and it uses netcat for this check. Netcat is not installed in the Scylla container, which makes the wait check fail.
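For anyone who wants to reproduce the "is the port open" question from the host side, here is a minimal sketch using only the JDK. It is not Testcontainers' own implementation; the class and method names (PortProbe, isListening) are hypothetical. It only attempts a plain TCP connect, so it says nothing about TLS or CQL.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Testcontainers or JanusGraph.
final class PortProbe {

    /** Returns true if a plain TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, unknown host, etc.
            return false;
        }
    }

    public static void main(String[] args) {
        // 9042 is the CQL port from the docker run command above;
        // with Testcontainers you would probe the randomly mapped host port instead.
        System.out.println("listening: " + isListening("localhost", 9042, 1000));
    }
}
```

Keep in mind that a successful connect to the mapped host port does not by itself prove that Scylla accepts connections inside the container: in the log below, the external check passes while the in-container check is refused.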

Here is the relevant debug log output from testcontainers:

14:14:15 DEBUG org.testcontainers.containers.wait.strategy.HostPortWaitStrategy.lambda$waitUntilReady$3 - External port check passed for [9042] mapped as [60753] in PT0.020367S
[...]
14:15:15 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.Wire.wire - http-outgoing-2 << "[0x2][0x0][0x0][0x0][0x0][0x0][0x0][0x1a]/bin/sh: 1: nc: not found[\n]"
14:15:15 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.Wire.wire - http-outgoing-2 << "[\r][\n]"
14:15:15 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.Wire.wire - http-outgoing-2 << "66[\r][\n]"
14:15:15 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.Wire.wire - http-outgoing-2 << "[0x2][0x0][0x0][0x0][0x0][0x0][0x0]^/bin/bash: connect: Connection refused[\n]"
14:15:15 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.Wire.wire - http-outgoing-2 << "/bin/bash: /dev/tcp/localhost/9042: Connection refused[\n]"
// a lot more of these netcat logs

14:15:15 DEBUG org.testcontainers.containers.GenericContainer.tryStart - Wait strategy threw an exception
org.testcontainers.containers.ContainerLaunchException: Timed out waiting for container port to open (localhost ports: [60753] should be listening)
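A side note on reading the wire log: the eight bracketed bytes in front of each message look like the frame header Docker uses when it multiplexes stdout and stderr over one connection (byte 0 is the stream type, bytes 4 to 7 are the payload length as a big-endian integer). Assuming that framing, the following sketch decodes such a header. The class name DockerFrameHeader is hypothetical and this is not code from docker-java.

```java
import java.nio.ByteBuffer;

// Hypothetical decoder for the 8-byte header of a multiplexed Docker stream frame.
final class DockerFrameHeader {
    final int streamType;      // 0 = stdin, 1 = stdout, 2 = stderr
    final long payloadLength;  // number of payload bytes following the header

    private DockerFrameHeader(int streamType, long payloadLength) {
        this.streamType = streamType;
        this.payloadLength = payloadLength;
    }

    static DockerFrameHeader parse(byte[] header) {
        if (header.length != 8) {
            throw new IllegalArgumentException("expected 8 header bytes, got " + header.length);
        }
        int type = header[0] & 0xFF;
        // ByteBuffer is big-endian by default; mask to treat the int as unsigned.
        long length = ByteBuffer.wrap(header, 4, 4).getInt() & 0xFFFFFFFFL;
        return new DockerFrameHeader(type, length);
    }

    String streamName() {
        switch (streamType) {
            case 0: return "stdin";
            case 1: return "stdout";
            case 2: return "stderr";
            default: return "unknown(" + streamType + ")";
        }
    }

    public static void main(String[] args) {
        // Header bytes copied from the first wire log line above.
        byte[] header = {0x2, 0, 0, 0, 0, 0, 0, 0x1a};
        DockerFrameHeader h = parse(header);
        System.out.println(h.streamName() + " " + h.payloadLength); // prints "stderr 26"
    }
}
```

Read this way, the first frame is 26 bytes on stderr, which matches the length of "/bin/sh: 1: nc: not found" plus the trailing newline. So the errors come from the commands Testcontainers executes inside the container, not from the Docker API call itself.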

However, it is still failing when I use an image where I installed netcat:

15:49:47 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.Wire.wire - http-outgoing-2 << "[0x2][0x0][0x0][0x0][0x0][0x0][0x0][0xffffff88]nc: connect to localhost port 9042 (tcp) failed: Connection refused[\n]"
15:49:47 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.Wire.wire - http-outgoing-2 << "nc: connect to localhost port 9042 (tcp) failed: Connection refused[\n]"
15:49:47 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.Wire.wire - http-outgoing-2 << "[\r][\n]"

So, the reason simply changed from nc: not found to nc: connect to localhost port 9042 (tcp) failed: Connection refused.

I am not sure yet why that's the case and I especially don't know why this is only a problem for the SSL tests.

mykaul commented 2 months ago

Can you look at Scylla logs? I bet Scylla is not loading properly, due to certificates (mis)configuration.

FlorianHockmann commented 2 months ago

@porunov added some logs here which also include the Scylla logs. @roydahan already commented there that it looks like Scylla started successfully.

To me, it looks more like a problem of Testcontainers not being able to execute its connectivity checks for Scylla successfully which it does before the tests are executed. I just don't understand why this is specific to the SSL tests as a simple netcat check should work irrespective of whether SSL is configured or not.

mykaul commented 2 months ago

> @porunov added some logs here which also include the Scylla logs. @roydahan already commented there that it looks like Scylla started successfully.

I don't see a log line for listening to the CQL port.

FlorianHockmann commented 2 months ago

You're right. Such a log message should be there, right? But the log does include at least this line:

cql_server_controller - Enabling encrypted CQL connections between client and server

Does that mean that Scylla is still trying to start its CQL service, but maybe fails silently?

I would also have expected the log message Scylla version X.Y.Z initialization completed. to indicate that Scylla started successfully, but that is also not present here.
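To make this check repeatable against a captured Scylla log, a small sketch like the following could be used. It looks only for the two messages quoted in this comment; the exact wording and format of the real log lines is an assumption, X.Y.Z is matched as any non-whitespace token, and the class name ScyllaLogCheck is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for scanning captured Scylla log lines.
final class ScyllaLogCheck {

    // Assumed wording, taken from the message quoted above with X.Y.Z as a placeholder.
    private static final Pattern INIT_COMPLETED =
            Pattern.compile("Scylla version \\S+ initialization completed");

    // Line observed in the logs attached to this issue.
    private static final String CQL_ENCRYPTION =
            "cql_server_controller - Enabling encrypted CQL connections between client and server";

    static boolean initializationCompleted(List<String> lines) {
        return lines.stream().anyMatch(line -> INIT_COMPLETED.matcher(line).find());
    }

    static boolean cqlEncryptionEnabled(List<String> lines) {
        return lines.stream().anyMatch(line -> line.contains(CQL_ENCRYPTION));
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "cql_server_controller - Enabling encrypted CQL connections between client and server");
        System.out.println("encryption enabled: " + cqlEncryptionEnabled(log));
        System.out.println("initialization completed: " + initializationCompleted(log));
    }
}
```

A log where the first check is true and the second is false would correspond to the situation described here: Scylla begins enabling encrypted CQL but never reports that startup finished.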

porunov commented 2 months ago

> @porunov added some logs here which also include the Scylla logs. @roydahan already commented there that it looks like Scylla started successfully.
>
> To me, it looks more like a problem of Testcontainers not being able to execute its connectivity checks for Scylla successfully which it does before the tests are executed. I just don't understand why this is specific to the SSL tests as a simple netcat check should work irrespective of whether SSL is configured or not.

I tried to figure this out a while ago but got stuck: the same certificates that work perfectly fine with Cassandra fail to work with ScyllaDB. I guess the certificates themselves could also be the issue. Maybe our certificates are wrong and Cassandra is just more lenient about that (just guessing), or maybe we need a slightly different configuration for ScyllaDB.