cockroachdb / cockroach

roachtest: cdc/kafka-auth failed #118525

Closed cockroach-teamcity closed 9 months ago

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ ed3a25e3c9459cede2f80babbfc9d44a836b6c12:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2293).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/cdc

This test on roachdash | Improve this report!

Jira issue: CRDB-35771

wenyihu6 commented 9 months ago

The Kafka log file contains many failure messages like the following:

[2024-01-31 07:59:47,041] WARN [RequestSendThread controllerId=1001] Controller 1001's connection to broker teamcity-13762710-1706682693-17-n1cpu4-0001.c.cockroach-ephemeral.internal:9094 (id: 1001 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Caused by: javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching teamcity-13762710-1706682693-17-n1cpu4-0001.c.cockroach-ephemeral.internal found.
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:360)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:303)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:298)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1357)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1232)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1175)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)

wenyihu6 commented 9 months ago

I was able to reproduce this on master pretty consistently. Likely due to https://github.com/cockroachdb/cockroach/pull/117544.

wenyihu6 commented 9 months ago

Removing the release-blocker label since this seems to be a test issue. It works with the cockroach binary but not in roachtests.

demo@127.0.0.1:26257/demoapp/movr> CREATE TABLE auth_test_table(t1 INT);
CREATE TABLE

Time: 5ms total (execution 5ms / network 0ms)

demo@127.0.0.1:26257/demoapp/movr> CREATE CHANGEFEED FOR TABLE auth_test_table INTO
                                -> "kafka://wenyitest.servicebus.windows.net:9093?tls_enabled=true&sasl_enabled=true&sasl_user=$ConnectionString&sasl_password=<redacted>&sasl_mechanism=PLAIN" WITH updated, format=json;
        job_id
----------------------
  939284689234853889
(1 row)

NOTICE: changefeed will emit to topic auth_test_table
Time: 396ms total (execution 396ms / network 0ms)

wenyihu6 commented 9 months ago

Likely the same issue as https://cockroachlabs.slack.com/archives/C065X5307U3/p1702915552046409, but it is only now surfacing after the upgrade.

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ cc4fdffa8532d16544c48ef036689763f737dc6b:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2293).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ fce4d4723519bc4ca6e9ef5da0ae19960c84752c:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ 15961a19faca0e2b66df2d01a547549523ca70c7:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ 3c41c509a87cba7a1fd3f5cfdb0f6badb78e3704:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ d272e9ef5589deff570efc023db6c70edfde311c:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ 715628abd134abfd2c0d966f9b7220a6715cc299:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ d7d442e4a3c9dca7e01c4c6f4f00e2f28faa4374:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ 7042601857042a057b1d4676735576cfbd37f36a:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ 353fded9fe270b3eee4c85480ac1b9ec819f23b0:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ b2e31876366324c2ebe5c2ad8bbd644997e90864:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ 814a375d4c0e79d875c42452725f05f6c27294e3:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ 254dbd247fb8ed352a11439063b29f23a0767f28:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ cc6ca026319024800395293b0fb18f05dd8eb50e:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

wenyihu6 commented 9 months ago

Summary: Test history

  1. kafka-auth was working as expected. In this test, we generate and pass self-signed test certificates for inter-broker communication within the Kafka cluster.
  2. A change in the Java environment or the Kafka cluster (https://kafka.apache.org/20/documentation.html#security_confighostname) turned on hostname verification, which was not previously enforced. As a result, the certificate we generate is no longer valid because it is missing the DNSNames field, and we have been getting an error message in our Kafka server logs ever since. However, this error was never raised in the sarama code during Dial(), and kafka-auth only checks that the CREATE statement succeeds, not that messages are emitted. So our test kept passing.
  3. The Sarama upgrade changed how Dial() works. It now invokes previously untouched Kafka code, which surfaces the error.
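
The DNSNames point in step 2 can be illustrated with a small Go sketch. This is not the roachtest's certificate-generation code: the helper newSelfSignedCert and the hostname broker-1.example.internal are hypothetical. It uses Go's crypto/x509 to show that a self-signed certificate without DNSNames fails hostname verification, while the same certificate with a matching DNSNames entry passes. The Kafka brokers perform their check inside the JVM, so this only mirrors the idea.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newSelfSignedCert (hypothetical helper) creates and parses a self-signed
// certificate. dnsNames becomes the subject alternative name (SAN) list;
// pass nil to get a certificate that only carries a CommonName.
func newSelfSignedCert(commonName string, dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: commonName},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		DNSNames:     dnsNames,
	}
	// Template and parent are the same, so the certificate signs itself.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	const host = "broker-1.example.internal" // hypothetical broker hostname

	noSAN, err := newSelfSignedCert(host, nil)
	if err != nil {
		panic(err)
	}
	// Go ignores the CommonName for hostname checks, so this returns an error.
	fmt.Println("without DNSNames:", noSAN.VerifyHostname(host))

	withSAN, err := newSelfSignedCert(host, []string{host})
	if err != nil {
		panic(err)
	}
	// The SAN list contains the hostname, so verification succeeds.
	fmt.Println("with DNSNames:", withSAN.VerifyHostname(host))
}
```

One way to fix the test along these lines would be to regenerate the test certificates with the broker hostnames listed as DNS names.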

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ 7d0697b632066ee78735fc57e8150222d5576d0d:

(cdc.go:1081).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ 0b7ae19e2b94b851ed8812914f57032aab699811:

(cdc.go:1081).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

cockroach-teamcity commented 9 months ago

roachtest.cdc/kafka-auth failed with artifacts on master @ e39dafe6d8c153301ff43ed2b3ed3e13af9ec72a:

(cdc.go:1081).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1
