apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0
14.23k stars 3.58k forks source link

[Bug] Issue with implementing TLSAuthentication using Java #17987

Open BajrangDubeyPersistent opened 2 years ago

BajrangDubeyPersistent commented 2 years ago

Search before asking

Version

Apache Pulsar 2.8.2

PythonTLSConsumer.txt PythonTLSProducer.txt JavaTLSConsumer.txt JavaTLSProducer.txt pom.txt

Minimal reproduce step

Followed steps mentioned in : https://pulsar.apache.org/docs/security-tls-transport/ Few Changes not mentioned in documentation:

  1. proxy roles: proxy
  2. remove tls ciphers
  3. certificates to be created wrt role token
  4. change port while creating broker

For the same implementation, I am able to run code in python but not in java

What did you expect to see?

I expected to see Java code running as it was using same configurations as used in Python.

What did you see instead?

Error received:

org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.PulsarClientException: Connection already closed at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:1027) at org.apache.pulsar.client.impl.ProducerBuilderImpl.create(ProducerBuilderImpl.java:91) at Producer.main(Producer.java:43) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.PulsarClientException: Connection already closed at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) at org.apache.pulsar.client.impl.ProducerBuilderImpl.create(ProducerBuilderImpl.java:89) ... 7 more Caused by: org.apache.pulsar.client.api.PulsarClientException: Connection already closed at org.apache.pulsar.client.impl.ClientCnx.channelInactive(ClientCnx.java:252) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:389) at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:354) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:389) at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:354) at org.apache.pulsar.shade.io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1073) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) at org.apache.pulsar.shade.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:831) at org.apache.pulsar.shade.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) at org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) at org.apache.pulsar.shade.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) at org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) at org.apache.pulsar.shade.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ... 1 more

Anything else?

No response

Are you willing to submit a PR?

michaeljmarshall commented 2 years ago

Posting these for future reference:

import java.nio.charset.StandardCharsets;
import java.util.Objects;
import org.apache.pulsar.client.api.*;
import org.apache.pulsar.client.impl.auth.AuthenticationTls;

public class Producer {
    public static final String PULSAR_SERVICE_URL = "pulsar+ssl://localhost:6651/";
    public static final String TOPIC_NAME = "persistent://public/default/mytopic2";

    public static PulsarClient getPulsarClient() throws PulsarClientException {
        return PulsarClient.builder().serviceUrl(PULSAR_SERVICE_URL).build();
    }

    public static void main(String[] args) throws PulsarClientException {

        /*PulsarClient client = PulsarClient.builder()
                .serviceUrl(PULSAR_SERVICE_URL)
                .tlsTrustCertsFilePath("D:\\my-ca\\certs\\ca.cert.pem")
                .authentication("org.apache.pulsar.client.impl.auth.AuthenticationTls",
                        "tlsCertFile:D:\\my-ca\\client.cert.pem, tlsKeyFile:D:\\my-ca\\client.key-pk8.pem")
                .build();*/

        AuthenticationFactory auth = new AuthenticationFactory();
        PulsarClient client = PulsarClient.builder()
                .tlsTrustCertsFilePath("/pulsar/conf/my-ca/certs/ca.cert.pem")
                .authentication(auth.TLS(
                        "/pulsar/conf/my-ca/client.cert.pem",
                        "/pulsar/conf/my-ca/client.key-pk8.pem"))
                        .serviceUrl(PULSAR_SERVICE_URL)
                        .build();

        //PulsarClient pulsarClient = null;
        try {

            client = getPulsarClient();
            org.apache.pulsar.client.api.Producer<byte[]> producer = client
                    .newProducer()
                    .topic("mytopic2")
                    .create();

            for (int i = 0; i < 10; i++) {
                TypedMessageBuilder<byte[]> typedMessageBuilder = producer.newMessage();
                typedMessageBuilder.value(("Hi! There, How are you? Hope you are doing - " + i)
                        .getBytes(StandardCharsets.UTF_8));
                MessageId messageId = typedMessageBuilder.send();
                System.out.println("Message Id " + messageId);
            }
        }

        catch(Exception e)
        {
        e.printStackTrace();
        }
        finally {
            if (Objects.nonNull(client)) {
                client.close();
            }
        }
    }
}
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

import java.nio.charset.StandardCharsets;
import java.util.Objects;

public class Consumer
{
        public static final String PULSAR_SERVICE_URL = "pulsar+ssl://localhost:6651/";

        public static PulsarClient getPulsarClient() throws PulsarClientException {
            return PulsarClient.builder().serviceUrl(PULSAR_SERVICE_URL).build();
        }

        public static void main(String[] args) throws PulsarClientException {
            PulsarClient pulsarClient = null;

            try {
                pulsarClient = getPulsarClient();
                org.apache.pulsar.client.api.Consumer<byte[]> consumer = pulsarClient.newConsumer()
                        .topic("mytopic2")
                        .subscriptionName("test-sub")
                        .subscribe();

                while(true) {
                    Message<byte[]> message = consumer.receive();
                    String messageValue = new String(message.getData(), StandardCharsets.UTF_8);
                    System.out.println("Message Received:  " + ", Value = " + messageValue);
                    consumer.acknowledge(message.getMessageId());
                }

            } finally {
                if (Objects.nonNull(pulsarClient)) {
                    pulsarClient.close();
                }
            }
        }
    }
michaeljmarshall commented 2 years ago

@BajrangDubeyPersistent - do you have any errors in the broker's log? Those errors are not the root cause. There is some other reason that the connection is getting closed.

BajrangDubeyPersistent commented 2 years ago

@michaeljmarshall - Thanks for the response. Broker logs are as below:

06:33:49.582 [pulsar-io-5-4] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /127.0.0.1:38102 06:33:49.604 [pulsar-io-5-4] WARN org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:38102] Got exception io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:477) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: javax.net.ssl.SSLHandshakeException: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.shutdownWithError(ReferenceCountedOpenSslEngine.java:1074) at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(ReferenceCountedOpenSslEngine.java:1365) at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1305) at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1392) at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1435) at io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:221) at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1341) at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1245) at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1285) at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ... 15 more

06:33:49.605 [pulsar-io-5-4] INFO org.apache.pulsar.broker.service.ServerCnx - Closed connection from /127.0.0.1:38102

michaeljmarshall commented 2 years ago

@BajrangDubeyPersistent - thanks, I see that there is a stack overflow with a very similar stack trace: https://stackoverflow.com/questions/68913530/javax-net-ssl-sslhandshakeexception-error10000438ssl-routinesopenssl-interna. @devinbost submitted the issue and the answer. Does the error persist if you restart and clear the docker cache?

BajrangDubeyPersistent commented 2 years ago

@michaeljmarshall - I am quite sure this is not a VPN issue as same code is running smoothly with Python, but for similar Java code, this TLSV1_ALERT_INTERNAL_ERROR pops up.

I tried following commands to clear docker cache :

alias docker_clean_images='docker rmi $(docker images -a --filter=dangling=true -q)'
alias docker_clean_ps='docker rm $(docker ps --filter=status=exited --filter=status=created -q)'

I also restarted all the containers but the issue still persists. image

At the same time, Python script is not showing any connection issue, it's working well.

image
github-actions[bot] commented 1 year ago

The issue had no activity for 30 days, mark with Stale label.

michaeljmarshall commented 1 year ago

@BajrangDubeyPersistent - I just took another look, and I see that the Java consumer does not have TLS authentication configured, but the Python consumer does. Could you try updating the consumer's client configuration and then see if that helps?

github-actions[bot] commented 1 year ago

The issue had no activity for 30 days, mark with Stale label.