Open damienburke opened 9 months ago
Can this issue be vetted? I am happy to work on this too, if it can be assigned? Thanks
@damienburke Thank you for your proposal. We greatly appreciate your willingness to contribute to this project! I'd encourage you to join the #dev channel on Apache Pulsar's Slack for real-time assistance should you need any help along the way.
Additionally, it may be beneficial to check out PR #13951, which was not completed. The comments on that PR could provide valuable insights into the general requirements for token refresh. It's important to note that the token should be refreshed well before its expiration. This is to account for potential periods of unavailability of the token provider. The code in PR #13951 could also be reused. I was planning to take over the PR from @michaeljmarshall, but I haven't had a chance to do so. If you'd also like to take over the scope of #13951, that would work for me.
Thanks for the feedback @lhotari, and link to related PR.
Reviewing said PR, that functionality sounds very useful. As a first step, my pref is for mTLS to have parity with the existing token refresh logic - i.e. all server side driven - as per authenticationRefreshCheckSeconds
. I have a PoC mostly working and can share my approach in next couple of days. Does it sound reasonable to keep the scope of my issue as is?
Fyi, with our infrastructure (Vault & Consul), we already have the ability to rotate auth credentials before they actually expire (and make them available to the pulsar clients apps). Also, we are polyglot env, and therefore would prob leverage this approach, even if the Java client supported this.
I think there are two things here:
This feature is supported.
The connection didn't disconnect. This is a bug, we need to add a certificate monitor to check if the certificate expires.
@damienburke's PR: https://github.com/apache/pulsar/compare/master...damienburke:pulsar:master
This is a way to check if the certificate expires, which is not optimal, we also have TLS encrypted transport, not auth. I think we should check each TLS connection, once the certificate expires we need to disconnect the client.
More context: https://github.com/apache/pulsar/issues/22124#issuecomment-1991723573
@nodece - just to confirm the scope of what u are thinking. Would this certificate monitor JUST be checking the certificate provided by the client to auth? Or should it also check the expiry of the certificate used to setup the TLS transport, i.e. the tlsCertificateFilePath
cert.
Do u have any further tips / advice re fixing this? Could this be implemented similar to how refreshAuthenticationCredentials
is scheduled in ServerCnx
? If i could get some direction I could help tackle this bug.
Just also to restate some of what u said - just for my own understanding - if u can confirm if accurate @nodece?
When the pulsar broker identifies that the authenticated credentials have expired - and it issues the challenge to the client to refresh their credentials (e.g. as supported by "token" auth method - and as triggered by authenticationRefreshCheckSeconds
) - the client uses the same TCP connection, with the same TLS encryption to send their refreshed credentials to pulsar for re-auth. Whereas your point is that, if the client (or sever???) certificate has expired - the client needs to make a new TCP connection (which will of course negotiate TLS again also). This is why my PoC PR is not optimal(?)
@nodece - just to confirm the scope of what u are thinking. Would this certificate monitor JUST be checking the certificate provided by the client to auth? Or should it also check the expiry of the certificate used to setup the TLS transport, i.e. the
tlsCertificateFilePath
cert.
This is the same thing.
mTLS transport doesn't use the AuthenticationProvider, while mTLS auth uses it.
So your implementation only applies to mTLS auth, not mTLS transport.
BTW, please note that the client never sends an auth certificate. The auth certificate is from javax.net.ssl.SSLSession#getPeerCertificates
.
Therefore, your client needs to reconnect to the broker to refresh the certificate.
if the client certificate has expired - the client needs to make a new TCP connection (which will of course negotiate TLS again also). This is why my PoC PR is not optimal(?)
This is correct.
Do u have any further tips / advice re fixing this? Could this be implemented similar to how
refreshAuthenticationCredentials
is scheduled inServerCnx
? If i could get some direction I could help tackle this bug.
You can refer to the authRefreshTask
in the ServerCnx
, new a peerCertExpireCheckTask
schedule take with fixd delay(The expiration time of the certificate minus the current time), when the certificate expires, you can disconnect this connection.
After reaching a consensus on this idea, we need to send it to dev@pulsar.apache.org, please see https://pulsar.apache.org/contribute/#mailing-lists.
@nodece - just to confirm, with the stated goal of "disconnecting the client" - will/should that result in the pulsar client throwing an exception? (or in other words, having the app / client code - creating the pulsar consumer & producers again)
with the auth challenge approach, its super clean (for tokens at least), where all that happens is the token supplier method that was provided to the client when it was instantiated is (lazily) executed. No pulsar client exception is thrown in this case, and of course the pulsar consumers and producers that were created by the pulsar client can stay connected and authenticated. Understood that for TLS this is not an option - but I just wanted to compare and understand the pulsar client behaviour that would result from the changes being discussed.
You are right, when the broker disconnects the client connection, and then the client will reconnect to the broker, at this point, the client uses the new TLS certificate to establish the connection to the broker.
The token/basic and TLS auth are different behaviors, token/basic authentication data can independently send to the broker, TLS authentication is obtained from TLS session after establishing a connection, that's also why we need to disconnect the client connection.
Search before asking
Motivation
The behaviour that the
authenticationRefreshCheckSeconds
config enables should be available for all auth types whose credentials can expire. This is the case for JWTs (and i think also OAuth). So one motivation is simply providing consistency / no surprises. And of course this feature would be very useful for mTLS. Furthermore, without it, kind of makes using mTLS unattractive - and we can have obviously have scenarios where an cert that was used to auth, becomes expired - but the auth'd connection can remain. for infinity!Solution
There is an existing pattern for this, as implemented for tokens (and OAuth). Solution is to reverse engineer / grok that pattern - and apply it.
Also, this PIP and the the AuthenticationState provide good info on the interface methods that need implementing.
Alternatives
No response
Anything else?
I tagged this as an enhancement, but could be a bug depending on your perspective. At a min the associated docs are incomplete.
Are you willing to submit a PR?