Introduce SSL getter API for handshake state

Mbed-TLS / mbedtls

An open source, portable, easy to use, readable and flexible TLS library, and reference implementation of the PSA Cryptography API. Releases are on a varying cadence, typically around 3 - 6 months between releases.

https://www.trustedfirmware.org/projects/mbed-tls/

Other

5.56k stars 2.61k forks source link

Introduce SSL getter API for handshake state #4383

Closed hanno-becker closed 8 months ago

hanno-becker commented 3 years ago

Context: This was raised by Jeremy Audiger on the Mbed TLS mailing list.

Issue: When we make mbedtls_ssl_context internal, there is no supported way of extracting the handshake state. Given that we do expose the fact that the handshake happens in steps via mbedtls_ssl_handshake_step(), there should arguably be a public getter function that allows to retrieve the handshake state.

mpg commented 3 years ago

Note: this is a follow-up to #4371 and #4373.

If we execute #4371 without this follow-up it would make some people's working code "non-compliant" with no transition path. If we additionally execute #4373 without this follow-up, it would actually break working to without a transition path.

(In addition to the recent posting on the list, I've seen at least twice in the past code that accessed ssl->state, so I guess multiple users are going to be in that case.)

So while in principle it's something we could do any time in 3.x (add a getter function), in practice I think this is a necessary follow-up to #4373.

mpg commented 3 years ago

As I previously wrote elsewhere:

I don't think we necessarily want to make that field public (as it could hinder work on 1.3 which has a different state machine), but we probably want to make some info public, for example "is the handshake over yet?", otherwise exposing mbedtls_ssl_handshake_step() in the API makes little sense.

I'm ~not~ now thinking it's probably better to wait for feedback and see what people actually need, so that we can design an API that give them enough information, while not exposing too much of the internal state, which would again block us for future development.

gilles-peskine-arm commented 3 years ago

I'm not thinking

ITYM “I'm now thinking”?

I think that 3.0 or at least 3.1 should have getters for things we know people need. Sample programs would give us a clue. I'm not sure if the handshake state is one of those though.

gstrauss commented 2 years ago

As I previously wrote elsewhere:

I don't think we necessarily want to make that field public (as it could hinder work on 1.3 which has a different state machine), but we probably want to make some info public, for example "is the handshake over yet?", otherwise exposing mbedtls_ssl_handshake_step() in the API makes little sense.

I'm ~not~ now thinking it's probably better to wait for feedback and see what people actually need, so that we can design an API that give them enough information, while not exposing too much of the internal state, which would again block us for future development.

One data point for server-side use of mbedTLS is lighttpd. I am a lighttpd developer and author of lighttpd mod_mbedtls TLS module and I posted more detail in #5331, but wanted to mention here that lighttpd mod_mbedtls needs more than "is the handshake over yet?". lighttpd currently steps the handshake for a) ALPN to be able to handle server certificate selection and support RFC8737 Automated Certificate Management Environment (ACME) TLS Application-Layer Protocol Negotiation (ALPN) Challenge Extension, and for b) client certificate validation. reference: https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/master/src/mod_mbedtls.c#L2051

mpg commented 2 years ago

@ronald-cron-arm I think at this point you're the most familiar with the TLS 1.3 handshake states, could you have a look and comment?

I'm a bit providing a simple mbedtls_get_state() getter may introduced issues, because the application will then need to check the version and know about both 1.3 and 1.2 flows to interpret the result. (Though we already have a similar issue with session resumption which means you have basically two different flows in TLS 1.2, and apparently this hasn't been a problem.)

The alternative would be to add a series of functions like mbedtls_ssl_handshake_is_over(), mbedtls_ssl_handshake_certificate_request_was_sent()... the obvious drawback is that we have to know that functions are needed exactly, so every time someone needs something new, they need to ask us, and there is some delay. OTOH, perhaps the part about getting in touch is good, though, because it seems to me that at least the ALPN issue would be better solved by us providing an adequate certificate selection callback - and perhaps the client cert too, with a better CA callback?

ronald-cron-arm commented 2 years ago

@mpg ok I will have a look.

gstrauss commented 2 years ago

Another item which could use improvement in stepping handshake and which is very visible on server-side use of mbedTLS: some callbacks are only for mbedtls_ssl_config, and do not have a corresponding callback on mbedtls_ssl_context. This is not thread-safe when mbedtls_ssl_config is shared. Prime example: mbedtls_ssl_conf_sni() is for mbedtls_ssl_config, and there is no SNI callback override for specific mbedtls_ssl_context (!)

gilles-peskine-arm commented 2 years ago

I'm rather in favor of adding new functions with specific goals. The state is an internal detail and not every condition can be observed solely from the state field. For example, as we're seeing now, a new version of the protocol can introduce new states. Another example is a potential mbedtls_ssl_is_context_still_valid which requires looking at other fields (and we may or may not want change the state field on an error, as a compromise between implementation simplicity, ease of debugging and protection against accidental misuse: this shouldn't be considered an API change).

mpg commented 2 years ago

(Note: also requested by two people in #5184, correctly pointing out that currently there's no way to use mbedtls_ssl_handshake_step() as documented without using MBEDTLS_PRIVATE. Labeling "regression".)

paul-elliott-arm commented 2 years ago

Another request for state, for handshake reasons.

gstrauss commented 2 years ago

This issue is over 9 months old and is marked regression and is part of EPIC High priority 3.0 followup. It would be useful to get some guidance on how mbedtls library users should plan to step through the handshake. Thank you.

A month ago (above), @mpg suggested a server certificate callback (#5430) and maybe a client certificate callback. I am adding a note here that some additional access might be needed. For example, lighttpd configured with client certificate verification might replace the certificate chains after the ServerHello has been sent, but before client certificate verification since I did not find a better way to send DN hints in the ServerHello to the client for client certificate selection, and then after MBEDTLS_SSL_SERVER_HELLO_DONE to configure a larger trust chain for (server-side) client certificate verification.

I would imagine that many people using mbedtls 3.0.0 in the past 9+ months have had to make code changes to expose MBEDTLS_PRIVATE members. More than a few different people have reported issues with ssl->state, and likely worked around those issues by accessing ssl->state via ssl->MBEDTLS_PRIVATE(state).

lighttpd 1.4.64 mod_mbedtls attempts to isolate lighttpd mod_mbedtls access to ssl->state, and I plan to replace the solution for mbedtls >= 3.2.0 with the outcome of this issue. lighttpd 1.4.64 mod_mbedtls contains the following:

#if MBEDTLS_VERSION_NUMBER >= 0x03020000 /* mbedtls 3.02.0 */
#define handshake_state(ssl) (ssl)->MBEDTLS_PRIVATE(state)
#elif MBEDTLS_VERSION_NUMBER >= 0x03000000 /* mbedtls 3.00.0 */
#define handshake_state(ssl) (ssl)->MBEDTLS_PRIVATE(state)
#else /* MBEDTLS_VERSION_NUMBER < 0x03000000 */ /* mbedtls 3.00.0 */
#define handshake_state(ssl) (ssl)->state
#endif /* MBEDTLS_VERSION_NUMBER < 0x03000000 */ /* mbedtls 3.00.0 */

@ronald-cron-arm (since this is currently assigned to you), what do you think of a surgical solution to this issue? If creating mbedtls_ssl_get_hs_state() as an inline function in ssl.h sounds reasonable, I can create a PR.

static inline int mbedtls_ssl_get_hs_state( mbedtls_ssl_context *ssl )
{
    return ssl->MBEDTLS_PRIVATE(state);
}

The enum mbedtls_ssl_states of handshake states is public in ssl.h. Currently, the int MBEDTLS_PRIVATE(state); member of mbedtls_ssl_context is an int, but the inline function could alternatively return a mbedtls_ssl_states.

I think a simple accessor is a good solution. Those who are stepping through the handshake now likely have reasons to do so and are likely able to -- and maybe even desire to -- step through TLSv1.2 and TLSv1.3 flows.

On the other hand, if the mbedtls team has future plans and design constraints which require that mbedtls handshake state be encapsulated and hidden, and mbedtls aims to provide alternative functions to interface with all needed handshake points, then enum mbedtls_ssl_states of handshake states ought to be private, which is not currently the case. Caution: hiding enum mbedtls_ssl_states would most certainly break much existing use of mbedtls_ssl_handshake_step().

mpg commented 2 years ago

@gstrauss Yes, there was a planning mix-up, we failed to schedule work on these post-3.0 issues for the last two quarters, and we're really sorry for that. That's why we have finally created this EPIC: we realize most of these issues are overdue and we really want to address them soon. I also agree that this is the most important issue in the EPIC.

I think the reason why it hasn't progressed as much as other, perhaps less important issues, is that it's still not entirely clear (to me at least) what the best solution to the problem is. Actually I was hoping that after the other work (adding the end-of-clienthello callback, per-context pointer, etc.) perhaps it would be possible so expose just mbedtls_ssl_is_then_handshake_over_yet() rather than the full state.

Your message shows that's not the case and even after merging all the other PRs, you'd still like more fine-grained info about the state so you take some actions at specific times.

I'm really conflicted about this. On one hand, I feel like all the problems that I've seen so far resolved by accessing ssl->state could have better solutions with appropriate callbacks or additional configuration options (see below). So, I'd like us to provide those high-level solutions so that users don't have to look at such low-level information as the handshake state. OTOH, clearly we don't know about all the problems people are currently solving by accessing ssl->state, so we can't provide high-level solutions to all of them. So, perhaps as a middle ground we should indeed expose the full state with a simple getter as you suggest, but then add a warning to the documentation of that function saying that if you have a problem that can currently only be solved by using this function, please let us know about it, so that we can try to find a better solution to that problem?

For example, lighttpd configured with client certificate verification might replace the certificate chains after the ServerHello has been sent, but before client certificate verification since I did not find a better way to send DN hints in the ServerHello to the client for client certificate selection, and then after MBEDTLS_SSL_SERVER_HELLO_DONE to configure a larger trust chain for (server-side) client certificate verification.

I assume you mean after the CertificateRequest message has been sent, as that's the one that contains the DN hints? And that's my main problem with this kind of solution: as a user, you shouldn't have to know what's part of which message in each version of the protocol (cause it's vastly different between 1.2 and 1.3), we should provide solutions that don't require you to have such low-level knowledge of the protocol.

In that instance, IIUC the high-level problem is that we have only one source of information, namely the list of trusted roots configured with mbedtls_ssl_conf_ca_chain() / mbedtls_ssl_set_hs_ca_chain() for two different things: (1) the list of trusted roots used to verify the client's certificate, and (2) the list of DNs to send to the client as hints. It often makes sense for those two lists to be equal, but you have uses cases where you need them to be different. I think a better solution than observing ssl->state and trying to change things just at the right moment would be for us to add mbedtls_ssl_conf_dn_hint() / mbedtls_ssl_set_hs_dn_hint() so that you can configure (2) independently of (1) - and when (2) is not configured, it falls back to (1), so that the simple case remains simple. Wdyt?

gstrauss commented 2 years ago

@gstrauss Yes, there was a planning mix-up, we failed to schedule work on these post-3.0 issues for the last two quarters, and we're really sorry for that. That's why we have finally created this EPIC: we realize most of these issues are overdue and we really want to address them soon. I also agree that this is the most important issue in the EPIC.

It is clear that some of these post-3.0 issues have recently been getting attention and time commitment. Thank you very much for that.

For myself, "porting" lighttpd mod_mbedtls to mbedtls 3.0.0 has been an expensive time expenditure (unpaid open source), and I probably am not the only one who has found the migration expensive. So that I could try to avoid an increase in user support questions, I ended up releasing lighttpd 1.4.64 mod_mbedtls using

#ifndef MBEDTLS_ALLOW_PRIVATE_ACCESS
#define MBEDTLS_ALLOW_PRIVATE_ACCESS
#endif

I am all in favor of good code encapsulation and progressive changes, but also respectfully request migration paths. I hope me writing the sloppy shortcut above will help you to convince the team that 3.0.0 and 3.1.0 need further work to address various usage that has been broken due to the large-scale privitization of struct members. The sooner the better because I am sure others have used the trivial shortcut above, too, and will break in the future when mbedtls makes changes to private members.

Hence my request for guidance for this ssl->state interface oversight and regression. Use of ssl->state is on the critical path for lighttpd mod_mbedtls -- and lighttpd mod_mbedtls could immediately and fully fail if ssl->state were suddenly changed incompatibly. As noted and recognized above, there is no current migration path. My request for guidance is more along the lines of "mbedtls is planning to change these internals soon" and will break current usage of the private ssl->state member, "mbedtls is planning to change the internals in XX months", or "mbedtls is not planning to change these internals in 3.x" so that myself and others can plan when we need to allocate time for further integrations and application releases. Since @paul-elliott-arm had not commented on this in a month, I had hoped that touching it could get this noticed and scheduled. (Maybe not immediately, but somewhere on the schedule.) Some of the comments further up seem to suggest that additions/changes to ssl->state might be needed for TLSv1.2 and TLSv1.3 flows.

I fully agree with you that higher-level interfaces are desirable and should be on the migration paths medium-term to long-term. mbedtls_ssl_set_hs_dn_hint() or similar might provide a solution to one of my issues (and yes, thank you for correcting me that I meant the CertificateRequest message).

At the same time, tearing out existing application plumbing to migrate to new interfaces -- some of which have not yet been designed or written (!!) -- can be a costly endeavor for developers using the mbedtls library, especially for applications developers who intend to continue to support their applications with older versions of the mbedtls library. I do not think rewriting central application logic is a short-term task.

OTOH, clearly we don't know about all the problems people are currently solving by accessing ssl->state, so we can't provide high-level solutions to all of them.

This is the crux of the issue. Access to the low-level ssl->state has been beneficial and allowed solutions to be written to problems that the mbedtls team might not yet have considered. Please consider keeping this low-level interface, and perhaps document that the low-level interface is raw and somewhat fungible, meaning that higher level interfaces should be strongly preferred, and higher-level interfaces should be requested if missing. Thank you for your engagement in this conversation. :)

mpg commented 2 years ago

OTOH, clearly we don't know about all the problems people are currently solving by accessing ssl->state, so we can't provide high-level solutions to all of them.

This is the crux of the issue. Access to the low-level ssl->state has been beneficial and allowed solutions to be written to problems that the mbedtls team might not yet have considered. Please consider keeping this low-level interface, and perhaps document that the low-level interface is raw and somewhat fungible, meaning that higher level interfaces should be strongly preferred, and higher-level interfaces should be requested if missing.

Yes, the recent discussion have convinced me that the best way to move forward is probably to expose the full state, but with a clear warning that this should be a last resort.

My request for guidance is more along the lines of "mbedtls is planning to change these internals soon"

I think @ronald-cron-arm is probably the best person to answer this question, as I think the only reason we might want to change that soon would be work on TLS 1.3. Perhaps we're not going to change anything, except that once we add support for TLS 1.3 server-side, in your code on top of checking the state you'll have to check the negotiated version to see how that state should be interpreted. Which I think would fit the status of the state getter as "last resort low-level solution" as discussed about. Or perhaps deeper changes are coming.

gilles-peskine-arm commented 2 years ago

The recent discussion has me convinced that we should not expose the handshake state, only functions or hooks with a clear semantics. The handshake state is different for TLS 1.2 and 1.3, and I don't know if DTLS 1.3, or other protocols we might add in the future, will have new states. New protocol extensions might also change the expectations in a given state.

state getter as "last resort low-level solution"

This already exists: ssl->MBEDTLS_PRIVATE(state). You can use this, and it's immediately visible that you're using a feature that might break when you upgrade Mbed TLS.

gstrauss commented 2 years ago

@gilles-peskine-arm maybe I wasn't clear. Littering code with ssl->MBEDTLS_PRIVATE(state) is extremely messy. However, since interfaces and migration paths are missing in mbedtls 3.0 -- intentionally so from the mbedtls team (to discover use) -- code changes need to be made by application developers. Changes will need to be made multiple times since some interfaces and migration paths do not yet exist.

Some OS distros have decided that mbedtls 3.0.0 is not a mature release as it is unusable unless many applications are modified. Alternatively, instead of littering code with ssl->MBEDTLS_PRIVATE(state), a quick-and-dirty "kick-the-can" solution is to #define MBEDTLS_ALLOW_PRIVATE_ACCESS at the top of the code. Then, this remnant might be left in the code, and the mbedtls team probably does not hear about the usage. Lose-lose for everyone.

Please try to understand that the mbedtls team has intentionally broken lots of code with the 3.0.0 release and, more importantly, since mbedtls 3.0.0 was tagged 6 Jul 2021 (over 7 month ago), there are still critically important missing interfaces. Taken together (important point) this is not a good look for mbedtls.

I know that you worked on the MBEDTLS_PRIVATE "migration path", so this feedback is aimed at you: MBEDTLS_PRIVATE is a dirty workaround and not a proper migration path when there are missing interfaces, and many people busier than I will simply #define MBEDTLS_ALLOW_PRIVATE_ACCESS and wait for mbedtls to break in the future. OS distros are generally slow to upgrade mbedtls libraries even for security fixes, and the mbedtls 3.0.0 release has sent a loud and clear message to package maintainers that mbedtls 3.x is not yet ready for production deployment in any more than a single-application environment. I sincerely hope that was not your intention.

Please keep in my that my reason for posting these screeds is because this specific ssl->state issue has had long periods of silence from the mbedtls team despite being an issue affecting many people. Especially if the mbedtls team forsees upcoming breaking changes needed for TLSv1.3 support in mbedtls 3.x, where is fixing this issue on the roadmap? Q1 2022? Q2 2022? Q3 2022? ... Long periods of silence suggest this issue is not being addressed.

paul-elliott-arm commented 2 years ago

@gstrauss I can assure you this issue is at the top of my list to be addressed as soon as possible, and I can but apologise for the inconvenience caused, and the time taken.

I also thank you for your discussion as it has changed my view on how this should be solved, as it has others. I will chase up further discussion on this next week, if that is ok.

gstrauss commented 2 years ago

@paul-elliott-arm Thank you. Starting on this next week is great. I am requesting planned activity, not immediate solutions. Please help set expectations. If mbedtls will soon need to make changes with ssl->state for TLSv1.3, then a more detailed roadmap for how and when existing application use needs to change would be appreciated. Not all consumers of mbedtls are operating in closed environments and control their own deployment schedules. That is why I mentioned OS distros, where different packagers control how and when mbedtls libraries are released, and different packagers control when individual applications are released, and end-users upgrade at different times. While multiple versions of shared libraries can co-exist, sometimes more advanced package management use is needed to have multiple versions of the same package installed, e.g. mbedtls libraries. This can cause problems and confusion for end-users.

state getter as "last resort low-level solution"

This already exists: ssl->MBEDTLS_PRIVATE(state). You can use this, and it's immediately visible that you're using a feature that might break when you upgrade Mbed TLS.

@gilles-peskine-arm I think I misread this and responded too strongly. ssl->MBEDTLS_PRIVATE(state) is not a low-level solution unless mbedtls plans to support this "deprecated" interface for the entirety of the 3.x lifetime. Is that the case? My limited understanding is that anything marked with MBEDTLS_PRIVATE might break at any time in 3.x. Using MBEDTLS_PRIVATE would be a quick short-term fix for code migrating to mbedtls 3.x when also paired with medium-term migration paths to new interfaces. Without being paired with migration paths, MBEDTLS_PRIVATE by itself is a threat that users of those private members are "doing something wrong" (not really) and are in danger of being broken by the very next 3.x.x point release.

gilles-peskine-arm commented 2 years ago

Littering code with ssl->MBEDTLS_PRIVATE(state) is extremely messy.

It's meant to be. It's only supposed to be a temporary workaround.

However, since interfaces and migration paths are missing in mbedtls 3.0 -- intentionally so from the mbedtls team (to discover use)

Migration paths were not left intentionally to discover use. They were left because we ran out of time before the delivery date for the 3.0 release.

a quick-and-dirty "kick-the-can" solution is to #define MBEDTLS_ALLOW_PRIVATE_ACCESS at the top of the code

I don't recommend that, precisely because as you note later, the Mbed TLS maintainers won't find out about the missing interface, and you'll be left in the lurch when the interface breaks.

since mbedtls 3.0.0 was tagged 6 Jul 2021 (over 7 month ago), there are still critically important missing interfaces. Taken together (important point) this is not a good look for mbedtls.

I am very much aware of that and I've alerted my management to this thread.

I know that you worked on the MBEDTLS_PRIVATE "migration path", so this feedback is aimed at you: MBEDTLS_PRIVATE is a dirty workaround and not a proper migration path when there are missing interfaces

Once again, it was never intended as a migration path. It was intended as a last resort temporary workaround.

ssl->MBEDTLS_PRIVATE(state) is not a low-level solution unless mbedtls plans to support this "deprecated" interface for the entirety of the 3.x lifetime. Is that the case? My limited understanding is that anything marked with MBEDTLS_PRIVATE might break at any time in 3.x. Using MBEDTLS_PRIVATE would be a quick short-term fix for code migrating to mbedtls 3.x when also paired with medium-term migration paths to new interfaces.

Yes, this is correct. I was responding to your suggestion of a getter function which we could not guarantee to keep working for the lifetime of 3.x. That would have been no better than using MBEDTLS_PRIVATE.

mpg commented 2 years ago

Especially if the mbedtls team foresees upcoming breaking changes needed for TLSv1.3 support in mbedtls 3.x, where is fixing this issue on the roadmap? Q1 2022? Q2 2022? Q3 2022? ...

As far as I'm concerned, the goal is for this issue to be fixed (in some form, not necessarily providing access to the full state) in the next release. I understand your frustration over the past 6 months of inaction on this, but that's done and the only thing we can do know is fix the issue in the next release and analyze what caused this situation so that it doesn't happen again in the future.

Now, I don't think the unfortunate situation should make us rush the decision about what the right way for fixing this is. I think Gilles has a strong point that it's not clear that a getter would be really better than ssl->MBEDTLS_PRIVATE(state) if the meaning of the result is not stable over time. Practically, if an application (for a server, say) is written now using a getter, and bases its logic on TLS 1.2 states, then when we implement server-side TLS 1.3, then the application's code has to be changed when upgrading to the new Mbed TLS. We do promise that applications using only the public API (without relying on undocumented assumption) will not need to change their code when upgrading to a new version of Mbed TLS (until the next major version), so the state getter would not fit that promise.

Now, of course we could provide the getter with an explicit name like get_state_unstable() and add a big fat warning to its documentation like "use this if you have to, but this voids you backwards compatibility guarantee" - but then again, as Gilles said, this is exactly what MBEDTLS_PRIVATE() means.

Thinking about it a bit more, I'm not sure why I asked @ronald-cron-arm to comment on the impact of TLS 1.3, because I know enough about the differences between the 1.2 and 1.3 to conclude that in many cases where an application uses the state to determine when is the right time to perform and action, the right time will not be the same between 1.2 and 1.3, so there's no way for a full state getter to be be any more stable than MBEDTLS_PRIVATE() is, if we want to leave the door open to new protocol versions (and we do). More importantly, we know nothing about future evolutions of the protocol so we can't guarantee that anything an application does by inspecting the state will keep working in the future.

Said otherwise, I think applications using ssl->state were always going to get broken when 1.3 is introduced; MBEDTLS_PRIVATE() didn't create that problem, it only made it visible, which is good, it's better to know now than having bugs (possibly silent and/or with security impact) when we finally ship support for 1.3 server-side. So, the problem we really want to solve is not "get rid of MBEDTLS_PRIVATE()" (that would be shooting the messenger), but "make sure the high-level problems that applications are trying to solve can be solved without relying on protocol details that may and will change in the future".

In conclusion, I'm now thinking a full state getter would only provide the illusion of solving the issue, and I got back to square one: I think we should provide only mbedtls_ssl_is_the_handshake_complete_yet() (for general use with mbedtls_handshake_step()) + targeted fixes for each of the problems that were previously solved by looking at the state. Unless I'm mistaken, we have solutions in flight for all of these problems except for the dn_hint thing which does not look particularly hard to solve.

gstrauss commented 2 years ago

I think that there is at least one simple solution: mbedtls could provide a stable interface for stepping the handshake for TLSv1.2, e.g. the one-liner that I suggested above, potentially named more specifically with _tlsv12 suffix. If mbedtls announces that this interface will not be usable with TLSv1.3, that is acceptable to me. Please provide a way for me to configure my application to detect TLSv1.3 (#5407, #5426) and then to use different interfaces with TLSv1.3 (not yet provided). Bonus points if the new interfaces can also be used for TLSv1.2. Such an approach would minimize changes needed to existing application code (supporting TLSv1.2) and would provide a migration path to add support for TLSv1.3. This proposed solution would also make it easier for applications to continue to support older versions of mbedtls.

Unless I'm mistaken, we have solutions in flight for all of these problems except for the dn_hint thing which does not look particularly hard to solve.

While my use of mbedtls in lighttpd mod_mbedtls has demonstrated server-side use of mbedtls in one application (lighttpd), I am uncomfortable with the suggestion that this covers all server-side use cases. There are probably other server-side uses which may require additional interfaces. For example, lighttpd currently does not support DTLS.

Regarding the suppositions about user-visible changes which might be needed for TLSv1.3 support in mbedtls, I can provide the perspective that lighttpd supports TLSv1.3 using lighttpd mod_openssl, mod_gnutls, mod_nss, and mod_wolfssl. Using the interfaces available in those libraries, I did not have to special-case differences between TLSv1.2 and TLSv1.3. Yes, I am aware that there are differences in the handshakes, but thankfully those are handled by the TLS libraries. Hopefully, mbedtls can provide higher-level interfaces at checkpoints during the handshake so that precise stepping of the TLS handshake can be avoided by applications.

ronald-cron-arm commented 2 years ago

Just a word to say that I have tried to follow the discussions on this issue. To me, there is a lot of things going on here thus not easy for me to step in. I've assigned this issue to myself at some point as @mpg asked my opinion about it without realizing that this was such an hot topic, sorry about that. I am removing me as the assignee as @paul-elliott-arm is taking ownership it seems.

ndilieto commented 2 years ago

ualpn is a proxying ACMEv2 RFC8737 tls-alpn-01 challenge responder. It makes extensive use of the SSL.state variable to determine on the fly whether to handle the tls-alpn-01 challenge, or transparently proxy the connection:

https://github.com/ndilieto/uacme/blob/64d1b3c260c782c037246ef63e16492f20c6fd9d/ualpn.c#L2280

gstrauss commented 2 years ago

@ndilieto thank you for the additional data point.

Please see PR #5454 which introduces a server certificate selection callback and references the new user-data accessor. In the future (mbedtls 3.2.0 or 3.?.x), it may be possible to save state in the ALPN callback, and then to select the tls-alpn-01 challenge certificate from within the (proposed) server certificate selection callback. This combination may avoid the need to step through the handshake.

FYI, lighttpd natively supports the tls-alpn-01 challenge without needing an intermediate TLS proxy.

ndilieto commented 2 years ago

@gstrauss

FYI, lighttpd natively supports the tls-alpn-01 challenge without needing an intermediate TLS proxy.

Does it generate the certificate based on the tls-alpn-01 key authorization or does it need the certificate to be provided from outside? How does one provide either the key auth or the certificate for the challenge? Would you be willing to provide an example uacme-lighttpd tls-ualpn-01 challenge script for inclusion on https://github.com/ndilieto/uacme#tls-alpn-01-challenge-support ?

gstrauss commented 2 years ago

@ndilieto I think your request is off-topic for this github issue. Briefly, lighttpd can be configured with a path to obtain the challenge cert for a TLS connection using SNI and ALPN tls-alpn-01. An admin can use dehydrated or other tool to handle the ACME protocol and generate the challenge cert. Yes, I agree it would be nice if certbot supported tls-alpn-01.

ndilieto commented 2 years ago

I think your request is off-topic for this github issue.

said the one who brought up lighttpd in the first place...

mpg commented 2 years ago

@ndilieto

ualpn is a proxying ACMEv2 RFC8737 tls-alpn-01 challenge responder. It makes extensive use of the SSL.state variable to determine on the fly whether to handle the tls-alpn-01 challenge, or transparently proxy the connection:

Thanks for letting us know! After a quick look at your code, it looks to me like #5454 should fulfill your needs: basically, the body of the if(c->ssl.MBEDTLS_PRIVATE(state) > MBEDTLS_SSL_CLIENT_HELLO) would become (perhaps with minor adaptations) the body of the function used as the new callback. Then I think you wouldn't even need to step through the handshake and could just use mbedtls_ssl_handshake(). But if you need to keep using mbedtls_ssl_handshake_step() for another reasons, we're going introduce a new public API to replace MBEDTLS_PRIVATE(state) != MBEDTLS_SSL_HANDSHAKE_OVER anyway.

I think that would cover all your uses of ssl->state. Do you agree with that or do you think I missed anything?

I did a quick grep for other uses of MBEDTLS_PRIVATE() in your code to check if we already had issues/PRs covering them, and then saw you had already opened https://github.com/ARMmbed/mbedtls/issues/5585 - thanks!

Note: while looking at the code you linked to: in principle, to provision a certificate/key during the handshake, you should be using mbedtls_ssl_set_hs_own_cert(), not mbedtls_ssl_conf_own_cert(). As a general rule, we do not support modifying ssl_config objects after they've been associated to an ssl_context, especially when a handshake is in progress. On top of the obvious concurrency issues (which are easy to rule out if you know your ssl_config is used by only one thread), our code assumes that the config remains the same between steps, and changing the config may cause the code to misbehave in unexpected ways.

If you ever feel the need to modify the config during a handshake, please check if there's an equivalent set_hs_xxx() function, and if not, please file an issue or open a discussion on the mailing list explaining what you're trying to achieve, so that we can discuss the best way to extend the API to support your use case.

ndilieto commented 2 years ago

@mpg

Note: while looking at the code you linked to: in principle, to provision a certificate/key during the handshake, you should be using mbedtls_ssl_set_hs_own_cert(), not mbedtls_ssl_conf_own_cert(). As a general rule, we do not support modifying ssl_config objects after they've been associated to an ssl_context, especially when a handshake is in progress. On top of the obvious concurrency issues (which are easy to rule out if you know your ssl_config is used by only one thread), our code assumes that the config remains the same between steps, and changing the config may cause the code to misbehave in unexpected ways.

For ualpn to work properly, it must NOT write to the socket until it has determined (from the TLS ClientHello) that the handshake is a tls-alpn-01 challenge AND it has a SNI for which an authorization has been made available to ualpn. This is because in all other cases it must transparently proxy the connection. This also means that it cannot set the cert until ssl.state > MBEDTLS_SSL_CLIENT_HELLO, and that is why it must step through the handshake.

Is there any requirement on when mbedtls_ssl_set_hs_own_cert must be called? In other words, is it going to work if called after the mbedTLS has parsed the ClientHello?

mpg commented 2 years ago

@ndilieto Good question! Currently the documentation only guarantees it works when called from the SNI callback. With #5454 the documentation should be extended to mention that it can also be called in the new certificate selection callback.

Now that you mention it, strictly speaking neither set_hs_own_cert() nor conf_own_cert() is safe to call when ssl.state > MBEDTLS_SSL_CLIENT_HELLO in full generality. That's because the ciphersuite is selected before we move to the next state, and if we selected a cipersuite that requires, say an ECDSA cert the certificate provided later turns out to be RSA, then clearly things are not going to work well. The callback introduced in #5454 can avoid that by being called before we select a ciphersuite.

That's one of the many reasons why providing callbacks is way more robust than letting people peek at the state and try to take actions at the right moment: the right moment might not be between two states, and this depends on implementation details such as do we select the ciphersuite at the end of ClientHello parsing or at the beginning of ServerHello writing. If we provide a callback, we can call it at precisely the right time, users don't have to worry about implementation details, and we are free to restructure the code. Callbacks are just a way better API design than exposing ssl.state.

ndilieto commented 2 years ago

@mpg

I have tried mbedtls_ssl_set_hs_own_cert and it works. Regarding your comment about the ciphersuite, I know the issue well and you will see that in the mbedTLS port ualpn uses a dummy certificate with exactly the same cipher, key type and key length before it switches to the actual certificate based on the SNI and alpn extension:

https://github.com/ndilieto/uacme/blob/64d1b3c260c782c037246ef63e16492f20c6fd9d/ualpn.c#L4590

This was the only way I could find to make it work with mbedTLS. OpenSSL and GnuTLS both allow ClientHello callbacks (SSL_CTX_set_client_hello_cb and gnutls_handshake_set_post_client_hello_function) so I agree that the callback in #5454 is the way to go. My requirements are

1) The callback shall be called after parsing all extensions (ualpn needs to know both the SNI and alpn extensions in order to determine whether to proxy the connection or continue with the tls-alpn-01 challenge handshake), but before any reply is sent to the client.

2) The callback shall allow to abort the handshake before any reply (including errors) is sent to the client. Just to make sure, in the current implementation, I also intercept any writes using mbedtls_ssl_set_bio and buffer the data until a decision to handle it or not has been made: https://github.com/ndilieto/uacme/blob/64d1b3c260c782c037246ef63e16492f20c6fd9d/ualpn.c#L2219

Hope this helps...

mpg commented 2 years ago

I agree that the callback in #5454 is the way to go. My requirements are [...]

I think both of these requirements are met by #5454, which is on its way to being merged soon, so all seems good. Please give the new callback a try and let us know if you encounter any issues.

neoxic commented 2 years ago

In my projects, ssl->state is used to determine a particular DTLS state when a verified ClientHello has been received while a ServerHello is yet to be sent. This is needed to work around a race between bind and connect that introduces a brief window when unfiltered messages can be received on a "connected" UDP socket (see this https://gist.github.com/neoxic/0d9314ec756d37ca4303bced49b94543).

In other words, it is imperative to delay sending a ServerHello before reliably "forking" a UDP socket into a so-called "connected" state and using it with a DTLS context in the same way a TCP socket is used. The workaround to achieve that using Mbed TLS is this:

do {
    if (mbedtls_ssl_handshake_step(ssl)) goto error;
} while (ssl->state != MBEDTLS_SSL_SERVER_HELLO_DONE);

At this point, ssl is in a successful state since an unverified ClientHello never reaches MBEDTLS_SSL_SERVER_HELLO_DONE, but a ServerHello is still pending output, i.e. has not been sent. It's safe now to create a new "connected" socket and assign it as a new BIO to ssl.

Similarly, this is how it is done in lua-mbedtls (a Lua binding for Mbed TLS): https://github.com/neoxic/lua-mbedtls/blob/master/src/ssl.c#L247

The problem that is being solved relates to the following OpenSSL issue (which is still open btw). Please see my comments: https://github.com/openssl/openssl/issues/6934#issuecomment-569387432 https://github.com/openssl/openssl/issues/6934#issuecomment-569475648

gstrauss commented 2 years ago

@neoxic please have a look at #5454 "server certificate selection callback". This new callback occurs unconditionally after the ClientHello has been received and parsed, including all TLS extensions in the ClientHello. If this point is reached, then construction of the ServerHello is likely to proceed immediately afterwards. Is this point late enough for you to create a new "connected" socket and assign it as a new BIO to ssl, or is it still too early?

mpg commented 2 years ago

@neoxic Thanks for letting us know!

I assume when you talk about a verified ClientHello, it means on that contains a valid cookie. So, the new callback introduced in #5454 would only work for you if either (1) it's only called when a valid cookie was found, or (2) there's a way for you to check from the callback whether a valid cookie was found. I had to check the code, but unfortunately right now (1) is not true and neither is (2), though any of them could be done.

Sorry if I'm missing something obvious, but thinking about it, it seems to me that you could achieve what you want just by using the existing cookie callback - you don't even need to write your own callback from scratch, you could just wrap the provided one, for example:

int my_ssl_cookie_check( void *p_ctx,
                      const unsigned char *cookie, size_t cookie_len,
                      const unsigned char *cli_id, size_t cli_id_len )
{
    int ret = mbedtls_ssl_cookie_check( p_ctx, cookie, cookie_len, cli_id, cli_id_len );
    if( ret == 0 ) {
        /* Create a new connected socket and assign it as the new BIO */
    }
    return( ret );
}

Wouldn't that work?

Also, just for my own education about the big picture (btw, thanks for the example programs and pointers to your comments on the openssl discussion, which were quite informative): how does waiting until the ClientHello has been authenticated solve the race condition? Sorry if this is obvious, but right now it's not clear to me.

neoxic commented 2 years ago

Unfortunately, I don't see how the existing cookie callback can solve the problem since mbedtls_ssl_conf_dtls_cookies() assigns it to a configuration rather than a context, hence it's impossible to get a context pointer (or any other session state pointer for that matter) as its p_ctx argument to operate upon (change BIO for example).

How does waiting until the ClientHello has been authenticated solve the race condition? Sorry if this is obvious, but right now it's not clear to me.

Waiting for a verified ClientHello has nothing to do with the race itself. The need to circumvent the race condition makes it necessary to do reliable socket "forking" which in turn requires two-step DTLS handshake handling. In other words, the following steps are required to implement a reliable (and correct) session-based DTLS server where data multiplexing is done by the kernel:

Create a main UDP socket with bind().
On every message received on the main socket, create a new SSL context, feed it with the received message and call mbedtls_handshake_step() until a ServerHello is about to be sent. If successful, go to step 3. Otherwise, terminate the context. This way, the main socket always remains stateless.
For the new SSL context, create a "forked" UDP socket by calling bind() and connect() and continue the unfinished handshake on it. Be ready to receive unfiltered messages on that socket by checking their peer addresses due to race between bind() and connect() and if received, reroute them to step 2 as though they were received on the main socket.

Personally, I would really appreciate if there was an explicit synchronous call to achieve the above, i.e. to defer a ServerHello for a context because this problem should be exposed as much as possible and not complicated even more by fiddling with callbacks although an additional callback-based API might be a plus.

ndilieto commented 2 years ago

@neoxic

Even though I use TLS (not DTLS) in my project, I have a similar problem: I must avoid sending any data to the client until the software has determined that it wants to process the handshake rather than transparently proxy it. Therefore I intercept what mbedtls writes using a BIO and buffer it until the decision is made. Then I either send it the client or discard it. I don't know if a similar approach could also work for you but have a look at it:

https://github.com/ndilieto/uacme/blob/64d1b3c260c782c037246ef63e16492f20c6fd9d/ualpn.c#L2219

neoxic commented 2 years ago

@ndilieto Well, according to your code:

    while (c->ssl.MBEDTLS_PRIVATE(state) != MBEDTLS_SSL_HANDSHAKE_OVER) {
        rc = mbedtls_ssl_handshake_step(&c->ssl);
        ...
        if (c->ssl.MBEDTLS_PRIVATE(state) > MBEDTLS_SSL_CLIENT_HELLO)

... you are in a similar boat here.:)

ndilieto commented 2 years ago

@neoxic Yes, but in my case the new server certificate selection callback in #5454 will solve the problem.

Edit: what I meant is that if the callback alone isn't enough for your application, maybe you could intercept the socket writes with a BIO like I did.

neoxic commented 2 years ago

what I meant is that if the callback alone isn't enough for your application, maybe you could intercept the socket writes with a BIO like I did.

To intercept socket writes, I must first know in advance whether a ClientHello was valid (verified) or not. And since my way to realize that is through ssl.state and mbedtls_ssl_handshake_step(), I don't even need to defer writes.

neoxic commented 2 years ago

@mpg

Since mbedtls_ssl_handshake_step() doesn't seem to be deprecated (and I hope it will never be), not is the mbedtls_ssl_states enum, the right thing to do would be to simply expose ssl.state indirectly. I hope everyone can see that mbedtls_ssl_handshake_step() doesn't make sense without some kind of a "step" or "state" to track progress. So something like:

int mbedtls_ssl_handshake_step(int *state)

would work just fine.

Another approach would be to provide a way to perform a handshake until a certain step is about to begin:

int mbedtls_ssl_handshake_until(mbedtls_ssl_context *ssl, int state) {
    int res = 0;
    while (ssl->state != MBEDTLS_SSL_HANDSHAKE_OVER) {
        if ((res = mbedtls_ssl_handshake_step(ssl))) break;
        if (ssl->state == state) break;
    }
    return res;
}

neoxic commented 2 years ago

@mpg

I know enough about the differences between the 1.2 and 1.3 to conclude that in many cases where an application uses the state to determine when is the right time to perform and action, the right time will not be the same between 1.2 and 1.3, so there's no way for a full state getter to be be any more stable than MBEDTLS_PRIVATE() is, if we want to leave the door open to new protocol versions (and we do). More importantly, we know nothing about future evolutions of the protocol so we can't guarantee that anything an application does by inspecting the state will keep working in the future.

To be honest, I'm not sure I fully understand why you guys are so determined to babysit the user when they want to shoot their own foot playing with the states. Some users already do things wrong, for example, when they use '>' (greater) or '<' (less) with regards to ssl.state because there has never been any guarantee that the state always increases with time. It is already the case for DTLS vs TLS (MBEDTLS_SSL_SERVER_HELLO_VERIFY_REQUEST_SENT being one example). Hence, there's no point to try to hide "protocol internals" from the user starting with 3.0 (this has nothing to do with MBEDTLS_PRIVATE btw, see my proposals above). Similarly, while introducing new states for TLS 1.3, you will most likely use at least some existing states meaning there will be common states, new states and protocol-related states, and this will not break anything in terms of guarantees because IT IS ALREADY THIS WAY NOW. You are free to add states as you please.

My point is how do you think people came up with their solutions on how and when certain states relate to certain points in the protocols? Simple. They experimented because the state machine has never been officially documented. If their code breaks with the introduction of new TLSv1.3-related states, then, well, they will need to fix it. And you might optionally help them by documenting the differences should you feel inclined.😊

Meanwhile, you can continue thinking about new high-level APIs that solve particular tasks and encourage users to switch to those APIs whenever they become available.

Win-win for everybody, no?

gstrauss commented 2 years ago

@neoxic's comments suggest that he has not read the prior conversations in this PR.

gstrauss commented 2 years ago

@gstrauss This PR?... You mean, this issue? Well, I've read every comment in this issue. What exactly made you think that I have not?

@gilles-peskine-arm already explained further above that MBEDTLS_PRIVATE can be used to access ssl.state, but that anything marked MBEDTLS_PRIVATE should be considered unstable and might break in the future without prior notice. Not only are application developers (like you and me) encouraged to use higher-level APIs, we are also encouraged to reach out to mbedtls developers to let them know of use cases for which a higher-level API does not exist.

@neoxic Your proposals are neither new or unique, and suggest that you have not carefully read the prior conversations in this PR. For example, I, too, previously posted that mbedtls_ssl_states enum is public, and requested that mbedtls_ssl_handshake_step() be supported or deprecated (along with appropriate alternatives).

Please look for "X hidden items" "Load more..." on this page to review conversations that might have been hidden by the initial github PR page rendering.

gilles-peskine-arm commented 2 years ago

To be honest, I'm not sure I fully understand why you guys are so determined to babysit the user when they want to shoot their own foot playing with the states.

MBEDTLS_PRIVATE can be used to access ssl.state, but that anything marked MBEDTLS_PRIVATE should be considered unstable and might break in the future without prior notice

My point is, if you want to shoot your own foot, you have MBEDTLS_PRIVATE. What would you gain by having mbedtls_ssl_get_state() outputting a mbedtls_ssl_state_t documented as “the range and meaning of the returned values may change in later versions”?

mpg commented 2 years ago

@neoxic

Sorry if I'm missing something obvious, but thinking about it, it seems to me that you could achieve what you want just by using the existing cookie callback

Unfortunately, I don't see how the existing cookie callback can solve the problem since mbedtls_ssl_conf_dtls_cookies() assigns it to a configuration rather than a context, hence it's impossible to get a context pointer (or any other session state pointer for that matter) as its p_ctx argument to operate upon (change BIO for example).

Ok, so that's the obvious issue that I was missing, thanks for pointing it out.

Regarding exposing the state with a "no guarantee" warning - Gilles made his point again while I was writing and I think it's a really strong point.

So, the outcome of this discussion (plus some internal discussions) is that we favour approaches based on callbacks. I take your point that neither the existing cookie callback nor the new certificate selection callback introduced in #5454 meet your needs so far.

It seems to me that the new callback from #5454 would meet your needs if modified the code slightly so that the callback is only called when a valid cookie was found (or when using TLS obviously), which I think makes sense: why would you spend time selecting a certificate if you're not going to continue the handshake anyway? @gstrauss wdyt?

mpg commented 2 years ago

I hope everyone can see that mbedtls_ssl_handshake_step() doesn't make sense without some kind of a "step" or "state" to track progress.

I heard some people were using handshake_step() instead of handshake() for latency reasons, so that they can attend to other tasks (unrelated to the handshake) between steps in single-threaded environments. I agree this is a bit marginal (it only works if your acceptable latency falls right between the duration of the slowest step and that of a full handshake), but apparently some users where in that case. Also, in the meantime we've developed a better solution for single-threaded users who can't accept the handshake blocking for too long (and are not using RSA or FFDH), in the shape of MBEDTLS_ECP_RESTARTABLE, so arguably this use is even more marginal now.

That said, I fully agree that our public API right now is not consistent, as mbedtls_ssl_handshake_step() can't be used at all (even for the marginal use case mentioned above) without at least a way to know when the handshake is over. Worse, its documentation recommends checking the state while it's not longer publicly accessible. I think there are mainly several way to resolve this inconsistency:

Add a function like mbedtls_ssl_is_the_handshake_over_yet() and have the documentation of mbedtls_ssl_handshake_step() say to use that new function to decide when to stop calling handshake_step().
Add a big fat warning to the documentation of mbedtls_ssl_handshake_step() saying you can only use that in conjunction with MBEDTLS_PRIVATE() to access the state, which voids your guarantee, so if you find yourself needing to do that, please get in touch so that we can add a solution to your problem that will be guaranteed to work in the future.
A compromise: add is_the_handshake_over_yet() and document that handshake_step() should be used with that, but also add a word about ssl->MBEDTLS_PRIVATE(state) with a big fat warning as above.

gstrauss commented 2 years ago

It seems to me that the new callback from https://github.com/ARMmbed/mbedtls/pull/5454 would meet your needs if modified the code slightly so that the callback is only called when a valid cookie was found (or when using TLS obviously), which I think makes sense: why would you spend time selecting a certificate if you're not going to continue the handshake anyway? @gstrauss wdyt?

@mpg I have not traced through the code to understand the DTLS flows, so this is a question: ssl->handshake->verify_cookie_len is set in ssl_parse_client_hello() (if ssl->renego_status == MBEDTLS_SSL_INITIAL_HANDSHAKE) before ssl->conf->f_cert_cb(). However, later in the handshake in ssl_write_server_hello(), verify_cookie_len is checked and ssl_write_hello_verify_request() is sent. That looks like it should stay in ssl_write_server_hello(), but can it also be called in ssl_parse_client_hello(), short-circuiting ssl_parse_client_hello(), when ssl->handshake->verify_cookie_len is set in ssl_parse_client_hello()?

Possible overload suggestion: Now that there is mbedtls_ssl_conf_get_user_data_p() and mbedtls_ssl_get_user_data_p(), might ssl->conf->p_cookie be overloaded so that if ssl->conf->p_cookie is NULL, then mbedtls_ssl_context is passed to ssl->conf->f_cookie_check( ( ssl->conf->p_cookie ? ssl->conf->p_cookie : ssl ), ... ) ? [Edit: that might not be feasible: ssl_check_dtls_clihlo_cookie() does not have access to mbedtls_ssl_context, so passing non-NULL ssl->conf->p_cookie is the only way to get additional context there.]

gstrauss commented 2 years ago

Musing: Were there to be a new accessor available to check the cookie status, then the new accessor could be called from f_cert_cb before bind() and connect() the DTLS socket (and skip further certificate selection logic if ssl_write_hello_verify_request() is to be called later from ssl_write_server_hello()

neoxic commented 2 years ago

@gstrauss

Your proposals are neither new or unique, and suggest that you have not carefully read the prior conversations in this PR. For example, I, too, previously posted that mbedtls_ssl_states enum is public, and requested that mbedtls_ssl_handshake_step() be supported or deprecated (along with appropriate alternatives).

Please look for "X hidden items" "Load more..." on this page to review conversations that might have been hidden by the initial github PR page rendering.

Didn't know it's forbidden here to repeat what has been already said... As for me having not read the previous conversation, while I respect your opinion, brother, may I kindly ask you to refrain from far-fetched assumptions and "policing"? Thank you!