aarongable / draft-acme-ari

Internet Draft for the Automated Certificate Management Environment (ACME) Renewal Information (ARI) Extension

Return HTTP 409 "Conflict" when the certificate identified by 'replaces' has already been replaced #56

Open beautifulentropy opened 4 months ago

beautifulentropy commented 4 months ago

The HTTP 409 Conflict response status code indicates a request conflict with the current state of the target resource.

Even perfect record-keeping on the part of the client can be undermined by unexpected data and/or power loss. Providing an error code which indicates that the client should retry the renewal without the 'replaces' field would be helpful.

Credit to @jsha for this idea.

aarongable commented 4 months ago

I like this idea. I think it'll take the form of massaging this paragraph:

Servers SHOULD check that the identified certificate and the current New Order request correspond to the same ACME Account and share identifiers, and that the identified certificate has not already been marked as replaced by a different finalized Order. Servers MAY ignore the replaces field in New Order requests which do not pass such checks.

I think that middle clause should be broken into a separate sentence that says that already-replaced certificates SHOULD result in an HTTP 409. We should still say that other forms of invalid replaces fields (e.g. garbage bytes, certs issued by other CAs, certs issued by this CA but to a different account) can be handled however the server chooses, per server policy.

mholt commented 4 months ago

Would getting a 409 just imply retrying without the Replaces field? I'm trying to figure out how to reconcile the client's record with the server's. How does the client know what cert it was replaced with already?

beautifulentropy commented 4 months ago

> Would getting a 409 just imply retrying without the Replaces field? I'm trying to figure out how to reconcile the client's record with the server's. How does the client know what cert it was replaced with already?

The client should drop the 'replaces' field and retry. If a 409 is received, that's the server indicating that this same ACME account already has another finalized order which indicated it was a replacement for this certificate.
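A minimal sketch of that client-side fallback, for illustration only. The `submit` callable is a hypothetical stand-in for whatever the client uses to POST to the newOrder endpoint; it is assumed to return the HTTP status and the decoded response body. Nothing here comes from a real ACME client library.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical transport: takes a newOrder payload and returns
# (HTTP status code, decoded JSON body).
SubmitFn = Callable[[dict], Tuple[int, dict]]


def new_order_with_fallback(
    submit: SubmitFn,
    identifiers: List[dict],
    replaces: Optional[str],
) -> Tuple[int, dict]:
    """Submit a new order; on 409, retry once without 'replaces'."""
    payload = {"identifiers": identifiers}
    if replaces is not None:
        payload["replaces"] = replaces

    status, body = submit(payload)

    # A 409 on a request that carried 'replaces' is taken to mean the
    # certificate already has a replacement order on this account.
    if status == 409 and "replaces" in payload:
        # Retry exactly once, this time as a plain (non-replacement) order.
        return submit({"identifiers": identifiers})

    return status, body
```

The retry is deliberately limited to one attempt, so a server that keeps answering 409 for some other reason cannot put the client into a loop.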

mholt commented 4 months ago

Thanks. One last question, if two orders are concurrent with the same replaces value, does a 409 still get returned in that case?

beautifulentropy commented 4 months ago

> Thanks. One last question, if two orders are concurrent with the same replaces value, does a 409 still get returned in that case?

Thanks for asking for this clarification. I'll have to amend what we've said above:

The CA should return a 409 when an existing replacement order is already Finalized, Pending, Ready, or Processing. If that order becomes Invalid, another replacement order can be made.
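A rough sketch of that rule, assuming the states named above map onto the RFC 8555 order statuses, with "Finalized" read as `valid`. The function name and the idea of passing in a list of statuses are illustrative and not taken from any CA implementation.

```python
from typing import Iterable

# Assumption: "Finalized" in the comment above corresponds to the
# RFC 8555 order status "valid".
BLOCKING_STATUSES = frozenset({"pending", "ready", "processing", "valid"})


def replacement_conflicts(existing_order_statuses: Iterable[str]) -> bool:
    """Return True if a new replacement order should get a 409.

    `existing_order_statuses` holds the status of every earlier order
    that named the same certificate in its 'replaces' field.
    """
    # Orders that ended up "invalid" do not block a new attempt.
    return any(s in BLOCKING_STATUSES for s in existing_order_statuses)
```

Under this reading, two concurrent orders with the same replaces value conflict as soon as the first one exists in any non-invalid state.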

robstradling commented 4 months ago

Is indicating "that the client should retry the renewal without the 'replaces' field" the only use case for an HTTP 409 response? If so, then I'm struggling to see why an HTTP 409 response would be more helpful than the -03 behaviour "Servers MAY ignore the replaces field in New Order requests which do not pass such checks".

If a Server ignores the replaces field, then the Order object returned by the newOrder endpoint will omit the replaces field. Isn't that enough of a signal to the Client that the Server has not treated the new order as a renewal? (The -03 behaviour also avoids the Client having to retry the request, which seems preferable, all other things being equal).

Or, are we envisaging that there might be reasons for a Client to choose to not retry the renewal if an HTTP 409 is received? If so, what reasons?

beautifulentropy commented 4 months ago

> Is indicating "that the client should retry the renewal without the 'replaces' field" the only use case for an HTTP 409 response? If so, then I'm struggling to see why an HTTP 409 response would be more helpful than the -03 behaviour "Servers MAY ignore the replaces field in New Order requests which do not pass such checks".
>
> If a Server ignores the replaces field, then the Order object returned by the newOrder endpoint will omit the replaces field. Isn't that enough of a signal to the Client that the Server has not treated the new order as a renewal? (The -03 behaviour also avoids the Client having to retry the request, which seems preferable, all other things being equal).
>
> Or, are we envisaging that there might be reasons for a Client to choose to not retry the renewal if an HTTP 409 is received? If so, what reasons?

The sole use case for an HTTP 409 response is to handle situations where there's already a replacement order made by the same ACME account. This scenario typically suggests issues such as poor record-keeping by the client or potential data loss. Rejecting the request is a clear signal to the operator (and client author) that something is amiss.

mholt commented 4 months ago

> Rejecting the request is a clear signal to the operator (and client author) that something is amiss.

That's all well and good, but I am left with the question as to what to do with this information (as a client). All we know is we need a cert. Server already has a record of it being replaced? Cool. We still need a cert. IMO "replacing" a cert should be idempotent, unless there's some obvious course of action I'm not considering... sure, we can retry without the "replaces" field, but, why? Like, I won't really be able to do anything about it on my end will I?

beautifulentropy commented 4 months ago

> That's all well and good, but I am left with the question as to what to do with this information (as a client). All we know is we need a cert. Server already has a record of it being replaced? Cool. We still need a cert. IMO "replacing" a cert should be idempotent, unless there's some obvious course of action I'm not considering... sure, we can retry without the "replaces" field, but, why? Like, I won't really be able to do anything about it on my end will I?

Thank you for sharing your concerns. You're correct in noting that even if the server has a record of the certificate being replaced, as a client, you still require a valid certificate.

This update is primarily aimed at clients filing numerous concurrent certificate replacements due to flaws in their implementation. By notifying these clients when a replacement order has already been made, it serves as an alert to them that there might be an issue with their implementation. This is particularly relevant considering the rate limit exemptions that are granted for ARI-triggered orders.

If you're seeing these replacement notifications as a one-off, it might not significantly impact you. However, for clients experiencing frequent 409s, it's a crucial signal indicating that they're not fully performing proper record-keeping and thus are not benefitting from the rate limit exemptions. Simply removing the 'replaces' field and proceeding without acknowledging it could lead to incorrect implementations which are never fully identified.

robstradling commented 4 months ago

> Rejecting the request is a clear signal to the operator (and client author) that something is amiss.

@beautifulentropy Are you implying that it would be a meaningfully less clear signal for the Server to instead accept the newOrder request and omit replaces in the returned Order object? If so, please could you explain your thinking on this?

beautifulentropy commented 4 months ago

> Rejecting the request is a clear signal to the operator (and client author) that something is amiss.
>
> @beautifulentropy Are you implying that it would be a meaningfully less clear signal for the Server to instead accept the newOrder request and omit replaces in the returned Order object? If so, please could you explain your thinking on this?

I'm considering the scenario where the server strips the replaces field and returns it empty in the order object. In such cases, it might be challenging for clients to discern whether the server is actively processing and correctly omitting the replaces field as per the ARI specification, or if it's simply ignoring the field due to a faulty implementation.

Consider the error responses we currently have as well. We respond unauthorized (401) when the new order comes from an account that didn't request the certificate being replaced. Similarly, we respond malformed (400) when the new order contains no matching identifiers. Extending this logic, it would be consistent to respond with conflict (409) when the ACME account in question has already made a new order that replaces this certificate, no?
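To make the proposed mapping concrete, here is a small sketch of the three checks in the order described above. The `IssuedCert` record, its fields, and the `check_replaces` function are hypothetical and exist only for this example. The short reason strings are placeholders, not registered ACME error types.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class IssuedCert:
    """Hypothetical server-side record of a previously issued certificate."""
    account_id: str
    identifiers: FrozenSet[str]
    already_replaced: bool


def check_replaces(
    account_id: str,
    identifiers: FrozenSet[str],
    cert: IssuedCert,
) -> Optional[Tuple[int, str]]:
    """Return (HTTP status, reason) if 'replaces' is rejected, else None."""
    # The order comes from an account that did not request the certificate.
    if cert.account_id != account_id:
        return 401, "unauthorized"
    # The order shares no identifiers with the certificate it claims to replace.
    if not (cert.identifiers & identifiers):
        return 400, "malformed"
    # Well-formed request, but the certificate already has a replacement.
    if cert.already_replaced:
        return 409, "conflict"
    return None
```

Only the last branch depends on state that can change between two otherwise identical requests, which is the property that sets the 409 case apart from the other two.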

robstradling commented 4 months ago

> The HTTP 409 Conflict response status code indicates a request conflict with the current state of the target resource.

OK, I can accept that "has already made a new order that replaces this certificate" represents a "request conflict with the current state of the target resource". Due to this conflict, the replaces field in the request cannot be accepted and included in the "target resource" (i.e., the new Order object) by the server.

But isn't this equally true for every reason a server might have for not accepting the replaces field in the request? (e.g., if the server determines that these conditions are not met: "the identified certificate and the current New Order request correspond to the same ACME Account and share identifiers").

ISTM that the nondeterminism of sometimes returning 409 and sometimes ignoring the replaces field in the request could lead to confusion. So if we're going the 409 route, then I think I would prefer to specify a requirement along the lines of "If the request contains a 'replaces' value that is unacceptable to the server, then the server MUST return HTTP 409 (Conflict) with a problem document that explains the reason for rejecting it", together with some new error types (suggested names: alreadyReplaced, wrongAccount, and identifiersNotShared).

aarongable commented 4 months ago

> But isn't this equally true for every reason a server might have for not accepting the replaces field in the request?

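In my opinion, it's not equivalent. The distinction in my mind is as follows:

  • Some "replaces" fields the server will not process, due to server policy -- the identified certificate was issued to the wrong account, or with a non-overlapping set of names, or any other reason that the server determines.
  • Some "replaces" fields the server would process, because they meet all the criteria... but it cannot, because the identified certificate has already been replaced.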

The former can be handled with a malformed error and a 400 response code. However, the latter are truly well-formed, but a conflict has arisen -- the previous certificate has already been updated to point at a new order that replaces it -- and so a 409 response code (and maybe a new acme error type?) is appropriate.

In my perception, the semantics of the HTTP 409 Conflict status code are truly limited to essentially race conditions: that response code is only appropriate if you're expecting a resource to be in a particular state, but it isn't in that state when you go to update it. That's only satisfied by the "we would be happy to update the previous certificate, but we can't because it's already replaced" condition, so I don't think it's appropriate to use 409 for the other error conditions.

robstradling commented 4 months ago

> • Some "replaces" fields the server will not process, due to server policy -- the identified certificate was issued to the wrong account, or with a non-overlapping set of names, or any other reason that the server determines.
> • Some "replaces" fields the server would process, because they meet all the criteria... but it cannot, because the identified certificate has already been replaced.

I disagree that a server "cannot" in that second scenario. Choosing to reject a replaces value because the identified certificate has already been replaced is also down to server policy ("Servers MAY ignore the replaces field in New Order requests which do not pass such checks"; note that "MAY" does not dictate required behaviour). But having said that...

> The former can be handled with a malformed error and a 400 response code

If we're in agreement that servers MUST always return an error when rejecting a submitted replaces value, then I don't object to using 409 just for the "already renewed" case and 400 for other error conditions. I'll let you paint the bikeshed. ;-)