ampproject / amp-toolbox-php

AMP Optimizer PHP library
Apache License 2.0
73 stars 25 forks source link

Set CURLOPT_FAILONERROR to get the correct curl error code #534

Closed dritter closed 2 years ago

dritter commented 2 years ago

Otherwise Client Errors (>= 400) return curl error code 0. In case of 404 errors (CURLE_HTTP_NOT_FOUND) it would return 0, instead of 22. See CURLOPT_FAILONERROR.

Before this change, 404 errors won't be retried, because in_array($curlErrno, self::RETRYABLE_ERROR_CODES, true) === false evaluated to true, hence throwing an exception.

This is one puzzle piece from #530

Btw. to make it clear: It is still a mystery why we sometimes receive 404s from https://cdn.ampproject.org/rtv/metadata. This must be something on the server side.. After this PR we still get 404s, but these are hidden.

dritter commented 2 years ago

The failing tests seem unrelated to my change.

westonruter commented 2 years ago

Before this change, 404 errors won't be retried, because in_array($curlErrno, self::RETRYABLE_ERROR_CODES, true) === false evaluated to true, hence throwing an exception.

Talking myself through the code here...

If the HTTP status code is 404 and the $curlErrno is 22 due to CURLOPT_FAILONERROR this will cause the request to be retried:

https://github.com/ampproject/amp-toolbox-php/blob/f626d8057f582ea46ebea8ac14b524e4eb6747e4/src/RemoteRequest/CurlRemoteGetRequest.php#L136-L137

Since a 404 is among the retryable error codes:

https://github.com/ampproject/amp-toolbox-php/blob/f626d8057f582ea46ebea8ac14b524e4eb6747e4/src/RemoteRequest/CurlRemoteGetRequest.php#L36-L39

So clearly the code seems to indicate that 404 should be retried, but it's clear based on your research that currently 404 is not getting retried.

But we don't want 404 errors to be retried normally? If a file is missing then retrying it won't normally result in a success. It seems like your change is correct here, but that we should also be removing CURLE_HTTP_NOT_FOUND from RETRYABLE_ERROR_CODES.

westonruter commented 2 years ago

The failing tests seem unrelated to my change.

I've fixed this in #533 via https://github.com/ampproject/amp-toolbox-php/pull/533/commits/5f3ecf06118a1be90bbe1428c7f985d701c79d66. You can merge the latest from main or rebase to address that.

dritter commented 2 years ago

But we don't want 404 errors to be retried normally? If a file is missing then retrying it won't normally result in a success.

I agree, but we still get 404s (See https://github.com/ampproject/amp-toolbox-php/issues/530#issuecomment-1248128474). I don't know why or how, but if we retry these 404s, we get a successful response (no 404). Since we switched over to this branch in production, we had no 404 at all.

It seems like your change is correct here, but that we should also be removing CURLE_HTTP_NOT_FOUND from RETRYABLE_ERROR_CODES.

Will do.

westonruter commented 2 years ago

Now that CURLE_HTTP_NOT_FOUND is removed from RETRYABLE_ERROR_CODES does this mean you'll start getting 404s without retry?

dritter commented 2 years ago

I haven't tried it, but I guess yes: Before this PR, the curl error codes didn't match, which led to 404s not being retried and eventually to an Exception. After this PR the CURLE_HTTP_NOT_FOUND is not in the list of retryable error codes, which leads to 404s not being retried and eventually to an Exception. So different approaches, but the outcome is the same.

The more interesting question is: why does retrying fix the 404? ;)

westonruter commented 2 years ago

So different approaches, but the outcome is the same.

With the new benefit being that a 404 results in an immediate exception without retrying in vain (normally).

dritter commented 2 years ago

I agree. Sorry for the late response, lost track of this PR.