cognitedata / cdf-sdk-java

Java SDK for Cognite Data Fusion
Apache License 2.0
4 stars 8 forks source link

Don't retry when response status code is 404. #348

Closed dsorenes closed 1 year ago

dsorenes commented 1 year ago

Currently when we get a 404 from RAW we retry the request. This is wrong behaviour as when we get a 404 from RAW it means the key/row does not exist, and thus we should not retry.

When returned status code is 404 it means the key does not exist. Don't retry when that is the case.

I don't know who to send this PR too as the main maintainer is not in Cognite anymore.

dsorenes commented 1 year ago

Why would 404 raise the exception? Could that be fixed in the way that 404 doesn't raise the exception at all?

Yeah in that sense it's not 404 in itself, but rather that we get an IOException, and retry when that happens. I could dig a bit deeper into why it's an IOException getting thrown when we get a 404, but that might be OkHttp stuff. I can investigate a bit more to get a better understanding.

dsorenes commented 1 year ago

Why would 404 raise the exception? Could that be fixed in the way that 404 doesn't raise the exception at all?

Yeah in that sense it's not 404 in itself, but rather that we get an IOException, and retry when that happens. I could dig a bit deeper into why it's an IOException getting thrown when we get a 404, but that might be OkHttp stuff. I can investigate a bit more to get a better understanding.

Though since this is an Executor for most requests, and an IOException can occur on other type of requests (not related to RAW and key not found) because of lack of resources or bad network connectivity. So we want to retry on IOException, but not if the status code returned is also 404.

The docs for OkHttp states this for the execute method:

   * Note that transport-layer success (receiving a HTTP response code, headers and body) does not
   * necessarily indicate application-layer success: `response` may still indicate an unhappy HTTP
   * response code like 404 or 500.
   *
   * @throws IOException if the request could not be executed due to cancellation, a connectivity
   *     problem or timeout. Because networks can fail during an exchange, it is possible that the
   *     remote server accepted the request before the failure.
   * @throws IllegalStateException when the call has already been executed.

This is weird as we do get a response back. No cancellation or network problem happens, but we still get the following message:

2023-11-15 10:18:18,734 [pool-1-thread-10] WARN  c.c.c.s.e.AutoValue_RequestExecutor - RequestExecutor [XcOs1] -Transient error when receiving response from Fusion (request id: ddbe4347-2773-9fb7-a904-fcb232714c6f, response code: 404). Retrying...
java.io.IOException: Response from Fusion: Unexpected response code: 404. Response{protocol=h2, code=404, message=, url=https://api.cognitedata.com/api/v1/projects/akerbp/raw/dbs/lci-system/tables/aveva-schemas-compact/rows/NOZTI_P0438%2FNOZTI_P0409-LCIS}
Response body: {
  "error": {
    "code": 404,
    "message": "Did not find any rows with that key"
  }
}
Response headers: server: envoy
date: Wed, 15 Nov 2023 09:18:18 GMT
content-type: application/json
x-request-id: ddbe4347-2773-9fb7-a904-fcb232714c6f
x-envoy-upstream-service-time: 27
strict-transport-security: max-age=31536000

        at com.cognite.client.servicesV1.executor.RequestExecutor.executeRequest(RequestExecutor.java:221)
        at com.cognite.client.servicesV1.executor.RequestExecutor.lambda$executeRequestAsync$0(RequestExecutor.java:169)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)

🤷

pavelzubarev commented 1 year ago

Is it possible to retrieve the stack trace for it so we know where exactly it was raised?

dsorenes commented 1 year ago

Is it possible to retrieve the stack trace for it so we know where exactly it was raised?

Yeah I'll do a run locally. I'll be back in 30 minutes.

dsorenes commented 1 year ago

Alright so this might be a better solution. For single requests we make 404 a valid response code.

2023-11-15 14:44:14,239 [com.cognite.json.JsonSchemaValidator.main()] INFO  com.cognite.client.RawRows - retrieve() - 1dFSf - Received 1 rows to retrieve.
2023-11-15 14:44:14,305 [com.cognite.json.JsonSchemaValidator.main()] ERROR com.cognite.client.RawRows - retrieve() - 1dFSf - Retrieve row request failed: {
  "error": {
    "code": 404,
    "message": "Did not find any rows with that key"
  }
}
2023-11-15 14:44:14,305 [com.cognite.json.JsonSchemaValidator.main()] ERROR c.cognite.json.JsonSchemaValidator - Could not find schema with key: NOZTI_P0438/NOZTI_P0409-LCIS
2023-11-15 14:44:14,306 [com.cognite.json.JsonSchemaValidator.main()] INFO  c.cognite.json.JsonSchemaValidator - Schema not found for class NOZTI_P0438/NOZTI_P0409-LCIS. Sending it to deadletter.

We finally catch the exception in our client's code. Might want to throw a different kind of error message when we get a 404 from RawRow's retrieve as that is a bit more special. But considering we can retrieve multiple items at the same time I'm not sure if that will help (or accounted for).

dsorenes commented 1 year ago

Sanity check question: Have you checked all the places? (just to be on the safe side of things)

I have not, but I will 👍