apify / apify-client-js

Apify API client for JavaScript / Node.js.
https://docs.apify.com/api/client/js
Apache License 2.0

Investigate client behaviour in a case of target pod/node restart #252

Open mtrunkat opened 2 years ago

mtrunkat commented 2 years ago

From this discussion https://apifier.slack.com/archives/C013WC26144/p1653552365035479, it seems that there is sometimes a series of network errors which raises the suspicion that the client might be retrying requests against the same pod even though that pod is dead.

2022-05-16T00:38:56.894Z WARN  ApifyClient: API request failed 4 times. Max attempts: 9.
2022-05-16T00:38:56.897Z Cause:Error: aborted
2022-05-16T00:38:56.899Z     at connResetException (node:internal/errors:692:14)
2022-05-16T00:38:56.901Z     at Socket.socketCloseListener (node:_http_client:414:19)
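
For reference, the retry behaviour seen in the log above maps onto the client's constructor options. A minimal sketch, assuming apify-client v2, where "Max attempts: 9" presumably corresponds to the default `maxRetries` of 8 plus the initial attempt:

```js
// Minimal sketch (apify-client v2): the retry behaviour in the log above
// is governed by these constructor options.
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: process.env.APIFY_TOKEN,
    // 8 retries + the initial attempt = the "Max attempts: 9" in the log.
    maxRetries: 8,
    // Base delay for the exponential backoff between retries.
    minDelayBetweenRetriesMillis: 500,
});
```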
mnmkng commented 2 years ago

I think it might be because of the keep-alive connections and HTTPS tunneling. How does the client learn that the pod is down and that it should retry elsewhere?
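
To illustrate the suspicion (this is plain Node, not the client's actual internals): with an `https.Agent` in keep-alive mode, requests are pinned to an already-open socket, so if the peer pod dies, the failure only surfaces on the next write as an "aborted" / `ECONNRESET`-style error instead of a fresh connection to a healthy pod. A minimal sketch:

```js
// Sketch of the suspected failure mode with a plain Node keep-alive agent.
const https = require('node:https');

const agent = new https.Agent({ keepAlive: true, maxSockets: 1 });

function getStatus() {
    return new Promise((resolve, reject) => {
        https.get('https://api.apify.com/v2/users/me', { agent }, (res) => {
            res.resume();
            res.on('end', () => resolve(res.statusCode));
        }).on('error', reject);
    });
}

// Two calls reuse the same socket; if the server side went away in between,
// the second call is where the connection-reset error shows up.
getStatus()
    .then(() => getStatus())
    .catch((err) => console.error(err.code, err.message));
```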

fnesveda commented 2 years ago

Note: We could test this on multistaging by starting two API pods, running an Actor that calls the API in a loop, and then killing one of the two pods. We could also build a testing version of the client with some extra debug logging to help us figure it out.
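
A hedged sketch of what such a test Actor could look like; `TEST_RUN_ID` is just a placeholder for any cheap read endpoint to poll:

```js
// Sketch of the proposed test: hit the API in a loop so we can watch
// what happens when one of the two pods is killed.
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const RUN_ID = process.env.TEST_RUN_ID; // placeholder run to poll

async function main() {
    for (let i = 0; ; i++) {
        try {
            await client.run(RUN_ID).get();
            console.log(`${new Date().toISOString()} call ${i}: OK`);
        } catch (err) {
            console.error(`${new Date().toISOString()} call ${i}: ${err.message}`);
        }
        await new Promise((r) => setTimeout(r, 500));
    }
}

main();
```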

jirimoravcik commented 2 years ago

2-pod multistaging is here: https://github.com/apify/apify-core/pull/6934

drobnikj commented 2 years ago

It looks like keep-alive doesn't work: it does not propagate through the application load balancer, so the requests get distributed between pods. Below is the list of pods that served each API call; I was calling the get-run API from the same ApifyClient instance every 0.5 s. Because I have just 2 pods and the ALB uses a round-robin scheme, the pods alternated on every request (a rough sketch of the measurement loop follows below the list).

0: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
1: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
2: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
3: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
4: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
5: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
6: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
7: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
8: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
9: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
10: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
11: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
12: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
13: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
14: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
15: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
16: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
17: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
18: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
19: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
20: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
21: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
22: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
23: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
24: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
25: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
26: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
27: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
28: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
29: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
30: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
31: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
32: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
33: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
34: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"

If you restart one node, it simply switches to the new one.
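
A rough sketch of the measurement loop described above; the `x-served-by-pod` response header is purely a hypothetical placeholder, since the thread doesn't say how the serving pod was identified for each call:

```js
// Rough sketch of the pod-distribution measurement. The 'x-served-by-pod'
// header is a hypothetical placeholder for whatever mechanism identified
// the serving pod.
const https = require('node:https');

const pods = [];

function recordPod(path) {
    return new Promise((resolve, reject) => {
        https.get(`https://api.apify.com${path}`, (res) => {
            pods.push(res.headers['x-served-by-pod']); // hypothetical header
            res.resume();
            res.on('end', resolve);
        }).on('error', reject);
    });
}

async function main() {
    for (let i = 0; i < 36; i++) {
        await recordPod('/v2/actor-runs/RUN_ID'); // RUN_ID is a placeholder
        await new Promise((r) => setTimeout(r, 500));
    }
    // With 2 pods behind a round-robin ALB, the entries alternate.
    console.log(pods);
}

main();
```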

drobnikj commented 2 years ago

If we want to support keep-alive connections, we probably need some changes on the ALB or elsewhere in the platform networking. I'm not sure whether the fact that it's not working right now affects users, but it probably hasn't worked since we started using the ALB. cc @dragonraid @mnmkng

drobnikj commented 2 years ago

I'm moving this to the icebox; we can follow up once the issue appears again. It looks like a network error or some other transient error, but it's hard to say two months later: we don't have any logs, and the issue hasn't appeared in the same Actor again since this report.