filecoin-saturn / caboose

A blockstore for distributing load

Better downvoting and cool down fetches #59

Closed aarshkshah1992 closed 1 year ago

aarshkshah1992 commented 1 year ago

We need to downvote nodes more gradually so that they get time to recover from temporary failures, Lassie timeouts, being overloaded, etc., without getting in the way of fetch requests. However, once a node has been removed from the pool after the gradual downvoting, it should earn its reputation back before we start sending more requests to it. To that end, this PR:

TODO

Current results from ApacheBench (ab) testing

 aarshshah@Aarshs-MacBook-Pro-2 caboose % ab -k -l -n 10000 -c 1000 -w "http://localhost:8081/ipns/en.wikipedia-on-ipfs.org/wiki/"
<p>
 This is ApacheBench, Version 2.3 <i>&lt;$Revision: 1879490 $&gt;</i><br>
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/<br>
 Licensed to The Apache Software Foundation, http://www.apache.org/<br>
</p>
<p>
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

<table >
<tr ><th colspan=2 bgcolor=white>Server Software:</th><td colspan=2 bgcolor=white></td></tr>
<tr ><th colspan=2 bgcolor=white>Server Hostname:</th><td colspan=2 bgcolor=white>localhost</td></tr>
<tr ><th colspan=2 bgcolor=white>Server Port:</th><td colspan=2 bgcolor=white>8081</td></tr>
<tr ><th colspan=2 bgcolor=white>Document Path:</th><td colspan=2 bgcolor=white>/ipns/en.wikipedia-on-ipfs.org/wiki/</td></tr>
<tr ><th colspan=2 bgcolor=white>Document Length:</th><td colspan=2 bgcolor=white>Variable</td></tr>
<tr ><th colspan=2 bgcolor=white>Concurrency Level:</th><td colspan=2 bgcolor=white>1000</td></tr>
<tr ><th colspan=2 bgcolor=white>Time taken for tests:</th><td colspan=2 bgcolor=white>22.658 seconds</td></tr>
<tr ><th colspan=2 bgcolor=white>Complete requests:</th><td colspan=2 bgcolor=white>10000</td></tr>
<tr ><th colspan=2 bgcolor=white>Failed requests:</th><td colspan=2 bgcolor=white>0</td></tr>
<tr ><th colspan=2 bgcolor=white>Non-2xx responses:</th><td colspan=2 bgcolor=white>10000</td></tr>
<tr ><th colspan=2 bgcolor=white>Keep-Alive requests:</th><td colspan=2 bgcolor=white>10000</td></tr>
<tr ><th colspan=2 bgcolor=white>Total transferred:</th><td colspan=2 bgcolor=white>4897955 bytes</td></tr>
<tr ><th colspan=2 bgcolor=white>HTML transferred:</th><td colspan=2 bgcolor=white>2008065 bytes</td></tr>
<tr ><th colspan=2 bgcolor=white>Requests per second:</th><td colspan=2 bgcolor=white>441.34</td></tr>
<tr ><th colspan=2 bgcolor=white>Transfer rate:</th><td colspan=2 bgcolor=white>211.10 kb/s received</td></tr>
<tr ><th bgcolor=white colspan=4>Connection Times (ms)</th></tr>
<tr ><th bgcolor=white>&nbsp;</th> <th bgcolor=white>min</th>   <th bgcolor=white>avg</th>   <th bgcolor=white>max</th></tr>
<tr ><th bgcolor=white>Connect:</th><td bgcolor=white>    0</td><td bgcolor=white>    3</td><td bgcolor=white>   44</td></tr>
<tr ><th bgcolor=white>Processing:</th><td bgcolor=white>  393</td><td bgcolor=white> 1189</td><td bgcolor=white>10092</td></tr>
<tr ><th bgcolor=white>Total:</th><td bgcolor=white>  393</td><td bgcolor=white> 1192</td><td bgcolor=white>10136</td></tr>
</table>

willscott commented 1 year ago

One note from your AB output: it completed in around 5 minutes. That may not be enough to reach steady state. With membership debounce at 5 min, you're only going to see one round of downvoting, and you won't have reached a point where nodes would actually be excluded.

aarshkshah1992 commented 1 year ago

@willscott Fixed the AB output.

lidel commented 1 year ago

@aarshkshah1992 deployed c9035dd from this PR to staging, you can observe things after this timestamp (UTC):

2023/03/01 17:20:00 Starting bifrost-gateway 2023-03-01-adb95e1

DiegoRBaquero commented 1 year ago

I don't know where the change to interval is, but the orchestrator is getting hit multiple times per second per IP:

147.75.71.197 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4700 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
136.144.59.113 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4745 "-" "bifrost-gateway/2023-03-01-adb95e1"
145.40.87.133 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4975 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
136.144.48.241 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4850 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
188.240.57.10 - - [01/Mar/2023:22:37:23 +0000] "POST /register?ssl=done HTTP/1.1" 200 753 "-" "Saturn/648_5207745"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
136.144.59.113 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4745 "-" "bifrost-gateway/2023-03-01-adb95e1"
86.109.14.251 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 5013 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.75.71.197 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4700 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.75.84.243 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4784 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
136.144.59.113 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4745 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
142.202.255.19 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/local HTTP/1.1" 200 246 "-" "Saturn/648_5207745"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
136.144.59.113 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4745 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
136.144.48.241 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4850 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.75.71.197 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4700 "-" "bifrost-gateway/2023-03-01-adb95e1"
136.144.59.113 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4745 "-" "bifrost-gateway/2023-03-01-adb95e1"
145.40.65.177 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4745 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
5.44.249.154 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/local HTTP/1.1" 200 437 "-" "Saturn/648_5207745"
147.28.129.15 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
188.244.117.127 - - [01/Mar/2023:22:37:24 +0000] "POST /register?ssl=done HTTP/1.1" 200 737 "-" "Saturn/648_5207745"
145.40.65.177 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4745 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.75.84.243 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4784 "-" "bifrost-gateway/2023-03-01-adb95e1"
136.144.59.113 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4745 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
136.144.59.113 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4745 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
145.40.65.177 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4745 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.75.71.197 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4700 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
147.28.129.15 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"
136.144.48.241 - - [01/Mar/2023:22:37:24 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4850 "-" "bifrost-gateway/2023-03-01-adb95e1"
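
A quick way to quantify "multiple times per second per IP" from a log like this is to count lines per client IP and timestamp. The sketch below is illustrative; `countPerIPSecond` is a hypothetical helper, and it assumes the log layout shown above, with the IP in the first field and the bracketed timestamp in the fourth.

```go
package main

import (
	"fmt"
	"strings"
)

// countPerIPSecond counts requests per (client IP, second) pair.
// It assumes the access-log layout shown above.
func countPerIPSecond(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		fields := strings.Fields(line)
		if len(fields) < 4 {
			continue // not an access-log line
		}
		ip := fields[0]
		ts := strings.TrimPrefix(fields[3], "[") // e.g. 01/Mar/2023:22:37:23
		counts[ip+" "+ts]++
	}
	return counts
}

func main() {
	// The first three lines of the log above.
	lines := []string{
		`147.75.71.197 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4700 "-" "bifrost-gateway/2023-03-01-adb95e1"`,
		`147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"`,
		`147.28.129.15 - - [01/Mar/2023:22:37:23 +0000] "GET /nodes/nearby?count=1000 HTTP/1.1" 200 4751 "-" "bifrost-gateway/2023-03-01-adb95e1"`,
	}
	for key, n := range countPerIPSecond(lines) {
		fmt.Printf("%s -> %d request(s)\n", key, n)
	}
}
```

Feeding the whole log through the same function gives a per-IP, per-second request count that can be compared before and after a fix.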

aarshkshah1992 commented 1 year ago

@DiegoRBaquero Yeah, that was a bug in this PR. Have pushed a fix.

lidel commented 1 year ago

https://github.com/ipfs/bifrost-gateway/commit/4cbc3a75de06f487190982dcb67c2f5ac124b81f with caboose from this PR (https://github.com/filecoin-saturn/caboose/pull/59/commits/a518e617b034cb3aa497d152e171693ea0ceb7fc) is deployed to the staging box:

2023/03/02 15:49:36 Starting bifrost-gateway 2023-03-02-4cbc3a7