AntelopeIO / leap

C++ implementation of the Antelope protocol

Review status code when http-max-bytes-in-flight-mb is reached #2129

Closed. matthewdarwin closed this 7 months ago.

matthewdarwin commented 8 months ago

Review the status code returned when http-max-bytes-in-flight-mb is reached. If the code is not 503, also fix the help text.

Requesting change for 5.0.1.

Copying chat from telegram:

Aaron Cox (jesta) — Greymass, [2024-01-23 11:46 A.M.] I just had a couple of API nodes enter a state where this check was being hit repeatedly:

https://github.com/AntelopeIO/leap/blob/8a9fa7689aebb0d6df276b8e4c31d3bf64800efe/plugins/http_plugin/include/eosio/http_plugin/beast_http_session.hpp#L234

Aaron Cox (jesta) — Greymass, [2024-01-23 11:47 A.M.] Traffic wasn’t super heavy, no noticeable errors in nodeos, but every API response was hitting this.

Is there a configuration value I can change to prevent this, or is there a bug where this is accumulating and not resetting? Not sure what happened here. I haven't investigated it deeply yet, but has anyone else experienced this?

Matt Witherspoon, [2024-01-23 11:49 A.M.] http-max-bytes-in-flight-mb is the knob to set the limit

Aaron Cox (jesta) — Greymass, [2024-01-23 12:00 P.M.] I’ll have to study these nginx logs more, but it looks like a barrage of get_table_rows calls came in and that’s when it started.

Aaron Cox (jesta) — Greymass, [2024-01-23 12:02 P.M.] And I don’t think our failover rules were expecting a 429 response from nodeos, so they didn’t reroute.

Normally nginx is the one handing out the 429 responses to clients 😂

Aaron Cox (jesta) — Greymass, [2024-01-23 6:07 P.M.] After finally getting a bit more time to look at this and find a workaround, I'm not sure I can actually fail over to alternative nodeos instances when nodeos is stuck returning a 429 error code (at least not with nginx).

Should it be returning a 429? The help command seems to indicate it would have been a 503, which would have successfully knocked our failing upstreams out of rotation and kept the services running.

https://github.com/AntelopeIO/leap/blob/cf09e01336a4436aed11fe9403c0546b65b19f62/docs/03_keosd/10_usage.md?plain=1#L58-L61

Aaron Cox (jesta) — Greymass, [2024-01-23 6:09 P.M.] I'm not sure how to reproduce this at this point, but since I can't automatically fail over when it occurs, I'm going to set http-max-bytes-in-flight-mb = -1 for the time being to ensure these nodes don't end up in a deadlock without automatic failover.
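To make the failover argument concrete, here is a minimal C++ sketch of a bytes-in-flight admission check that rejects with 503 rather than 429. Every name in it (`bytes_in_flight_guard`, `try_admit`, `release`, the `http_status` enum) is hypothetical and is not leap's code in beast_http_session.hpp. It also assumes, based on the -1 workaround in the chat, that a negative limit means "unlimited".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status codes; only the values matter for this sketch.
enum class http_status : int {
    ok = 200,
    too_many_requests = 429,    // the code nodeos reportedly returns since 3.1
    service_unavailable = 503   // the code requested in this issue
};

// Hypothetical guard, not leap's implementation.
struct bytes_in_flight_guard {
    int64_t max_bytes;       // assumption: negative means "no limit", like -1 in the config
    int64_t in_flight = 0;   // bytes currently reserved by requests still being processed

    // Reserves request_bytes and returns ok, or returns 503 without reserving
    // anything when the reservation would push the total past the cap.
    http_status try_admit(int64_t request_bytes) {
        if (max_bytes >= 0 && in_flight + request_bytes > max_bytes)
            return http_status::service_unavailable;
        in_flight += request_bytes;
        return http_status::ok;
    }

    // Must run once the response is fully written; if a code path skips it,
    // in_flight never drains and every later request is rejected.
    void release(int64_t request_bytes) {
        assert(request_bytes <= in_flight);
        in_flight -= request_bytes;
    }
};
```

For example, a guard built as `bytes_in_flight_guard g{100};` admits a 60-byte request, rejects a following 50-byte request with 503, and admits it again after `g.release(60)`. A reverse proxy configured to treat 503 as an upstream failure would then take the node out of rotation. The comment on `release` also shows one way the "accumulating and not resetting" symptom could arise, though the chat does not establish the actual cause.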

Matthew Darwin | Pinax | Nation, [2024-01-23 6:33 P.M.] HTTP 429 (Too Many Requests) seems like the wrong return code to me.

Matt Witherspoon, [2024-01-23 7:01 P.M.] yeah, disappointing that this was changed from 503 to 429 between 2.0 and 3.1

bhazzard commented 8 months ago

Target 3.2.6, 4.0.6, and 5.0.1.