AleoNet / snarkOS

A Decentralized Operating System for ZK Applications
http://snarkos.org
Apache License 2.0
4.24k stars 2.59k forks source link

[Perf] Include json serialisation in the `get_blocks` blocking task #3253

Closed niklaslong closed 2 months ago

niklaslong commented 2 months ago

A small follow-up to #3252. Devnet testing of the endpoint revealed serialisation times in the milliseconds (usually hundreds), motivating the change in this PR.

I managed to OOM some of the nodes with this change, the root cause is still tbd but I suspect it's also present in #3252. This is likely because previously blocked workers are no longer artificially throttling execution. Multiple concurrent requests for MBs of data could then lead to issues despite the existing rate limiting.

Drafted until we can figure out why.

niklaslong commented 2 months ago

Opened for review as the node crash was due to devnet running with --rest-rps 90000000 effectively allowing huge "bursts" of requests disabling rate limiting and is unrelated to this change. This feels like stating the obvious but the disabled rate limiting in combination with large response data (in the 10s of MB) and full resource utilisation via rayon is a potent DoS vector. Node operators should not be disabling rate limiting (and ideally should be enforcing it via proxies at the infra level).

niklaslong commented 2 months ago

+1 on the testing for establishing robust rate limits but I think that tradeoff should be considered separately from this change. Blocking an async worker with hundreds of milliseconds worth of serialisation would impact the rest of the node's resource allocation and execution.