jbearer closed this issue 4 months ago
So from looking at the code, we aren't hogging the lock, but we are taking a lock on storage for many small operations: once per chunk, and once exclusively for each object we insert. So it's quite possible that a long queue of writers forms, starving out readers.
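To illustrate the pattern described above, here is a minimal Python sketch (hypothetical names; the real project's storage layer is assumed to work analogously). Taking the exclusive lock once per inserted object means every insert joins the writer queue separately, while batching a chunk's inserts under one acquisition lets readers interleave between chunks:

```python
import threading

lock = threading.Lock()
store = {}

def insert_per_object(objects):
    """Anti-pattern: one exclusive acquisition per object.

    With N objects, writers contend for the lock N times, so a long
    queue of writers can form and starve out readers.
    """
    for key, value in objects:
        with lock:
            store[key] = value

def insert_batched(objects):
    """One acquisition per chunk: readers can run between chunks."""
    with lock:
        for key, value in objects:
            store[key] = value

insert_batched([("a", 1), ("b", 2)])
```

Both functions produce the same end state; the difference is only in how often readers get a chance to acquire the lock in between writes.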
This would explain why we see high CPU on every major scan but only notice request timeouts when a node is syncing: every scan does a lot of reading, but we only do much writing when there is a lot of data to sync.
I think I will close this; I'm not overly concerned about API performance during the extremely rare and short-lived scenario of a node syncing a lot of data. Of course, we could always benefit from making syncing faster overall, but that's a different issue.
During a major scan that is fetching a lot of missing data, CPU usage gets very high and API requests get very slow. The high CPU usage sort of makes sense, but it doesn't make sense that API requests would take multiple seconds to respond, since we shouldn't be hoarding any locks during these scans.