I have upgraded my distribution node to use the OpenTelemetry work by Zeeshan: https://github.com/Joystream/joystream/pull/4779. I have been running it for over a week now and have collected some data. This is the latency distribution of my local QN responses:
As you can see, while most requests finish in under 80ms, there is a big chunk of outliers that take over 10s to finish. The graph also shows that the 95th percentile is over 16s, which is unacceptably high. This could explain why we sometimes see videos taking ages to load: QN seems to be the bottleneck, at least for some of them.
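Why a minority of slow requests is enough to push p95 so high: with the nearest-rank definition of a percentile, once more than 5% of requests fall into the slow cluster, p95 lands inside that cluster no matter how fast the rest are. Below is a minimal TypeScript sketch of that arithmetic. The sample values are made up for illustration (they are not my measured data), and the tracing backend may compute percentiles with a different interpolation method.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error("no samples");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in (0, 100]");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Integer arithmetic first, to avoid floating-point error in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical data, NOT the measured distribution:
// 90 fast responses (50ms) and 10 slow outliers (16,500ms).
const samples: number[] = [
  ...new Array<number>(90).fill(50),
  ...new Array<number>(10).fill(16_500),
];

console.log(percentile(samples, 50)); // 50
console.log(percentile(samples, 90)); // 50
console.log(percentile(samples, 95)); // 16500
```

With 10% of requests in the slow tail, p90 still looks healthy while p95 jumps straight to the outlier latency, so the median alone hides the problem.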
I would propose we nuke QN and just switch to Subsquid. I don't think it would be that much work, since the Subsquid mappings for storage are already done in Orion. We would then be using a maintained framework that receives regular updates, does the initial sync much faster, and is generally more reliable.