QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

qrexec: support logging how long each part of processing takes #9363

Open DemiMarie opened 4 months ago

DemiMarie commented 4 months ago

The problem you're addressing (if any)

If one wants to make qrexec faster, one needs to know what parts of qrexec are the bottleneck. Right now, this isn’t easy unless one patches the code. Yes, it’s Python and not hard to modify, but it would be nice to not need to do that.

The solution you'd like

Provide a flag (off by default) that spews performance information to the system log.
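A minimal sketch of what such opt-in stage timing could look like in Python. The `QREXEC_PERF_LOG` environment variable and the `log_stage` helper are hypothetical illustrations, not part of qrexec:

```python
import logging
import os
import time
from contextlib import contextmanager

log = logging.getLogger("qrexec.perf")

def _perf_enabled() -> bool:
    # Hypothetical opt-in switch: timing is off unless this is set.
    return bool(os.environ.get("QREXEC_PERF_LOG"))

@contextmanager
def log_stage(name: str):
    """Log how long the wrapped stage took, if perf logging is enabled."""
    if not _perf_enabled():
        yield
        return
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000
        log.info("stage %s took %.3f ms", name, elapsed_ms)
```

Each processing step would then be wrapped, e.g. `with log_stage("policy-eval"): ...`, so the system log shows per-stage durations only when the flag is enabled.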

The value to a user, and who that user might be

People who want to make qrexec faster will have an easier time getting real data, instead of filing issues like #6266 or #9361 that do not address the actual problems.

Completion criteria checklist

(This section is for developer use only. Please do not modify it.)

3hhh commented 4 months ago

Related: #7075

I'd also be interested in this.

However, at least on my system, 40% (2.7s of 6.6s, kernel & Xen time excluded) of the qrexec wait time comes from systemd attempting to reach network.target. This was different back in 4.0, when qrexec didn't need to wait for the network stack because it didn't use sockets.

marmarek commented 4 months ago

However at least on my system 40% (2.7s of 6.6s, kernel & Xen time excluded) of the qrexec wait time comes from systemd attempting to reach network.target

It sounds like you are talking about system startup time. That is not what this issue is about: this one is about measuring the individual stages of individual qrexec calls.

This was different back in 4.0 when qrexec didn't need to wait for the network stack as it didn't use sockets.

I don't think it's related to the recently added socket service feature; there was no change in service dependencies.

3hhh commented 4 months ago

On 7/21/24 19:58, Marek Marczykowski-Górecki wrote:

However at least on my system 40% (2.7s of 6.6s, kernel & Xen time excluded) of the qrexec wait time comes from systemd attempting to reach network.target

It sounds like you are talking about system startup time. That is not what this issue is about: this one is about measuring the individual stages of individual qrexec calls.

Yes, I'm aware of that. I was merely trying to make a point about the limitations of this issue.

This was different back in 4.0 when qrexec didn't need to wait for the network stack as it didn't use sockets.

I don't think it's related to recently added socket service feature, there was no change in service dependencies.

If you still happen to have a 4.0 install available, compare the output of systemd-analyze critical-chain qubes-qrexec-agent.service with that of 4.2 on a comparable VM with a similar memory size, etc. Unfortunately I didn't find it in my historical data, so you may be correct after all.
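The comparison suggested above would look something like this (commands shown for illustration; they need to be run inside a running Qubes VM, once on 4.0 and once on 4.2):

```shell
# Show the startup critical chain of the qrexec agent, including
# which units it had to wait on (e.g. network.target):
systemd-analyze critical-chain qubes-qrexec-agent.service

# Per-unit startup cost, for context:
systemd-analyze blame | head -20
```

Diffing the two critical chains should show whether qubes-qrexec-agent.service gained a dependency on the network stack between releases.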

alimirjamali commented 3 weeks ago

I believe this is a major issue. Compare the run time of qvm-ls in dom0 with that in an AppVM (with the relevant policy rules):

dom0:

real    0m0.431s
user    0m0.147s
sys     0m0.056s

AppVM:

real    0m2.801s
user    0m0.212s
sys     0m0.220s

And I have absolutely no idea what is taking so long here. Is it policy validation? Is it something else? Without this information, I do not know how I could help improve it.

DemiMarie commented 3 weeks ago

#9362 (caching of the system info structure) would be a good place to start. @marmarek already did the measurements, so it is known to help.
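The idea behind caching the system info structure can be sketched as follows. This is an illustrative time-to-live cache, not the actual #9362 implementation; the fetch callback and TTL value are assumptions:

```python
import time
from typing import Any, Callable, Optional

class CachedValue:
    """Cache an expensive result for a short time-to-live (TTL).

    Avoids recomputing a value (here, an illustrative system info
    structure) on every qrexec call, while keeping it reasonably fresh.
    """

    def __init__(self, fetch: Callable[[], Any], ttl: float = 1.0):
        self._fetch = fetch          # callback that builds the value
        self._ttl = ttl              # seconds before a refetch is forced
        self._value: Optional[Any] = None
        self._fetched_at: float = float("-inf")

    def get(self) -> Any:
        now = time.monotonic()
        if now - self._fetched_at >= self._ttl:
            # Cache miss or expired entry: rebuild and remember when.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

The trade-off is staleness: within the TTL window, policy decisions would see a slightly outdated snapshot, so the TTL must be short enough that this does not matter (or the cache must be invalidated on relevant events).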