Open pwaller opened 5 years ago
Another important measurement:
root@8f54deb133ef:/mnt/examples/histogram# time /opt/sdaccel-builder/sdaccel-builder image
[snip]
real 126m30.430s
user 216m2.451s
sys 91m6.848s
i.e. In order to build the histogram example, we're spending 91 CPU-minutes on sys, when our wall clock is 126 minutes. It might go a lot faster if we could reduce the sys overhead...
Minor note, this stack trace shows up in an example trace given in the linux docs, so it perhaps isn't as uncommon as I thought:
Oh, I think I'm totally misinterpreting the stack trace and now I think I know what's going on. proc_single_show
is actually the procfs implementation, i.e. these are actually read() syscalls occurring on procfs... https://github.com/torvalds/linux/blob/7566ec393f4161572ba6f11ad5171fd5d59b0fbd/fs/proc/base.c#L725-L753
Yikes. This seems really bananas. Unfortunately perf doesn't seem to be capturing the stacks of the read syscalls. But this still smells quite bad.
By the way, I left a second run going with seccomp disabled on the same machine. Here is the timing. It's only a 5% wall clock effect and the sys CPU% was even higher (!?). So I'm fairly perplexed.
real 119m13.907s
user 284m32.525s
sys 141m2.333s
Definitely seems wrong that so much CPU% should be spent inspecting the proc filesystem, though.
Take this perf analysis, for example, recorded while an synthesis is happening with vivado:
It shows 34% of cycles spent in
sys_read
(i.e. theread
syscall), and of those we're spending tons of time inproc_pid_status
. This makes no sense. Why would we spend all our CPU budget interrogating the pid status and time inseq_printf
?According to my mental model, if this measurement is correct, we're wasting a lot of time on the kernel when we could be using it for synthesis. The obvious culpret would be seccomp, which effectively runs a Berkeley Packet Filter on every syscall invocation. However, the above measurement was taken in a docker container which was invoked with
--security-opt seccomp=unconfined
, so that theory is questionable. I'd love to know why proc_single_show is appearing in our stack traces.