Open sjakobi opened 5 years ago
Here are some syscall statistics:
$ strace -c -w dhall type --file stable/jenkins/index.dhall --quiet
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
97.44 1277.662792 3936 324641 93352 select
0.40 5.304978 5 1008713 read
0.37 4.909465 5 1015626 fstat
0.37 4.797701 895 5358 498 poll
0.32 4.180177 8 546307 21477 stat
0.27 3.550268 7 512881 open
0.21 2.703619 5 517960 close
0.18 2.401575 5 514497 511256 ioctl
0.14 1.896009 6 343727 rt_sigprocmask
0.11 1.464082 14 106277 94401 rt_sigreturn
0.07 0.933065 5 171865 clock_gettime
0.04 0.529798 81 6532 getdents
0.02 0.303407 10 30999 4950 recvfrom
0.01 0.137467 12 11340 sendto
0.01 0.121909 9 13734 15 access
0.01 0.072200 33 2218 timer_settime
0.00 0.043991 22 2033 write
0.00 0.043851 9 5079 socket
0.00 0.043813 13 3465 1638 connect
0.00 0.040473 25 1620 getsockopt
0.00 0.038667 24 1620 sendmmsg
0.00 0.038470 19 2033 ftruncate
0.00 0.033649 7 4860 recvmsg
0.00 0.013780 20 685 madvise
0.00 0.010869 7 1620 bind
0.00 0.009235 5 1824 getsockname
0.00 0.005717 18 321 mmap
0.00 0.000584 6 100 setsockopt
0.00 0.000210 8 28 mprotect
0.00 0.000106 106 1 execve
0.00 0.000087 4 20 lseek
0.00 0.000085 5 18 getcwd
0.00 0.000077 15 5 brk
0.00 0.000061 4 14 rt_sigaction
0.00 0.000059 15 4 munmap
0.00 0.000059 30 2 mkdir
0.00 0.000013 7 2 chmod
0.00 0.000011 6 2 futex
0.00 0.000009 9 1 timer_delete
0.00 0.000008 4 2 getrlimit
0.00 0.000006 6 1 sysinfo
0.00 0.000006 6 1 timer_create
0.00 0.000005 5 1 set_tid_address
0.00 0.000004 4 1 arch_prctl
0.00 0.000004 4 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 1311.292421 5158039 727587 total
That the vast majority of the time is spent in select
seems to indicate that the program is waiting for (network) input.
I'd like to figure out how the time spent waiting for requests to complete is distributed, and which requests take the longest time. Are there good tools to help me with that?
@sjakobi: You can set the https_proxy
environment variable to forward all Dhall traffic through a locally running forward proxy that logs network requests
You can set the
https_proxy
environment variable to forward all Dhall traffic through a locally running forward proxy that logs network requests
Cheers, I can try that! Any suggestions for a forward proxy for Linux?
@sjakobi: The two most common ones that I know of are squid
and varnish
but it's also not that hard to write a custom one using Haskell (e.g. using the http-proxy
package or something similar)
I ended up installing mitmproxy
, and using its browser GUI mitmweb
.
I then ran dhall
with
$ env https_proxy=127.0.0.1:8080 SYSTEM_CERTIFICATE_PATH=/home/simon/.mitmproxy dhall type --file stable/jenkins/index.dhall --quiet
I noticed
Requests to https://prelude.dhall-lang.org all take 800ms to 1s, which I find pretty slow – but since there are only ~120 files, we're done after < 2 minutes.
Requests for https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/* mostly take 250 - 400 ms, but every 5-20 requests there's one that takes 1s and sometimes 3s. This makes me wonder whether there's some kind of throttling at play.
In a last run with the mitmdump
CLI tool, the process ended like this:
$ env https_proxy=127.0.0.1:8080 SYSTEM_CERTIFICATE_PATH=/home/simon/.mitmproxy dhall type --file stable/jenkins/index.dhall --quiet
dhall:
Error: The remote host took too long to respond
URL: https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/4ef46543ceff0e10978cca542952c56ee6e47b93/types/io.k8s.api.core.v1.NamespaceStatus.dhall
Note the large number of clientdisconnect
s at the end – I wonder whether that's some kind of resource leak that causes the server to block me.
While looking at the strace
output I also noticed that we seem to re-read the SSL certificates on each request – it doesn't take long in total, but it might be worth optimizing anyways.
we seem to re-read the SSL certificates on each request
That's probably because we construct a new Manager
for every request:
I seem to remember that we didn't always do that, and I wonder whether it impairs our ability to handle HTTP connections…
@sjakobi: Yeah, we should only be creating a Manager
once. If we're doing so for every request that's a bug.
Regarding the prelude.dhall-lang.org
slowdown, it's because prelude.dhall-lang.org
makes a second request to GitHub to serve the file because it's not caching requests. You can find the nginx
configuration we currently use here:
I can enable simple HTTP caching for prelude.dhall-lang.org
for all requests for individual version-pinned Prelude files.
After #1522:
$ env XDG_CACHE_HOME=.cache time dhall type --file stable/jenkins/index.dhall --quiet
4.15user 1.85system 4:53.92elapsed 2%CPU (0avgtext+0avgdata 233076maxresident)k
176inputs+48368outputs (0major+53820minor)pagefaults 0swaps
5x faster for me. Thanks @Gabriel439! :)
I'll try to figure out what else we could do to speed that up.
@sjakobi: I still need to enable server-side caching for the Prelude, although I'm not sure how much of an effect that will have
So I took a stab at nginx
server-side caching and was not successful, so I'm going to put it down for now unless we have evidence that this would significantly improve cold startup speeds. If somebody else wants to try to pick this up I pushed a branch with the nginx
configuration that I tried:
https://github.com/dhall-lang/dhall-lang/compare/gabriel/server_side_caching
Are the network requests parallelized? Every nth one taking 1–3s shouldn’t matter if we do 10–50 at once. Maybe there’s a way to take advantage of HTTP/2 here?
@Profpatsch: They are currently not parallelized, but I believe they can be parallelized with some work
I forgot to mention here that one idea I have to address this is to create a proxy.dhall-lang.org
that interpreters can optionally use to as a forward proxy to download pre-resolved (and possibly pre-normalized) Dhall expressions. See: https://discourse.dhall-lang.org/t/rfc-proxy-dhall-lang-org/144
Hi there! I'm a long time lurker but inexperienced user 👋 Just popping in to leave my 2 cents of feedback.
I was trying to evaluate a Dhall snippet from Gabriel's slides, on a fresh install of the latest dhall-haskell
(1.37.1) (so nothing in cache), and it made for a confusing new user experience.
Downloading https://prelude.dhall-lang.org/v20.0.0/package.dhall and all its dependencies took several minutes, and dhall
provided no feedback on what it was doing, so I was left staring at a blank screen.
The lack of feedback was really confusing. Since dhall
with no command-line options behaves kind of like cat
, I thought it was waiting for more input, or a ^D
or something. After a while I started troubleshooting my Dhall installation ("Wrong version? Maybe I'll try the static binaries instead of using what Nixpkgs gives me?"). Thankfully I eventually let it sit long enough to see it finally evaluate and realize my mistake.
I think it would be really helpful to have some kind of additional feedback about what dhall
is doing in cases like this. A progress bar would be cool, but even just a single log line saying it's downloading files would've cleared up my confusion. I also tried to reach for a --verbose
flag while troubleshooting, but the dhall
executable doesn't provide one. That might be a good incremental step.
EDIT: Probably should've left this comment on https://github.com/dhall-lang/dhall-haskell/issues/2062, but oh well.
@evanrelf: The main reason I haven't done this yet is because I'd like to first fix the original problem (the imports being slow) as much as possible before adding workarounds like logging or progress bars (although I acknowledge at some point we will need them).
One of the things I've been doing along these lines is setting up a global cache of binary-encoded and fully-resolved downloads for common Dhall packages (especially the Prelude) that Dhall interpreters can use to efficiently retrieve packages in a single HTTP request.
You can track the progress of that work here: https://discourse.dhall-lang.org/t/rfc-proxy-dhall-lang-org/144/10
$ rm -rf ~/.cache/dhall*
Note the huge difference between user time and elapsed time: more than 23 minutes! I'll try to figure out what the process is waiting for there.