Sounds like we need the tools described by Julia Evans. See my Oct 11th Boston.pm meeting notes for links (http://boston-pm.wikispaces.com/Calendar), especially the flamegraph video, and http://jvns.ca/debugging-zine.pdf. We may need to do flamegraphs of strace output for the two programs on both systems.
(My first reaction is that whichever runs second has an inherent advantage due to Linux disk caching, and the advantage depends on system tuning; so I'd like to see `time ack; time ag; time ack` on both systems, plus memstat output and disk model details. I know switching to an SSD makes ack fly here :-) . One hopes he didn't always run them in alphabetical order and leave us out because we always prepped the OS file cache ... but it's not impossible.)
> My first reaction is whichever runs second has an inherent advantage due to Linux disk caching,

> One hopes he didn't always run them in alphabetic order and leave us out because we always prepped the OS file cache ... but not impossible
Read his article. He knows what he's doing. That's why I asked him to help us out on this.
From @burntsushi's article:
> Every benchmarked command is run three times before being measured as a “warm up.” Specifically, this is to ensure that the corpora being searched is already in the operating system’s page cache. If we didn’t do this, we might end up benchmarking disk I/O, which is not only uninteresting for our purposes, but is probably not a common end user scenario. It’s more likely that you’ll be executing lots of searches against the same corpus (at least, I know I do).
>
> Every benchmarked command is run ten times, with a timing recorded for each run. The final “result” of that command is its distribution (mean +/- standard deviation). If I were a statistician, I could probably prove that ten samples is insufficient. Nevertheless, getting more samples takes more time, and for the most part, the variance is very low.
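The quoted methodology is easy to sketch. Below is a minimal Python illustration (not the article's actual harness): some untimed warm-up runs to populate the OS page cache, then timed samples summarized as mean +/- standard deviation. The lambda workload is a stand-in; a real harness would shell out to the command under test, e.g. via `subprocess.run(["ack", "PATTERN", "corpus/"])`.

```python
import statistics
import time

def benchmark(run, warmups=3, samples=10):
    """Warm up the page cache with untimed runs, then time `run()`
    `samples` times and report mean and stddev, in seconds."""
    for _ in range(warmups):
        run()  # warm-up runs are deliberately not timed
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)

# Stand-in workload; a real harness would invoke the search tool here.
mean, stdev = benchmark(lambda: sum(range(100_000)))
print(f"{mean:.6f} +/- {stdev:.6f} s")
```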
@n1vux The tools that Julia mentions are very handy, but in my experience with them, they're more suited to lower-level languages like C, C++, and Rust, because a language like Perl will always be doing extra things behind the scenes that affect perf's output. In my previous investigations using `strace -T` and `strace -c`, I found that we were performing many more calls to `open` than, for example, ag. This is probably because, by default, we scan the first line of each file to determine its filetype.
A few things I would like to know about each machine that ack is being tested on:

- Version of ack
- Version of perl
- CPU/RAM/Disk details
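To collect those machine details consistently, a small script could be recorded alongside each set of timings. This Python sketch is purely illustrative (the helper name is hypothetical, and tool versions such as the output of `ack --version` and `perl -v` would still need to be captured separately):

```python
import os
import platform
import sys

def machine_report():
    """Gather basic environment details worth recording with timings."""
    return {
        "os": platform.platform(),       # kernel/OS identification
        "machine": platform.machine(),   # CPU architecture
        "cpus": os.cpu_count(),          # logical CPU count
        "python": sys.version.split()[0] # stand-in for tool versions
    }

print(machine_report())
```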
This is an interesting bit of output from `strace`:
```
stat("/tmp/linux/.cocciconfig", {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
stat("/tmp/linux/.cocciconfig", {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
stat("/tmp/linux/.cocciconfig", {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
geteuid() = 1000
geteuid() = 1000
open("/tmp/linux/.cocciconfig", O_RDONLY) = 4
ioctl(4, TCGETS, 0x7ffc7a670930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(4, 0, SEEK_CUR) = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
read(4, "[spatch]\n\toptions = --timeout 20"..., 8192) = 59
read(4, "", 8192) = 0
close(4) = 0
open("/tmp/linux/.cocciconfig", O_RDONLY) = 4
ioctl(4, TCGETS, 0x7ffc7a670a90) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(4, 0, SEEK_CUR) = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
fcntl(4, F_SETFD, FD_CLOEXEC) = 0
read(4, "[spatch]\n\toptions = --timeout 20"..., 8192) = 59
read(4, "", 8192) = 0
close(4) = 0
```
For some reason, we're opening the file and reading from it twice? Not to mention all of the redundant `stat`/`fstat` calls, that `ioctl` call (which will always fail for regular files), and a useless `lseek` call. These probably don't cost much individually, but over a large list of files (`ack -f` shows me about 55,000), I'm betting it adds up.
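To make the cost visible in userspace, here is a toy Python sketch (ack itself is Perl, so this is purely illustrative) contrasting the open-twice pattern seen in the strace with an open-once version that reuses the same handle. It wraps `open()` with a counter as a stand-in for watching `open()` calls in strace:

```python
import builtins
import os
import tempfile

# Wrap open() so we can count how many times each approach opens a file.
counts = {"open": 0}
_real_open = builtins.open
def counting_open(*args, **kwargs):
    counts["open"] += 1
    return _real_open(*args, **kwargs)
builtins.open = counting_open

# A small file standing in for /tmp/linux/.cocciconfig.
fd, path = tempfile.mkstemp()
os.close(fd)
with _real_open(path, "w") as f:
    f.write("[spatch]\n\toptions = --timeout 200\n")

def scan_then_search_twice(path):
    """Mimics the strace above: one open to sniff the first line for
    filetype detection, then a second open to search the contents."""
    with open(path) as f:
        first_line = f.readline()
    with open(path) as f:
        contents = f.read()
    return first_line, contents

def scan_then_search_once(path):
    """Sniff the first line, then keep reading on the same handle."""
    with open(path) as f:
        first_line = f.readline()
        contents = first_line + f.read()
    return first_line, contents

counts["open"] = 0
scan_then_search_twice(path)
twice = counts["open"]

counts["open"] = 0
scan_then_search_once(path)
once = counts["open"]

builtins.open = _real_open
os.remove(path)
print(twice, once)  # → 2 1
```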
On Sep 26, 2016 01:06, "Andy Lester" notifications@github.com wrote:
> My first reaction is
>
> Read his article. He knows what he's doing. That's why I asked him to help us out on this.
Excellent. Since you took the report seriously, I expected as much; hence my literal "first reaction" was in a parenthetical.
(I will assume, arguendo, until I do have time to RTFA, that there's documentation that the system has enough RAM to hold the whole set of interesting files and inodes in cache; otherwise the warm-up doesn't prevent measuring disk.)
Since ack is essentially doing the same thing on both systems, the question is whether it's how ack does it that differs, or just OS/HW differences. My recommended articles above are the current best-practices path forward for this sort of A/B problem. An NYTProf flame graph will show us if our code is taking different paths on average. Flamegraph statistics of a kernel-mode strace will show whether we are making more or different system calls in one OS environment than in another. Maybe user mode as well, although that will show the Perl interpreter's work rather than ack's.
On Sep 26, 2016 09:21, "Rob Hoelz" notifications@github.com wrote:
> @n1vux The tools that Julia mentions are very handy, but in my experience with them, they're more suited towards lower-level languages like C, C++, and Rust, because a language like Perl will always be doing extra things behind the scenes that impact perf's output.
Right; kernel mode is more useful than user mode, and we would want NYTProf for user mode.
> In my previous investigations using `strace -T` and `strace -c`, I found that we were performing a lot more calls to `open` than, for example, ag. This is probably because we scan the first line of each file to determine its filetype by default.
Rolled-up stats with a flamegraph of strace will show whether those extra opens are what differs between the two ack installs, or will point elsewhere.
> A few things I would like to know about each machine that ack is being tested on:
>
> - Version of ack
> - Version of perl
> - CPU/RAM/Disk details
Right, and memstat and iostat readings, etc. (on Julia's list). Even if versions differ, measuring how one ack consumes more time than another, via averaged straces, seems useful?
I'm wary of going down the NYTProf road because @burntsushi is not a Perl user at all, and I don't even know if he can install modules on his end. But I won't know without him weighing in on this.
> (I will assume arguendo until i do have time to RTFA that there's documentation that system does have enough RAM to hold whole set of interesting files and inodes in cache, else warm up doesn't prevent measuring disk.)
The article doesn't address why ack is so slow. I point at the article just as an indication that he knows what he's doing.
I don't think we want to discount anything. Maybe the difference IS that one machine has more RAM than others. Maybe an answer is "ack doesn't take advantage of cached files as well as ag does." Maybe we should be looking at corpora smaller than the Linux tree, and that will turn over rocks.
I'm hoping that we can run ack on a machine that gives slow results, and then change a factor external to ack itself, and have ack speed up in a way that doesn't affect ag (much). My goal here is to be able to tell users "If you are finding ack to be slow, it may be because of A, in which case you should X, or B, in which case you should Y. Or, just know that these factors affect it."
> I know switching to SSD makes ack fly here
Does an SSD make ack speed up but doesn't have as much effect on ag? Let's figure out why that is.
Also, I'm picking on ag here because it's in C, and I don't want to have to add Rust to our list of unknowns.
> For some reason, we're opening the file and reading from it twice?
Let's say that's the case. Why is one machine handling it so much more poorly than the other? Can we put together a test program in `dev/` on the `poky` branch that we can try on different machines, to show that certain systems handle the multiple opens worse than others?
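As a starting point, such a test script could time the reopen pattern against a single kept-open handle, so different machines can compare the overhead of the extra `open()` calls. This Python version is only a sketch (an actual ack dev tool would presumably be Perl):

```python
import os
import tempfile
import time

# Time N open/read-first-line/close cycles against N reads on a single
# kept-open handle. Run on different machines to compare the cost of
# the repeated open() calls seen in the strace output.
N = 2000

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as f:
    f.write("first line\nrest of file\n")

start = time.perf_counter()
for _ in range(N):
    with open(path) as f:  # reopen the file every iteration
        f.readline()
reopen_time = time.perf_counter() - start

start = time.perf_counter()
with open(path) as f:      # open once, rewind between reads
    for _ in range(N):
        f.seek(0)
        f.readline()
single_time = time.perf_counter() - start

print(f"reopen: {reopen_time:.4f}s  single handle: {single_time:.4f}s")
os.remove(path)
```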
That's not to say we don't want to eliminate extra opens. Just diagnosing at this point.
I just wanted to chime in here and say that I want to be as helpful as possible. I'm at work so I don't have a lot of time to spend writing, but here are some high level points:
- The machine I benchmarked `ack` on had 64GB of memory, an i7-6900K and an SSD. The Linux kernel source tree easily fits into memory. I account for disk cache.
- Perhaps `ack` isn't slower on some systems, but rather `ag` is slower on some systems. For instance, `ag` taking 9 seconds on the Linux repo is quite striking to me.
- Would an `rg` data point on your machine help? (I am most familiar with its cost model.) It may not be worth it though, since you do have a `grep` data point.
- We should all clarify our environments, including whether it's a VM or not, since I have noticed big differences there under certain circumstances. (`ag` is, in my experience, much slower on a VM because it uses memory maps.) My environment is Linux 4, no VM, ext4 file system.
Here are some things I'd like to do to help you guys track down an explanation:
- Profile `ack` on Linux. (I don't know the first thing about Perl, so if there's some tutorial or something I can follow, that'd work.)
- Run `ack` on both a Mac and Windows (non-VM). At least for the Mac, I can also try `rg` and `ag` and provide results. I can compare `rg`, `grep` and `ack` on Windows (I think `ack` works on Windows).

I think there were a few things I'm forgetting here that I was thinking about on my drive into work today, but I'll keep noodling.
Another tool vaguely related to finding out what's happening: progress bars for all open files, in another terminal.

http://www.tecmint.com/progress-monitor-check-progress-of-linux-commands/

> It basically scans the /proc filesystem for interesting commands, and then searches the fd and fdinfo directories to find opened files and seek positions, and reports status for the largest files. Importantly, it is a very light tool, and compatible with practically any command.
This issue was moved to petdance/ack3#42
Sometimes ack is slower than other tools like ag, and sometimes ack is MUCH slower. It seems to happen on some machines and not others. We want to find out why. We are not looking for general improvements to ack's speed. We're trying to figure out why some machines run ack much more slowly than others.
Recently, @BurntSushi created another ack-like tool called ripgrep (rg). He also wrote an extensive article comparing rg to ag to grep: http://blog.burntsushi.net/ripgrep/ He left ack out of his speed comparisons because:
So I emailed him and we discussed it and he sent me some timings.
Compare that with some timings I got:
In @burntsushi's case, his ack takes 25x as long. In mine, it's only 3x as long. Why?
I'm thinking that we'll have some test programs that we put in the `dev/` directory. I've created a new branch, `poky`, for doing any work relating to this issue.