Sounds like we need the tools described by Julia Evans. See my Oct 11th Boston.pm meeting notes for links (http://boston-pm.wikispaces.com/Calendar), especially the flamegraph video, and http://jvns.ca/debugging-zine.pdf. We may need to do flamegraphs of strace output for the two programs on both systems.
(My first reaction is that whichever runs second has an inherent advantage due to Linux disk caching, and the advantage depends on system tuning; so I'd like to see `time ack; time ag; time ack` on both systems, plus memstat output and disk model details. I know switching to an SSD makes ack fly here :-) . One hopes he didn't always run them in alphabetical order and leave us out because we always prepped the OS file cache ... but it's not impossible.)
> My first reaction is whichever runs second has an inherent advantage due to Linux disk caching,

> One hopes he didn't always run them in alphabetic order and leave us out because we always prepped the OS file cache ... but not impossible
Read his article. He knows what he's doing. That's why I asked him to help us out on this.
From @burntsushi's article:
> Every benchmarked command is run three times before being measured as a “warm up.” Specifically, this is to ensure that the corpora being searched is already in the operating system’s page cache. If we didn’t do this, we might end up benchmarking disk I/O, which is not only uninteresting for our purposes, but is probably not a common end user scenario. It’s more likely that you’ll be executing lots of searches against the same corpus (at least, I know I do).
>
> Every benchmarked command is run ten times, with a timing recorded for each run. The final “result” of that command is its distribution (mean +/- standard deviation). If I were a statistician, I could probably prove that ten samples is insufficient. Nevertheless, getting more samples takes more time, and for the most part, the variance is very low.
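The quoted methodology is easy to sketch. Below is a minimal Python illustration (not the article's actual harness): some untimed warm-up runs to populate the OS page cache, then timed samples summarized as mean +/- standard deviation. The lambda workload is a stand-in; a real harness would shell out to the command under test, e.g. via `subprocess.run(["ack", "PATTERN", "corpus/"])`.

```python
import statistics
import time

def benchmark(run, warmups=3, samples=10):
    """Warm up the page cache with untimed runs, then time `run()`
    `samples` times and report mean and stddev, in seconds."""
    for _ in range(warmups):
        run()  # warm-up runs are deliberately not timed
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)

# Stand-in workload; a real harness would invoke the search tool here.
mean, stdev = benchmark(lambda: sum(range(100_000)))
print(f"{mean:.6f} +/- {stdev:.6f} s")
```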
@n1vux The tools that Julia mentions are very handy, but in my experience with them, they're more suited to lower-level languages like C, C++, and Rust, because a language like Perl will always be doing extra things behind the scenes that affect perf's output. In my previous investigations using `strace -T` and `strace -c`, I found that we were performing many more calls to `open` than, for example, ag. This is probably because, by default, we scan the first line of each file to determine its filetype.
A few things I would like to know about each machine that ack is being tested on:

- Version of ack
- Version of perl
- CPU/RAM/Disk details
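To collect those machine details consistently, a small script could be recorded alongside each set of timings. This Python sketch is purely illustrative (the helper name is hypothetical, and tool versions such as the output of `ack --version` and `perl -v` would still need to be captured separately):

```python
import os
import platform
import sys

def machine_report():
    """Gather basic environment details worth recording with timings."""
    return {
        "os": platform.platform(),       # kernel/OS identification
        "machine": platform.machine(),   # CPU architecture
        "cpus": os.cpu_count(),          # logical CPU count
        "python": sys.version.split()[0] # stand-in for tool versions
    }

print(machine_report())
```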
This is an interesting bit of output from `strace`:
```
stat("/tmp/linux/.cocciconfig", {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
stat("/tmp/linux/.cocciconfig", {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
stat("/tmp/linux/.cocciconfig", {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
geteuid() = 1000
geteuid() = 1000
open("/tmp/linux/.cocciconfig", O_RDONLY) = 4
ioctl(4, TCGETS, 0x7ffc7a670930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(4, 0, SEEK_CUR) = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
read(4, "[spatch]\n\toptions = --timeout 20"..., 8192) = 59
read(4, "", 8192) = 0
close(4) = 0
open("/tmp/linux/.cocciconfig", O_RDONLY) = 4
ioctl(4, TCGETS, 0x7ffc7a670a90) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(4, 0, SEEK_CUR) = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
fcntl(4, F_SETFD, FD_CLOEXEC) = 0
read(4, "[spatch]\n\toptions = --timeout 20"..., 8192) = 59
read(4, "", 8192) = 0
close(4) = 0
```
For some reason, we're opening the file and reading from it twice? Not to mention all of the redundant `stat`/`fstat` calls, that `ioctl` call (which will always fail for regular files), and a useless `lseek` call. These probably don't cost much individually, but over a large list of files (`ack -f` shows me about 55,000), I'm betting it adds up.
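To make the cost visible in userspace, here is a toy Python sketch (ack itself is Perl, so this is purely illustrative) contrasting the open-twice pattern seen in the strace with an open-once version that reuses the same handle. It wraps `open()` with a counter as a stand-in for watching `open()` calls in strace:

```python
import builtins
import os
import tempfile

# Wrap open() so we can count how many times each approach opens a file.
counts = {"open": 0}
_real_open = builtins.open
def counting_open(*args, **kwargs):
    counts["open"] += 1
    return _real_open(*args, **kwargs)
builtins.open = counting_open

# A small file standing in for /tmp/linux/.cocciconfig.
fd, path = tempfile.mkstemp()
os.close(fd)
with _real_open(path, "w") as f:
    f.write("[spatch]\n\toptions = --timeout 200\n")

def scan_then_search_twice(path):
    """Mimics the strace above: one open to sniff the first line for
    filetype detection, then a second open to search the contents."""
    with open(path) as f:
        first_line = f.readline()
    with open(path) as f:
        contents = f.read()
    return first_line, contents

def scan_then_search_once(path):
    """Sniff the first line, then keep reading on the same handle."""
    with open(path) as f:
        first_line = f.readline()
        contents = first_line + f.read()
    return first_line, contents

counts["open"] = 0
scan_then_search_twice(path)
twice = counts["open"]

counts["open"] = 0
scan_then_search_once(path)
once = counts["open"]

builtins.open = _real_open
os.remove(path)
print(twice, once)  # → 2 1
```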
On Sep 26, 2016 01:06, "Andy Lester" notifications@github.com wrote:
> My first reaction is
>
> Read his article. He knows what he's doing. That's why I asked him to help us out on this.
Excellent. Since you took the report seriously, I expected as much; hence my literal "first reaction" was in a parenthetical.
(I will assume, arguendo, until I do have time to RTFA, that there's documentation that the system has enough RAM to hold the whole set of interesting files and inodes in cache; otherwise the warm-up doesn't prevent measuring disk.)
Since ack is essentially doing the same thing on both systems, the question is whether it's how ack does it that differs, or just OS/HW differences. My recommended articles above are the current best-practices path forward for this sort of A/B problem. An NYTProf flame graph will show us if our code is taking different paths on average. Flamegraph statistics of a kernel-mode strace will show whether we are making more or different system calls in one OS environment than in another. Maybe user mode as well, although that will show the Perl interpreter's work rather than ack's.
On Sep 26, 2016 09:21, "Rob Hoelz" notifications@github.com wrote:
> @n1vux The tools that Julia mentions are very handy, but in my experience with them, they're more suited towards lower-level languages like C, C++, and Rust, because a language like Perl will always be doing extra things behind the scenes that impact perf's output.
Right; kernel mode is more useful than user mode, and we would want NYTProf for user mode.
> In my previous investigations using `strace -T` and `strace -c`, I found that we were performing a lot more calls to `open` than, for example, ag. This is probably because we scan the first line of each file to determine its filetype by default.
Rolled-up stats with a flamegraph of strace will show whether those extra opens are what differs between the two ack installs, or will point elsewhere.
> A few things I would like to know about each machine that ack is being tested on:
>
> - Version of ack
> - Version of perl
> - CPU/RAM/Disk details
Right, and memstat and iostat readings, etc. (on Julia's list). Even if versions differ, measuring how one ack consumes more time than another, via averaged straces, seems useful?
I'm wary of going down the NYTProf road because @burntsushi is not a Perl user at all, and I don't even know if he can install modules on his end. But I won't know without him weighing in on this.
> (I will assume arguendo until i do have time to RTFA that there's documentation that system does have enough RAM to hold whole set of interesting files and inodes in cache, else warm up doesn't prevent measuring disk.)
The article doesn't address why ack is so slow. I point at the article just as an indication that he knows what he's doing.
I don't think we want to discount anything. Maybe the difference IS that one machine has more RAM than others. Maybe an answer is "ack doesn't take advantage of cached files as well as ag does." Maybe we should be looking at corpora smaller than the Linux tree, and that will turn over rocks.
I'm hoping that we can run ack on a machine that gives slow results, and then change a factor external to ack itself, and have ack speed up in a way that doesn't affect ag (much). My goal here is to be able to tell users "If you are finding ack to be slow, it may be because of A, in which case you should X, or B, in which case you should Y. Or, just know that these factors affect it."
> I know switching to SSD makes ack fly here
Does an SSD make ack speed up but doesn't have as much effect on ag? Let's figure out why that is.
Also, I'm picking on ag here because it's in C, and I don't want to have to add Rust to our list of unknowns.
> For some reason, we're opening the file and reading from it twice?
Let's say that's the case. Why is one machine handling it so much more poorly than the other? Can we put together a test program in `dev/` on the `poky` branch that we can try on different machines, to show that certain systems handle the multiple opens worse than others?
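As a starting point, such a test script could time the reopen pattern against a single kept-open handle, so different machines can compare the overhead of the extra `open()` calls. This Python version is only a sketch (an actual ack dev tool would presumably be Perl):

```python
import os
import tempfile
import time

# Time N open/read-first-line/close cycles against N reads on a single
# kept-open handle. Run on different machines to compare the cost of
# the repeated open() calls seen in the strace output.
N = 2000

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as f:
    f.write("first line\nrest of file\n")

start = time.perf_counter()
for _ in range(N):
    with open(path) as f:  # reopen the file every iteration
        f.readline()
reopen_time = time.perf_counter() - start

start = time.perf_counter()
with open(path) as f:      # open once, rewind between reads
    for _ in range(N):
        f.seek(0)
        f.readline()
single_time = time.perf_counter() - start

print(f"reopen: {reopen_time:.4f}s  single handle: {single_time:.4f}s")
os.remove(path)
```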
That's not to say we don't want to eliminate extra opens. Just diagnosing at this point.
I just wanted to chime in here and say that I want to be as helpful as possible. I'm at work so I don't have a lot of time to spend writing, but here are some high level points:
- The machine I benchmarked `ack` on had 64GB of memory, an i7-6900K and an SSD. The Linux kernel source tree easily fits into memory. I account for disk cache.
- Perhaps `ack` isn't slower on some systems, but rather `ag` is slower on some systems. For instance, `ag` taking 9 seconds on the Linux repo is quite striking to me.
- Would an `rg` data point on your machine help? (I am most familiar with its cost model.) It may not be worth it though, since you do have a `grep` data point.
- We should all clarify our environments, including whether it's a VM or not, since I have noticed big differences there under certain circumstances. (`ag` is, in my experience, much slower on a VM because it uses memory maps.) My environment is Linux 4, no VM, ext4 file system.
Here are some things I'd like to do to help you guys track down an explanation:
- Profile `ack` on Linux. (I don't know the first thing about Perl, so if there's some tutorial or something I can follow, that'd work.)
- Run `ack` on both a Mac and Windows (non-VM). At least for the Mac, I can also try `rg` and `ag` and provide results. I can compare `rg`, `grep` and `ack` on Windows (I think `ack` works on Windows).

I think there were a few things I'm forgetting here that I was thinking about on my drive into work today, but I'll keep noodling.
Another tool vaguely related to finding out what's happening: progress bars for all open files, in another terminal.

http://www.tecmint.com/progress-monitor-check-progress-of-linux-commands/

> It basically scans the /proc filesystem for interesting commands, and then searches the fd and fdinfo directories to find opened files and seek positions, and reports status for the largest files. Importantly, it is a very light tool, and compatible with practically any command.
This issue was moved to petdance/ack3#42
Sometimes ack is slower than other tools like ag, and sometimes ack is MUCH slower. It seems to happen on some machines and not others. We want to find out why. We are not looking for general improvements to ack's speed. We're trying to figure out why some machines run ack much more slowly than others.
Recently, @BurntSushi created another ack-like tool called ripgrep (rg). He also wrote an extensive article comparing rg to ag to grep: http://blog.burntsushi.net/ripgrep/ He left ack out of his speed comparisons because:
So I emailed him and we discussed it and he sent me some timings.
Compare that with some timings I got:
In @burntsushi's case, his ack takes 25x as long. In mine, it's only 3x as long. Why?
I'm thinking that we'll have some test programs that we put in the `dev/` directory. I've created a new branch, `poky`, for doing any work relating to this issue.