beyondgrep / ack3

ack is a grep-like search tool optimized for source code.
https://beyondgrep.com/

Find out why ack is so slow on some machines #42

Open petdance opened 7 years ago

petdance commented 7 years ago

From @petdance on September 26, 2016 0:50

Sometimes ack is slower than other tools like ag, and sometimes ack is MUCH slower. It seems to happen on some machines and not others. We want to find out why. We are not looking for general improvements to ack's speed. We're trying to figure out why some machines run ack much more slowly than others.


Recently, @BurntSushi created another ack-like tool called ripgrep (rg). He also wrote an extensive article comparing rg to ag and to grep (http://blog.burntsushi.net/ripgrep/). He left ack out of his speed comparisons because:

Notably absent from this list is ack. We don’t benchmark it here because it is outrageously slow. Even on the simplest benchmark (a literal in the Linux kernel repository), ack is around two orders of magnitude slower than ripgrep.

So I emailed him and we discussed it and he sent me some timings.

[andrew@Cheetah linux] time ack PM_RESUME | wc -l
16

real    0m17.329s
user    0m16.713s
sys     0m0.610s

[andrew@Cheetah linux] time ag PM_RESUME | wc -l
16

real    0m0.739s
user    0m0.803s
sys     0m1.653s

Compare that with some timings I got:

clifford:~/linux $ time grep -R PM_RESUME . | wc -l
16

real    0m7.919s
user    0m0.615s
sys     0m3.357s
clifford:~/linux $ time ack PM_RESUME | wc -l
16

real    0m24.583s
user    0m16.768s
sys     0m4.281s
clifford:~/linux $ time ag PM_RESUME | wc -l
16

real    0m8.926s
user    0m1.692s
sys     0m3.478s

In @burntsushi's case, his ack takes 25x as long. In mine, it's only 3x as long. Why?


I'm thinking that we'll have some test programs that we put in the dev/ directory. I've created a new branch, poky, for doing any work relating to this issue.

Copied from original issue: petdance/ack2#610

petdance commented 7 years ago

From @n1vux on September 26, 2016 4:49

Sounds like we need the tools described by Julia Evans. See my Oct 11th Boston.pm meeting notes for links (http://boston-pm.wikispaces.com/Calendar), especially the flamegraph video and http://jvns.ca/debugging-zine.pdf . We may need to do flamegraphs of straces of the two programs on both machines.

(My first reaction is that whichever tool runs second has an inherent advantage due to Linux disk caching, and the advantage depends on system tuning; so I'd like to see "time ack; time ag; time ack" on both systems, plus memstat output and disk model details. I know switching to SSD makes ack fly here :-) . One hopes he didn't always run them in alphabetical order and leave us out because we always prepped the OS file cache ... but it's not impossible.)
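For reference, one way to separate cold-cache from warm-cache behavior on Linux (just a sketch; it assumes root access and that dropping the page cache on the test box is acceptable):

sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
time ack PM_RESUME > /dev/null    # cold cache: pays the disk I/O
time ack PM_RESUME > /dev/null    # warm cache: mostly CPU
time ag PM_RESUME > /dev/null     # ag gets the warm cache too, so the comparison is fair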

petdance commented 7 years ago

My first reaction is that whichever tool runs second has an inherent advantage due to Linux disk caching,

One hopes he didn't always run them in alphabetical order and leave us out because we always prepped the OS file cache ... but it's not impossible

Read his article. He knows what he's doing. That's why I asked him to help us out on this.

petdance commented 7 years ago

From @burntsushi's article:

Every benchmarked command is run three times before being measured as a “warm up.” Specifically, this is to ensure that the corpora being searched is already in the operating system’s page cache. If we didn’t do this, we might end up benchmarking disk I/O, which is not only uninteresting for our purposes, but is probably not a common end user scenario. It’s more likely that you’ll be executing lots of searches against the same corpus (at least, I know I do).

Every benchmarked command is run ten times, with a timing recorded for each run. The final “result” of that command is its distribution (mean +/- standard deviation). If I were a statistician, I could probably prove that ten samples is insufficient. Nevertheless, getting more samples takes more time, and for the most part, the variance is very low.
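In shell terms, that methodology amounts to something like the following (a rough sketch, not the actual benchsuite code; /usr/bin/time -f '%e' is GNU time printing elapsed wall-clock seconds to stderr):

for i in 1 2 3; do ack PM_RESUME > /dev/null; done       # warm up the page cache
for i in 1 2 3 4 5 6 7 8 9 10; do
    /usr/bin/time -f '%e' ack PM_RESUME > /dev/null      # one timing per run; compute mean/stddev afterward
done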

petdance commented 7 years ago

From @hoelzro on September 26, 2016 13:21

@n1vux The tools that Julia mentions are very handy, but in my experience with them, they're more suited towards lower-level languages like C, C++, and Rust, because a language like Perl will always be doing extra things behind the scenes that impact perf's output. In my previous investigations using strace -T and strace -c, I found that we were performing a lot more calls to open than, for example, ag. This is probably because we scan the first line of each file to determine its filetype by default.
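For reference, the kind of syscall-count comparison described here looks roughly like this (a sketch; strace's -c tallies calls per syscall, -f follows child processes, and -o writes the summary to a file):

strace -f -c -o ack.syscalls ack PM_RESUME > /dev/null
strace -f -c -o ag.syscalls  ag  PM_RESUME > /dev/null
grep -E 'open|stat|lseek|ioctl' ack.syscalls ag.syscalls   # compare counts of the suspicious calls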

A few things I would like to know about each machine that ack is being tested on:

Version of ack
Version of perl
CPU/RAM/Disk details
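A quick, hypothetical way to collect those details on each test machine (lsblk's ROTA column is 0 for SSDs, 1 for spinning disks):

ack --version | head -2
perl -v | head -2
lscpu | grep -E 'Model name|^CPU\(s\)'
free -h
lsblk -d -o NAME,ROTA,SIZE,MODEL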

petdance commented 7 years ago

From @hoelzro on September 26, 2016 13:34

This is an interesting bit of output from strace:

stat("/tmp/linux/.cocciconfig", {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
stat("/tmp/linux/.cocciconfig", {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
stat("/tmp/linux/.cocciconfig", {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
geteuid()                               = 1000
geteuid()                               = 1000
open("/tmp/linux/.cocciconfig", O_RDONLY) = 4
ioctl(4, TCGETS, 0x7ffc7a670930)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(4, 0, SEEK_CUR)                   = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
read(4, "[spatch]\n\toptions = --timeout 20"..., 8192) = 59
read(4, "", 8192)                       = 0
close(4)                                = 0
open("/tmp/linux/.cocciconfig", O_RDONLY) = 4
ioctl(4, TCGETS, 0x7ffc7a670a90)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(4, 0, SEEK_CUR)                   = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
read(4, "[spatch]\n\toptions = --timeout 20"..., 8192) = 59
read(4, "", 8192)                       = 0
close(4)                                = 0

For some reason, we're opening the file and reading from it twice? Not to mention all of the redundant stat/fstat calls, that ioctl call (which will always fail for regular files), and a useless lseek call. These probably don't cost much by themselves, but I'm betting that over a large list of files (ack -f shows me about 55,000), it would add up.

petdance commented 7 years ago

From @n1vux on September 26, 2016 13:35

On Sep 26, 2016 01:06, "Andy Lester" notifications@github.com wrote:

My first reaction is

Read his article. He knows what he's doing. That's why I asked him to help us out on this.

Excellent. Since you took the report seriously, I expected as much; hence my literal "first reaction" was in a parenthetical.

(I will assume arguendo, until I have time to RTFA, that there's documentation that the system has enough RAM to hold the whole set of interesting files and inodes in cache; otherwise the warm-up doesn't prevent measuring disk I/O.)

Since ack is essentially doing the same thing on both systems, the question is whether how it does it differs between them, or whether it's just OS/HW differences. My recommended articles above are the current best-practices path forward for this sort of A/B problem. An NYTProf flame graph will show us if our code is taking different paths on average. Flamegraph statistics of a kernel-mode strace will show if we are making more or different system calls in one OS environment than in another. Maybe user mode as well, although that will show the Perl interpreter's work rather than ack's own code.
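Concretely, the user-mode (Perl) side of that could be as simple as the following sketch, assuming Devel::NYTProf is installed and ack is the installed script on PATH:

perl -d:NYTProf "$(command -v ack)" PM_RESUME > /dev/null   # writes nytprof.out in the current directory
nytprofhtml                                                  # turn nytprof.out into a browsable HTML report

The kernel-mode side would be the strace-based comparison, rolled up per machine.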

petdance commented 7 years ago

From @n1vux on September 26, 2016 13:43

On Sep 26, 2016 09:21, "Rob Hoelz" notifications@github.com wrote:

@n1vux The tools that Julia mentions are very handy, but in my experience with them, they're more suited towards lower-level languages like C, C++, and Rust, because a language like Perl will always be doing extra things behind the scenes that impact perf's output.

Right, kernel mode is more useful than user mode here; we would want NYTProf for user mode.

In my previous investigations using strace -T and strace -c, I found that we were performing a lot more calls to open than, for example, ag. This is probably because we scan the first line of each file to determine its filetype by default.

Rolled-up stats with a flamegraph of the strace output will show whether those extra opens are what differs between the two ack installs, or point elsewhere.

A few things I would like to know about each machine that ack is being tested on:

Version of ack
Version of perl
CPU/RAM/Disk details

Right, and memstat/iostat readings etc. (on Julia's list). Even if versions differ, seeing how one ack consumes more time than another, as measured by averaged straces, seems useful?

petdance commented 7 years ago

I'm wary of going down the NYTProf road because @burntsushi is not a Perl user at all, and I don't even know if he can install modules on his end. But I don't know without him weighing in on this.

(I will assume arguendo, until I have time to RTFA, that there's documentation that the system has enough RAM to hold the whole set of interesting files and inodes in cache; otherwise the warm-up doesn't prevent measuring disk I/O.)

The article doesn't address why ack is so slow. I point at the article just as an indication that he knows what he's doing.

I don't think we want to discount anything. Maybe the difference IS that one machine has more RAM than others. Maybe an answer is "ack doesn't take advantage of cached files as well as ag does." Maybe we should be looking at corpora smaller than the Linux tree, and that will turn over rocks.

I'm hoping that we can run ack on a machine that gives slow results, and then change a factor external to ack itself, and have ack speed up in a way that doesn't affect ag (much). My goal here is to be able to tell users "If you are finding ack to be slow, it may be because of A, in which case you should X, or B, in which case you should Y. Or, just know that these factors affect it."

I know switching to SSD makes ack fly here

Does an SSD speed up ack but not have as much effect on ag? If so, let's figure out why that is.

Also, I'm picking on ag here because it's in C, and I don't want to have to add Rust to our list of unknowns.

petdance commented 7 years ago

For some reason, we're opening the file and reading from it twice?

Let's say that's the case. Why is one machine handling it so much more poorly than the other? Can we put together a test program in dev/ on the poky branch that we can try on different machines to show that certain systems handle the multiple opens worse than others?

That's not to say we don't want to eliminate extra opens. Just diagnosing at this point.
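As a starting point, that dev/ test program could be something like this hypothetical dev/double-open-bench sketch (the name and details are made up; it just times one open-and-read-first-line per file versus two, which is the pattern in the strace output above):

#!/usr/bin/perl
# Hypothetical dev/double-open-bench: time one open+read-first-line per file
# versus two, to see whether the extra open/read seen in ack's strace output
# costs noticeably more on some machines than on others.
use strict;
use warnings;
use File::Find;
use Time::HiRes qw( gettimeofday tv_interval );

my $dir = shift or die "Usage: $0 <directory>\n";

my @files;
find( sub { push @files, $File::Find::name if -f $_ && -r _ }, $dir );

sub read_first_line {
    my ($file) = @_;
    open my $fh, '<', $file or return;
    my $line = <$fh>;
    close $fh;
    return $line;
}

for my $opens ( 1, 2 ) {
    my $t0 = [gettimeofday];
    for my $file (@files) {
        read_first_line($file) for 1 .. $opens;
    }
    printf "%d open(s) per file across %d files: %.3f seconds\n",
        $opens, scalar @files, tv_interval($t0);
}

Note that the first pass warms the page cache for the second, so either run the whole script twice and use the second run, or drop caches between passes. Running it against the same tree on a "fast" machine and a "slow" one would show how much the extra open adds on each.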

petdance commented 7 years ago

From @BurntSushi on September 26, 2016 17:29

I just wanted to chime in here and say that I want to be as helpful as possible. I'm at work so I don't have a lot of time to spend writing, but here are some high level points:

We should all clarify our environments, including whether it's a VM or not, since I have noticed big differences there under certain circumstances. (ag is, in my experience, much slower on a VM because it uses memory maps.) My environment is Linux 4, no VM, ext4 file system.

Here are some things I'd like to do to help you guys track down an explanation:

I think there were a few things I'm forgetting here that I was thinking about on my drive into work today, but I'll keep noodling.

petdance commented 7 years ago

From @n1vux on October 1, 2016 18:56

Another tool vaguely related to finding out what's happening:

progress bars for all open files, in another terminal

http://www.tecmint.com/progress-monitor-check-progress-of-linux-commands/ : "It utterly scans the /proc filesystem for fascinating commands, and then searches the fd and fdinfo directories to find opened files, seeks positions, and reports status for the extensive files. Importantly, it is a very light tool, and compatible with practically any command."

petdance commented 6 years ago

@burntsushi, I'm not sure what I'm doing wrong to get your benchsuite to run. Can you help me, please?

I've cloned the linux repo into ~/ripgrep/benchsuite, which is what --download linux seemed to want to do, but it gave errors:

# Building Linux kernel...
# make defconfig
Traceback (most recent call last):
  File "./benchsuite", line 1323, in <module>
    main()
  File "./benchsuite", line 1249, in main
    download(args.dir, args.download)
  File "./benchsuite", line 1135, in download
    download_linux(suite_dir)
  File "./benchsuite", line 1063, in download_linux
    run_cmd(['make', 'defconfig'], cwd=checkout_dir)
  File "./benchsuite", line 1032, in run_cmd
    return subprocess.run(cmd, *args, **kwargs)
AttributeError: 'module' object has no attribute 'run'

so I manually did the clone and the make defconfig like it wanted to do, but now I have this:

[20:03:42] (master) clifford:~/ripgrep/benchsuite $ ./benchsuite --dir ./ 
missing: linux, skipping benchmark linux_alternates (try running with: --download linux)
missing: linux, skipping benchmark linux_alternates_casei (try running with: --download linux)
missing: linux, skipping benchmark linux_literal (try running with: --download linux)
missing: linux, skipping benchmark linux_literal_casei (try running with: --download linux)
missing: linux, skipping benchmark linux_literal_default (try running with: --download linux)
missing: linux, skipping benchmark linux_no_literal (try running with: --download linux)
missing: linux, skipping benchmark linux_re_literal_suffix (try running with: --download linux)
missing: linux, skipping benchmark linux_unicode_greek (try running with: --download linux)
missing: linux, skipping benchmark linux_unicode_greek_casei (try running with: --download linux)
missing: linux, skipping benchmark linux_unicode_word (try running with: --download linux)
missing: linux, skipping benchmark linux_word (try running with: --download linux)
missing: subtitles-en, skipping benchmark subtitles_en_alternate (try running with: --download subtitles-en)
missing: subtitles-en, skipping benchmark subtitles_en_alternate_casei (try running with: --download subtitles-en)
missing: subtitles-en, skipping benchmark subtitles_en_literal (try running with: --download subtitles-en)
missing: subtitles-en, skipping benchmark subtitles_en_literal_casei (try running with: --download subtitles-en)
missing: subtitles-en, skipping benchmark subtitles_en_literal_word (try running with: --download subtitles-en)
missing: subtitles-en, skipping benchmark subtitles_en_no_literal (try running with: --download subtitles-en)
missing: subtitles-en, skipping benchmark subtitles_en_surrounding_words (try running with: --download subtitles-en)
missing: subtitles-ru, skipping benchmark subtitles_ru_alternate (try running with: --download subtitles-ru)
missing: subtitles-ru, skipping benchmark subtitles_ru_alternate_casei (try running with: --download subtitles-ru)
missing: subtitles-ru, skipping benchmark subtitles_ru_literal (try running with: --download subtitles-ru)
missing: subtitles-ru, skipping benchmark subtitles_ru_literal_casei (try running with: --download subtitles-ru)
missing: subtitles-ru, skipping benchmark subtitles_ru_literal_word (try running with: --download subtitles-ru)
missing: subtitles-ru, skipping benchmark subtitles_ru_no_literal (try running with: --download subtitles-ru)
missing: subtitles-en, skipping benchmark subtitles_ru_surrounding_words (try running with: --download subtitles-en)
BurntSushi commented 6 years ago

@petdance With respect to your initial error, it looks like the benchsuite script requires at least Python 3.5 to run. Here it is working for me:

[andrew@Serval benchsuite] pwd
/home/andrew/clones/ripgrep/benchsuite
[andrew@Serval benchsuite] ls -l
total 52
drwxr-xr-x 12 andrew users  4096 Jan  9 07:39 runs
-rwxr-xr-x  1 andrew users 47324 Sep  5 11:20 benchsuite
[andrew@Serval benchsuite] mkdir data
[andrew@Serval benchsuite] ./benchsuite --dir ./data
missing: linux, skipping benchmark linux_alternates (try running with: --download linux)
missing: linux, skipping benchmark linux_alternates_casei (try running with: --download linux)
missing: linux, skipping benchmark linux_literal (try running with: --download linux)
missing: linux, skipping benchmark linux_literal_casei (try running with: --download linux)
missing: linux, skipping benchmark linux_literal_default (try running with: --download linux)
missing: linux, skipping benchmark linux_no_literal (try running with: --download linux)
missing: linux, skipping benchmark linux_re_literal_suffix (try running with: --download linux)
missing: linux, skipping benchmark linux_unicode_greek (try running with: --download linux)
missing: linux, skipping benchmark linux_unicode_greek_casei (try running with: --download linux)
missing: linux, skipping benchmark linux_unicode_word (try running with: --download linux)
missing: linux, skipping benchmark linux_word (try running with: --download linux)
missing: subtitles-en, skipping benchmark subtitles_en_alternate (try running with: --download subtitles-en)
missing: subtitles-en, skipping benchmark subtitles_en_alternate_casei (try running with: --download subtitles-en)
missing: subtitles-en, skipping benchmark subtitles_en_literal (try running with: --download subtitles-en)
missing: subtitles-en, skipping benchmark subtitles_en_literal_casei (try running with: --download subtitles-en)
missing: subtitles-en, skipping benchmark subtitles_en_literal_word (try running with: --download subtitles-en)
missing: subtitles-en, skipping benchmark subtitles_en_no_literal (try running with: --download subtitles-en)
missing: subtitles-en, skipping benchmark subtitles_en_surrounding_words (try running with: --download subtitles-en)
missing: subtitles-ru, skipping benchmark subtitles_ru_alternate (try running with: --download subtitles-ru)
missing: subtitles-ru, skipping benchmark subtitles_ru_alternate_casei (try running with: --download subtitles-ru)
missing: subtitles-ru, skipping benchmark subtitles_ru_literal (try running with: --download subtitles-ru)
missing: subtitles-ru, skipping benchmark subtitles_ru_literal_casei (try running with: --download subtitles-ru)
missing: subtitles-ru, skipping benchmark subtitles_ru_literal_word (try running with: --download subtitles-ru)
missing: subtitles-ru, skipping benchmark subtitles_ru_no_literal (try running with: --download subtitles-ru)
missing: subtitles-en, skipping benchmark subtitles_ru_surrounding_words (try running with: --download subtitles-en)
[andrew@Serval benchsuite] ./benchsuite --dir ./data --download linux
# git clone --depth 1 git://github.com/BurntSushi/linux ./data/linux
Cloning into './data/linux'...
remote: Counting objects: 66270, done.
remote: Compressing objects: 100% (59047/59047), done.
remote: Total 66270 (delta 6229), reused 66269 (delta 6229), pack-reused 0
Receiving objects: 100% (66270/66270), 175.14 MiB | 11.17 MiB/s, done.
Resolving deltas: 100% (6229/6229), done.
# Building Linux kernel...
# make defconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  SHIPPED scripts/kconfig/zconf.tab.c
  SHIPPED scripts/kconfig/zconf.lex.c
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
*** Default configuration is based on 'x86_64_defconfig'
#
# configuration written to .config
#
# make -j 4
[snip]
[andrew@Serval benchsuite] ./benchsuite --dir ./data
linux_alternates (pattern: ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT)
-------------------------------------------------------------------------
[snip]

If the script says it's "missing linux," then that means it doesn't know where the data is. The benchsuite script doesn't hard-code any directory paths. So for example, if you're giving it --dir ./, then that implies there should be a checkout of linux at ./linux.

Also note that I've updated github.com/BurntSushi/linux to the latest upstream master since the older version no longer built on my machine. You can get the previous snapshot by checking out the ag/2016-09 branch.
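So reproducing the corpus from the original article would be roughly (a sketch based on the commands above; it assumes the usual kernel build prerequisites are installed):

git clone git://github.com/BurntSushi/linux ./data/linux
cd ./data/linux
git checkout ag/2016-09     # the snapshot he mentions above
make defconfig
make -j 4                   # the benchsuite's download step builds the tree too
cd ../.. && ./benchsuite --dir ./data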

petdance commented 6 years ago

So for example, if you're giving it --dir ./, then that implies there should be a checkout of linux at ./linux.

Right, that's what I had done, and I ran the make defconfig, but then it complained as above. If it requires Python 3.5, though, that would explain it.
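A quick way to confirm that on the machine that produced the traceback (subprocess.run was added in Python 3.5, which matches the AttributeError above):

python --version
python3 --version
head -1 ./benchsuite    # see which interpreter the shebang line asks for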

I guess what I really want to know is: what commands is your test running, so that I can run equivalent commands on ack?

BurntSushi commented 6 years ago

I'm on mobile, but all of the commands are declared in each benchmark in the top half of the script. The raw data for each run is also committed to the repo in the benchsuite/runs directory, and includes exactly the commands that were run. The benchmark harness will run each command several times and report mean/stddev.