bazelbuild / bazel


Sandboxing slow #1836

Closed AustinSchuh closed 6 years ago

AustinSchuh commented 7 years ago

@philwo I'm back... I tried out a recent build of Bazel on our code base and, after working around some toolchain changes, ran into issues. I saw 100% CPU usage (1 core) for Java and then, after some initial progress, no further build progress. I waited 40 minutes with nothing happening (Bazel wasn't reporting any actions being executed). atop said that bazel was writing 4 MB/s to disk at the time.

Attaching with strace yielded the following info:

[pid 58046] sched_yield() = 0
[pid 59244] lstat("/home/austin/.cache/bazel/_bazel_austin/dd758a6cc6943296d5c6a65e2c252c63/bazel-sandbox/202e6024-7d68-4028-8ba9-65ade44518e7-240/peloton-tech/execroot/peloton-tech/external/androidndk/ndk/platforms/android-21/arch-arm/usr/include/linux/netfilter/ipset/ip_set_bitmap.h", <unfinished ...>
[pid 58046] write(129, "l", 1 <unfinished ...>
[pid 59244] <... lstat resumed> {st_mode=S_IFLNK|0777, st_size=198, ...}) = 0
[pid 58046] <... write resumed> ) = 1
[pid 59244] unlink("/home/austin/.cache/bazel/_bazel_austin/dd758a6cc6943296d5c6a65e2c252c63/bazel-sandbox/202e6024-7d68-4028-8ba9-65ade44518e7-240/peloton-tech/execroot/peloton-tech/external/androidndk/ndk/platforms/android-21/arch-arm/usr/include/linux/netfilter/ipset/ip_set_bitmap.h" <unfinished ...>
[pid 58046] futex(0x7fd544073094, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7fd544073090, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 0
[pid 56541] futex(0x7fd544073094, FUTEX_WAIT_PRIVATE, 1631649, NULL <unfinished ...>
[pid 58046] sched_yield() = 0
[pid 56541] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 58046] sched_yield() = 0
[pid 58046] futex(0x7fd47c004e04, FUTEX_WAIT_PRIVATE, 8519, NULL <unfinished ...>
[pid 56541] futex(0x7fd544073068, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 56541] futex(0x7fd47c004e04, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7fd47c004e00, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
[pid 58046] <... futex resumed> ) = 0
[pid 58046] futex(0x7fd47c004dd8, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 56541] <... futex resumed> ) = 1
[pid 56541] futex(0x7fd47c004dd8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 58046] <... futex resumed> ) = 0
[pid 58046] futex(0x7fd47c004dd8, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 56541] <... futex resumed> ) = 1
[pid 58046] write(129, "i", 1) = 1
[pid 58046] futex(0x7fd544073094, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7fd544073090, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 0
[pid 56541] futex(0x7fd544073094, FUTEX_WAIT_PRIVATE, 1631651, NULL <unfinished ...>
[pid 58046] sched_yield() = 0
[pid 56541] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 58046] sched_yield() = 0
[pid 58046] futex(0x7fd47c004e04, FUTEX_WAIT_PRIVATE, 8521, NULL <unfinished ...>
[pid 56541] futex(0x7fd544073068, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 56541] futex(0x7fd47c004e04, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7fd47c004e00, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
[pid 58046] <... futex resumed> ) = 0
[pid 58046] futex(0x7fd47c004dd8, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 56541] <... futex resumed> ) = 1
[pid 56541] futex(0x7fd47c004dd8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 58046] <... futex resumed> ) = 0

I unfortunately don't have a repo I can give you to test with. Since the robotics code is open source, I'm trying to get it building with the latest Bazel to see if the issue persists, but I'm running into the same toolchain issues, and I've been dragged away from my desk enough that I'm not making fast progress.

damienmg commented 7 years ago

Just triaging preemptively: if this is a big regression, we need to figure it out before actually releasing Bazel (though we are safe to cut a release candidate).

/cc @kchodorow and @katre for the release

brian-peloton commented 7 years ago

A bit more info from some digging I did: this is on XFS on an NVMe SSD. It happens even when the filesystem isn't nearly full. Other processes do manage to read and write data on the same filesystem at reasonable speeds while Bazel is crawling along. strace with timestamps revealed that all the filesystem operations in all of Bazel's threads would move along for a bit, and then they would all stop for long periods of time (half a second and up). All of the symlink calls were locking up for even longer periods (a second or two) than the unlink and fstat calls. No idea why that's happening though...
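For reference, a trace like that can be captured with something along these lines (the PID placeholder is whatever the Bazel server process is; -T prints the time spent inside each syscall, which makes the stalls directly visible):

$ strace -f -tt -T -e trace=symlink,unlink,lstat -p <bazel-server-pid>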

AustinSchuh commented 7 years ago

After an expunge, I see the following baseline (I ran a fetch beforehand):

time bazel test //y2016/control_loops/...
INFO: Elapsed time: 405.230s, Critical Path: 269.49s
INFO: Build completed successfully, 1420 total actions
//y2016/control_loops/shooter:shooter_lib_test                           PASSED in 0.2s
//y2016/control_loops/superstructure:superstructure_lib_test             PASSED in 7.9s

Executed 2 out of 2 tests: 2 tests pass.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.

real    6m45.243s
user    0m0.088s
sys     0m1.036s

I did the same with a recent Bazel (as of about a week ago, which has the new sandbox code in it). For full disclosure, I've got the Peloton patch for flaky tests in that Bazel, but Brian doesn't think that will affect anything. I can re-run with vanilla master if you want.

I've waited half an hour now, and it has only completed 900/1415 actions. I'll update this in the morning with the actual time it took, but needless to say, it's already a regression.

I'm using the robotics repository, which is open source, so I can get you a copy of that if you want, and instructions on how to build up a machine with all the right bits to make it work.

AustinSchuh commented 7 years ago

I ran strace on it for a couple seconds, and here's the output

strace -tt -f -p 78469

https://gist.github.com/AustinSchuh/968ebf69a6093a2e754bc21eff231a6d

Hopefully that's actually helpful

brian-peloton commented 7 years ago

I don't see the large gaps across all syscalls in that trace, but no symlinks finish between 20:43:12.413991 and 20:43:12.451891; then a bunch finish until 20:43:12.456680, when they all stop again until 20:43:12.505149 (except a single one at 20:43:12.483374). That looks really strange, but I have no idea why it's happening. It's not just uniformly long latencies either; while symlinks are finishing, many threads complete several within a few milliseconds.
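One rough way to eyeball those gaps in a trace like this (field positions assume the "[pid N] HH:MM:SS.ssssss ..." line format that strace -tt -f produces; gaps between consecutive timestamps are the stalls):

$ awk '/symlink/ && / = 0/ { print $3 }' trace.log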

AustinSchuh commented 7 years ago

It finally finished.

$ time BAZEL_OVERRIDE=/home/austin/local/bazel/bazel-bin/src/bazel bazel test //y2016/control_loops/...
Actually calling /home/austin/local/bazel/bazel-bin/src/bazel
//y2016/control_loops/shooter:shooter_lib_test                           PASSED in 0.2s
//y2016/control_loops/superstructure:superstructure_lib_test             PASSED in 8.0s

Executed 2 out of 2 tests: 2 tests pass.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.

real    55m17.180s
user    0m52.313s
sys     0m51.111s

Let me know if you need any more info.

philwo commented 7 years ago

@AustinSchuh

I can get you a copy of that if you want, and instructions on how to build up a machine with all the right bits to make it work.

That would be great. I will try to reproduce this on my workstation or a separate test machine then.

I'll also just try bootstrapping Bazel tomorrow on an XFS filesystem on an SSD (though not NVMe, unfortunately 😊) and see if that is suspiciously slower than on ext4.

AustinSchuh commented 7 years ago

@philwo A copy of the repo lives at https://s3-us-west-2.amazonaws.com/peloton-bazel/971-Robot-Code.tar.xz, and instructions to build a machine with the tools we have installed lives at https://s3-us-west-2.amazonaws.com/peloton-bazel/vagrant_dev_vm.tar.gz. I'm happy to provide more info if you need help reproducing our setup.

The readme in the second tar should have instructions on how to build a machine with the right tools, and setup_code_building.sh should have the list of packages. @philsc wrote the setup scripts and instructions originally.

philwo commented 7 years ago

I didn't have time to work on this yet, because I had to fix a couple of other release-blocking issues. The plan is to cut a release candidate with the reliability fixes for sandboxing now, which will also give us a few days to see those fixes running and get some feedback. Once I'm back in the office on Tuesday (Monday is a public holiday here), I'll work on this performance issue. When we have a fix, we'll create another 0.3.2rc with the fix cherry-picked, and if all goes well, that can become the release shortly after. :)

I'll keep you updated.

AustinSchuh commented 7 years ago

@philwo Thanks for the update. Please let us know if you need help reproducing the issue. Have a good holiday!

philwo commented 7 years ago

@AustinSchuh Could you explain where XFS fits into your setup? I looked at the Vagrantfile and I don't see any mention of XFS there. Do you store the VirtualBox image on an XFS volume? Or is the HDD in the jessie base image formatted as XFS?

AustinSchuh commented 7 years ago

@philwo On my desktop, I've got a cryptfs with an LVM inside it, containing an XFS root filesystem and a swap partition.

/dev/mapper/root-root--lv on / type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sda1 on /boot type ext3 (rw,relatime,data=ordered)

philwo commented 7 years ago

OK... so, to make sure I understand, the stack looks like this:

So we have Bazel running inside a VirtualBox VM with the output_base being ext4-on-Vbox-on-XFS-on-LVM-on-cryptfs-on-SSD? I think we need a simpler repro case. ^^;

I'll run some benchmarks with Bazel's output_base on an XFS formatted partition on my SSD to see if I spot any issues, but if not, I don't really know where to start digging here... :(

Can you reproduce this using e.g. just "bazel build" on your desktop, without the whole VM stack, somehow?

Also: if you disable sandboxing, how does the performance look then?
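Something like this should do it (using the target pattern from your earlier runs):

$ bazel test --spawn_strategy=standalone //y2016/control_loops/...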

AustinSchuh commented 7 years ago

Hi Phil,

I can see how you arrived at that conclusion from the bits and pieces I told you, but it is thankfully a bit simpler than that. I'm not running a VM on my desktop. The docs I pointed you to for the VM have a list of all the pieces that need to be installed, and were expedient to send to you.

The actual stack is:

- NVMe SSD
- dm-crypt + LUKS
- Inside the cryptfs, an LVM with two partitions: an XFS root partition and a swap partition.

Disabling swap has no effect.

$ lsblk
NAME                        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                           8:0    0 465.8G  0 disk 
├─sda1                        8:1    0   190M  0 part /boot
└─sda2                        8:2    0 465.6G  0 part 
nvme0n1                     259:0    0   477G  0 disk 
└─nvme0n1p1                 259:1    0   477G  0 part 
  ├─root-root--lv           252:0    0 400.4G  0 lvm  /
  ├─root-swap               252:1    0  65.2G  0 lvm  
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2623 v3 @ 3.00GHz
Stepping:              2
CPU MHz:               1258.476
CPU max MHz:           3500.0000
CPU min MHz:           1200.0000
BogoMIPS:              5991.83
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              10240K
NUMA node0 CPU(s):     0-3,8-11
NUMA node1 CPU(s):     4-7,12-15

I'll try disabling sandboxing and get back to you. I seem to remember that our build no longer works without sandboxing enabled. (Update: I thought I'd have time between meetings, but that was a bit optimistic. I likely won't have time until Thursday afternoon to run the test...)

philwo commented 7 years ago

I haven't been able to run the build from your repository yet, but I did a simple benchmark using Bazel itself for a start.

I can't spot any performance difference when Bazel builds itself, comparing ext4 vs. XFS and also sandboxing vs. standalone on my machine:

philwo@philwo:/$ uname -rvm
4.4.0-36-generic #55~14.04.1-Ubuntu SMP Fri Aug 12 11:49:30 UTC 2016 x86_64

philwo@philwo:/$ lsblk
NAME                                                  MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdb                                                     8:16   0 931.5G  0 disk  
└─sdb1                                                  8:17   0 931.5G  0 part  
  └─crypt (dm-2)                                      252:2    0 931.5G  0 crypt 
    ├─cryptvg-crypt1 (dm-3)                           252:3    0   100G  0 lvm   /mnt/ext4
    └─cryptvg-crypt2 (dm-4)                           252:4    0   100G  0 lvm   /mnt/xfs

philwo@philwo:/$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               3431.777
BogoMIPS:              6983.66
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0-11

Bazel building itself with workspace and output_base on ext4-on-LVM-on-dmcrypt+luks-on-HDD:

philwo@philwo:/mnt/ext4/bazel (master)$ ./output/bazel --output_base=/mnt/ext4/output_base clean; ./output/bazel --output_base=/mnt/ext4/output_base build --worker_sandboxing //src:bazel

INFO: Elapsed time: 78.992s, Critical Path: 76.77s
INFO: Build completed successfully, 1121 total actions

Bazel building itself with workspace and output_base on xfs-on-LVM-on-dmcrypt+luks-on-HDD:

philwo@philwo:/mnt/xfs/bazel (master)$ ./output/bazel --output_base=/mnt/xfs/output_base clean; ./output/bazel --output_base=/mnt/xfs/output_base build --worker_sandboxing //src:bazel

INFO: Elapsed time: 78.797s, Critical Path: 76.74s
INFO: Build completed successfully, 1121 total actions

Same test, sandboxing disabled, ext4:

philwo@philwo:/mnt/ext4/bazel (master)$ ./output/bazel --output_base=/mnt/ext4/output_base clean; ./output/bazel --output_base=/mnt/ext4/output_base build --spawn_strategy=standalone //src:bazel

INFO: Elapsed time: 78.757s, Critical Path: 74.39s
INFO: Build completed successfully, 1121 total actions

Same test, sandboxing disabled, xfs:

philwo@philwo:/mnt/xfs/bazel (master)$ ./output/bazel --output_base=/mnt/xfs/output_base clean; ./output/bazel --output_base=/mnt/xfs/output_base build --spawn_strategy=standalone //src:bazel

INFO: Elapsed time: 79.817s, Critical Path: 68.20s
INFO: Build completed successfully, 1121 total actions

I verified via "ps axuf" that it actually uses the linux-sandbox binary for running the Spawns, too.

damienmg commented 7 years ago

OK, we decided not to block the release any further on this bug, because it seems to be a very narrow corner case.

Can you provide us with a profile file so we can see whether specific actions are causing the issue (use --profile=/some/path to create it)? Ideally one each for a clean build with and without sandboxing. (Don't send them over GitHub; send them by mail.)
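For example, something along these lines (cleaning first so each profile covers a full clean build):

$ bazel clean; bazel test --profile=/tmp/profile-sandboxed //y2016/control_loops/...
$ bazel clean; bazel test --profile=/tmp/profile-standalone --spawn_strategy=standalone //y2016/control_loops/...

You can also get a quick local summary with "bazel analyze-profile /tmp/profile-sandboxed".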

philwo commented 7 years ago

I also tried a synthetic benchmark that Brian suggested some time ago to measure sandbox performance with many input files - interestingly it was much faster on XFS than on ext4:

# ext4
philwo@philwo:/mnt/ext4/bazel (master)$ ./output/bazel --output_base=/mnt/ext4/output_base clean; ./output/bazel --output_base=/mnt/ext4/output_base build //philwo:gen_main
INFO: Elapsed time: 27.743s, Critical Path: 26.01s
INFO: Build completed successfully, 104 total actions

# xfs
philwo@philwo:/mnt/xfs/bazel (master)$ ./output/bazel --output_base=/mnt/xfs/output_base clean; ./output/bazel --output_base=/mnt/xfs/output_base build //philwo:gen_main
INFO: Elapsed time: 11.321s, Critical Path: 9.22s
INFO: Build completed successfully, 104 total actions

Trying with Bazel 0.3.1, which still uses the old sandbox - now ext4 and xfs perform the same:

# ext4
philwo@philwo:/mnt/ext4/bazel (master)$ ~/bazel-0.3.1/bin/bazel --output_base=/mnt/ext4/output_base clean; ~/bazel-0.3.1/bin/bazel --output_base=/mnt/ext4/output_base build //philwo:gen_main
INFO: Elapsed time: 18.008s, Critical Path: 12.96s
INFO: Build completed successfully, 104 total actions

# xfs
philwo@philwo:/mnt/xfs/bazel (master)$ ~/bazel-0.3.1/bin/bazel --output_base=/mnt/xfs/output_base clean; ~/bazel-0.3.1/bin/bazel --output_base=/mnt/xfs/output_base build //philwo:gen_main
INFO: Elapsed time: 18.935s, Critical Path: 13.06s
INFO: Build completed successfully, 104 total actions

(Standalone strategy runs these in ~1s each, because it has no overhead for action inputs, so it doesn't actually do much in this benchmark.)

AustinSchuh commented 7 years ago

@philwo Thanks for taking a look! I'm poking our IT department to see if there's a way to get you access to a machine I've set up for you, with the problem already verified to be present.

After a clean and a fetch, with sandboxing disabled:

austin[54576] aschuh-debian-z620 (master) ~/local/971-Robot-Code
$ time BAZEL_OVERRIDE=/home/austin/local/bazel/bazel-bin/src/bazel bazel test //y2016/control_loops/... --spawn_strategy=standalone
INFO: Elapsed time: 39.684s, Critical Path: 26.11s
INFO: Build completed successfully, 1417 total actions
//y2016/control_loops/shooter:shooter_lib_test                           PASSED in 0.2s
//y2016/control_loops/superstructure:superstructure_lib_test             PASSED in 7.7s

Executed 2 out of 2 tests: 2 tests pass.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.

real    0m39.796s
user    0m1.678s
sys     0m1.356s

Standalone is another 10x faster than the old sandbox. (wow, I had forgotten how fast builds could be...)

Profiles are now in your inbox.

philwo commented 7 years ago

I just pushed commit https://github.com/bazelbuild/bazel/commit/c5af2f3f2d974f7d0d84cecab6c57444b3413b01, which disables network sandboxing even for 'bazel test' (you now have to explicitly enable it on tests that need it by adding a 'block-network' tag).
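For illustration, opting a single test back into network isolation would look something like this in its BUILD file (rule and file names made up):

cc_test(
    name = "network_isolation_test",
    srcs = ["network_isolation_test.cc"],
    # Re-enable network sandboxing for this test only.
    tags = ["block-network"],
)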

This should solve a major performance issue, which manifests itself exactly as you described it in your e-mail yesterday - "it seems like the build starts fast, but then becomes slower over time with each action taking more time". A link to the root cause in the Linux kernel is in the commit message.

Could you please try this out and see if it helps? If you're still seeing performance issues, there might be a second problem. Please let me know either way :)

AustinSchuh commented 7 years ago

Ok, I've got some very interesting and good news to report.

I used perf and strace to try to understand what was going on. It looks like the symlink creation step is taking a long time (50 ms per syscall?), which naturally slows the whole thing down. It looks like this might be a dm-crypt + LVM + XFS problem, but I haven't yet been able to debug which piece is at fault.

I ran bonnie++ on a VM without the problem and on my problematic desktop.

root@ip-10-1-1-100:/x/home/admin/971-Robot-Code# bonnie++ -d /x/bonnie/ -s 125G -n 0 -m TEST -f -b -u admin
Using uid:1000, gid:1000.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
TEST           125G           120994   8 103249   9           145734   7 982.5 111
Latency                       48501us     254ms               250ms   31018us

1.97,1.97,TEST,1,1475918411,125G,,,,120994,8,103249,9,,,145734,7,982.5,111,,,,,,,,,,,,,,,,,,,48501us,254ms,,250ms,31018us,,,,,,

root[1248] aschuh-debian-z620 /home/brian
# bonnie++ -d /tmp/ -s 130G -n 0 -m TEST -f -b -u austin
Using uid:1000, gid:1000.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
TEST           130G           39864   5 35128   4           1350466  60 577.2 112
Latency                       11677ms   16050ms             35955us   97511us

1.97,1.97,TEST,1,1475921985,130G,,,,39864,5,35128,4,,,1350466,60,577.2,112,,,,,,,,,,,,,,,,,,,11677ms,16050ms,,35955us,97511us,,,,,,

So, yeah, the problem machine (with an NVMe!) is slow. I'm guessing random seeks, but I haven't yet had time to go back and debug that problem.

So, to try to rectify that, I went into the ~/.cache/bazel/_bazel_austin/8f2824674f275a56b96be13cdc66e9a4 folder and deleted the bazel-sandbox folder. I then re-created it as a symlink to a folder in /dev/shm. My thinking is that symlink creation won't take much space, so /dev/shm is fine for that, and it's all temporary anyway, so a tmpfs is actually a good fit.
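In shell terms, the hack is roughly this (the hash directory is specific to my user and workspace; find yours under ~/.cache/bazel):

$ cd ~/.cache/bazel/_bazel_austin/8f2824674f275a56b96be13cdc66e9a4
$ rm -rf bazel-sandbox
$ mkdir /dev/shm/bazel-sandbox
$ ln -s /dev/shm/bazel-sandbox bazel-sandbox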

The end result is very very good and fixes the issue for me.

austin[54693] aschuh-debian-z620 (master) ~/local/971-Robot-Code
$ time BAZEL_OVERRIDE=/home/austin/local/bazel/bazel-bin/src/bazel bazel test -c opt //y2016/control_loops/...
real    0m47.826s
user    0m0.295s
sys     0m0.292s

I also saw speedups on the VM I was using that didn't have the issue.

This is probably almost trivial for one of you guys to implement. I can take a stab at implementing it as well. What do you think?

AustinSchuh commented 7 years ago

@philwo I spent a little bit of time looking at how to implement this today, and came up with the following plan. Please let me know what you think, ideally before I spend much more time on it.

I'm thinking that I'll make "/home/austin/.cache/bazel/_bazel_austin/8f2824674f275a56b96be13cdc66e9a4/bazel-sandbox/78afc106-c4c9-4a0c-897a-5122c6eca56e-615/execroot" a symlink to "/dev/shm/_bazel_austin/8f2824674f275a56b96be13cdc66e9a4/bazel-sandbox/78afc106-c4c9-4a0c-897a-5122c6eca56e-615/execroot". That'll put the symlink tree in /dev/shm, where it will be fast. The symlinks are small, so this isn't a big deal in terms of memory usage.

The problem is that, for example, the output will be put in "/dev/shm/_bazel_austin/8f2824674f275a56b96be13cdc66e9a4/bazel-sandbox/78afc106-c4c9-4a0c-897a-5122c6eca56e-615/execroot/971-Robot-Code/bazel-out/host/bin/external/slycot_repo/slycot/src/DF01MD.pic.o", and the output could be big. To fix that, I'm planning on creating folders in "/home/austin/.cache/bazel/_bazel_austin/8f2824674f275a56b96be13cdc66e9a4/bazel-sandbox/78afc106-c4c9-4a0c-897a-5122c6eca56e-615/bazel-out" for the output. (I need to look up what happens with testlogs, but it's the same problem.) I'll then symlink to those from the tree in /dev/shm.

I think that'll give me a nice balance of fewer syscalls on the slower drive without using a bunch of memory.
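Roughly, the layout I'm proposing (hash and UUID shortened; "->" means "is a symlink to"):

/dev/shm/_bazel_austin/<hash>/bazel-sandbox/<uuid>/execroot/971-Robot-Code/   # symlink tree, in RAM
    bazel-out -> ~/.cache/bazel/_bazel_austin/<hash>/bazel-sandbox/<uuid>/bazel-out   # real outputs, on disk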

Thoughts? If I don't hear anything from you, I'm going to try to implement it and send you a review for it.

damienmg commented 7 years ago

Sounds like a reasonable thing to do, as long as it is not the default behavior.

ahyangyi commented 7 years ago

I am experiencing a very similar slowdown in Bazel 0.3.2: a lone java process, then a few compiler processes, with scary load averages, indicating that syscalls are jamming the system.

However, my filesystem setup is a bit different: I use ext4 on top of a hardware RAID composed of mechanical disks.

I can confirm that turning off the sandbox fixes the problem. Also, iotop shows crazy amounts of disk writes when sandboxing is on.

If needed I can do some additional profiling.

philwo commented 7 years ago

@ahyangyi That would be awesome and very helpful. Thank you!

@AustinSchuh I think that sounds like a plan - however note that intermediate outputs can be inputs for another action, so the "bazel-out" folder of a Spawn might already contain files (or rather, symlinks to existing files in the real execroot/bazel-out folder).

At this time, I have no idea why the new sandbox would produce more disk I/O than the old one. It shouldn't matter that much whether we create a symlink (new behavior) or a hardlink and then bind-mount on top of that (old behavior)!

AustinSchuh commented 7 years ago

@philwo That's good to know. Would it potentially contain files in the bazel/*/bin folder? If not, I could symlink only those bin folders; otherwise, I'm thinking handling that case may be a premature optimization. It sounds like a much less likely case, and it won't have thousands of files from sandboxed compilers like the other folder.

Do you have a flag name you would like me to use for the flag to enable this behavior?

AustinSchuh commented 7 years ago

@ahyangyi What happens if you symlink the bazel-sandbox folder I mentioned to /dev/shm/? Does that fix your performance issue? You can test that by starting a build, killing it, going to the correct folder in the .cache/bazel directory, and replacing the folder with a symlink. I'm curious to see if what fixes it for me also fixes it for you.

philwo commented 7 years ago

@AustinSchuh Sorry, I'm not sure I understand what you mean by "the bazel/*/bin folder" (but it's late here already; I might just be missing something) - I think you can easily find out what works in practice by just giving your idea a try and running a bigger build. If any files are not where they're supposed to be, the actions will immediately fail, so there's a good feedback loop while coding this feature. You could temporarily give your new flag the default value "true" to turn it on by default and then "bazel test" //src/test/shell/bazel:bazel_sandboxing_test, which should also cover some edge cases.

Regarding the flag name, anything that starts with "experimental_" should be fine - maybe "experimental_sandbox_on_tmpfs"? But feel free to pick what you like. If you have any code you'd like to get feedback on, feel free to send it over :)

AustinSchuh commented 7 years ago

@philwo I'm currently wrestling with how to reliably get the _bazel_austin/dd758a6cc6943296d5c6a65e2c252c63 part of the cache path. I've traced the calculation all the way into the C++ code, which makes it hard to compute in a single spot in Java. I'm tempted to use the Path class to strip the leading /home/austin/.cache/bazel/ string to get the folder name, but that feels wrong. If you have hints for me before I get started tomorrow, that would be awesome. Otherwise I'll implement something and you can review it. Thanks!

I'm going to implement the simple solution where bazel-out points back to the main disk, and ignore the cases where files get symlinked into the bazel-out folder. If that proves to be a performance problem, I'll think about how to make things more complicated from there.

ahyangyi commented 7 years ago

@AustinSchuh Symlinking that .cache directory to shared memory fixed the performance issue for me.

That said, I cannot just let everyone use /dev/shm and assume the problem does not exist. Some of them need to be able to build things on weaker platforms.

Also, the current implementation of the seemingly random path name takes both the Bazel version (which is not just a version string, but a checksum of the whole Bazel installation) and the workspace path into account. If you want to implement an alternative, please try to preserve this behavior.

Currently the risk of mixing different Bazel installations is minimized. Please keep it that way.

ahyangyi commented 7 years ago

@philwo What kind of profiling would help you the most?

damienmg commented 7 years ago

@AustinSchuh are you speaking of taking the output base: https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/analysis/BlazeDirectories.java#L135 ?

@ahyangyi:

Also, the current implementation of the seemingly random path name takes both the Bazel version (which is not just a version string, but a checksum of the whole Bazel installation) and the workspace path into account. If you want to implement some alternatives, please try to preserve this behavior.

What? The execroot path does not use the install name (though the install base does).

lberki commented 7 years ago

@ahyangyi: The checksum in the output base is just the MD5 checksum of the path of the directory the WORKSPACE file is in. To compute it:

cd <wherever your WORKSPACE file is>; echo -n $PWD | md5sum

And it's usually under $HOME/.cache/bazel.

You can change it by passing the --output_base startup option to Bazel, e.g.

bazel --output_base=/dev/shm/myworkspace build //:target

Alternatively, you can use the --output_user_root startup option to change where the MD5 checksum directory is put:

bazel --output_user_root=/dev/shm/bazel build //:target

philwo commented 7 years ago

Yep, it is not guaranteed that the output base is in a _bazel_username/hash directory, because it can be overridden by --output_base (which is not unusual; for example, I often use a separate output base for the Bazel version that I'm currently debugging, so that rebuilding it using a released Bazel doesn't trigger a server restart all the time):

philwo@philwo:~/bazel-test (master)$ ./output/bazel info
bazel-bin: /home/philwo/.cache/bazel/_bazel_philwo/11ae71351ba9bcc6fe0988005a8d1396/execroot/bazel-test/bazel-out/local-fastbuild/bin
execution_root: /home/philwo/.cache/bazel/_bazel_philwo/11ae71351ba9bcc6fe0988005a8d1396/execroot/bazel-test
output_base: /home/philwo/.cache/bazel/_bazel_philwo/11ae71351ba9bcc6fe0988005a8d1396
output_path: /home/philwo/.cache/bazel/_bazel_philwo/11ae71351ba9bcc6fe0988005a8d1396/execroot/bazel-test/bazel-out

philwo@philwo:~/bazel-test (master)$ ./output/bazel --output_base=/tmp/something info
bazel-bin: /tmp/something/execroot/bazel-test/bazel-out/local-fastbuild/bin
execution_root: /tmp/something/execroot/bazel-test
output_base: /tmp/something
output_path: /tmp/something/execroot/bazel-test/bazel-out

AustinSchuh commented 7 years ago

@philwo @damienmg @lberki Thanks for the extra info about output_base! Sounds like the safest place to symlink to would be something much more braindead, like /dev/shm/$(echo "${OUTPUT_BASE}" | md5sum)/. Feedback on that proposal is appreciated :)
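Sketching that out (OUTPUT_BASE stands in for the real output base path; the cut is needed because md5sum appends a trailing " -" that shouldn't end up in a directory name):

$ SANDBOX_BASE=/dev/shm/$(echo -n "${OUTPUT_BASE}" | md5sum | cut -d' ' -f1)
$ mkdir -p "${SANDBOX_BASE}"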

@ahyangyi I'm only going to put the symlink tree in /dev/shm. All outputs and temporaries will end up back on the main disk through more symlinks. For me and my machine, this means I expect to see 1 folder with 20 folders inside it, each with about 2000 symlinks and no files. I don't know how to be more lightweight than that without pushing the issue back onto the user to get a faster machine. My single-symlink performance question was only to sleuth out whether the same hack that fixed it for me also fixes it for you, to figure out whether our root causes are the same. It also won't be enabled by default.

philwo commented 7 years ago

@AustinSchuh Do we actually need a reproducible mapping from the output base directory to some directory in /dev/shm? Or could we just use a unique directory for the duration of a build? All the sandbox directories there should be deleted anyway when a build finishes, so we could just use a new one per build.

A good way to generate a secure, random temporary directory in /dev/shm might be http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#createTempDirectory(java.nio.file.Path,%20java.lang.String,%20java.nio.file.attribute.FileAttribute...) - we might have some internal version of that in our FilesystemUtils / FileSystemUtils classes, but I'm not sure right now.

What do you think?

AustinSchuh commented 7 years ago

@philwo I kind of like being able to go look at the output to debug it, and a reproducible map helps with that. It wasn't very hard to implement, so I went ahead and did it. I can change things around if you want.

https://bazel-review.googlesource.com/#/c/6670/

I've got a couple of dirty hacks in there right now (the worst one is in SymlinkedExecRoot) that I'd appreciate feedback on, and I still need an enable/disable flag.

But it works! It is slightly slower than leaving bazel-out in /dev/shm, but lower RAM usage is a good thing.

ahyangyi commented 7 years ago

@lberki Thanks for the clarification. I guess my wrong impression came from often switching Bazel versions, seeing multiple MD5s in my ~/.cache/bazel, and mistakenly correlating the two things.

@AustinSchuh Yes. It seems that we are indeed hitting the same problem even though our filesystem setups are different, which somewhat justifies my presence in this issue instead of creating another one.

I'll try your solution and see if it also works for me.

ahyangyi commented 7 years ago

A quick note: 0.4.0 and 0.4.1 are also affected by this problem. I am unable to test 0.4.2 and 0.4.3 because I can't build the JDK 7 version of (my modified fork of) Bazel, which I just raised as issue #2297.

I am able to build the JDK 8 version, but all the machines I have access to that can run JDK 8 have no performance issues anyway.

Note that this correlation does not mean JDK 7 is the problem: for example, the lack of a JDK 8 package could imply an older kernel or an outdated userspace library.

AustinSchuh commented 7 years ago

@ahyangyi Phil is currently working on a patch to fix this. He got it working with overlayfs, but that's only supported on Ubuntu, so he's reworking it in light of that discovery. He's on vacation until the end of January, so don't expect any motion until after he gets back.

We (Peloton Technology) are running with the patch I put together cherry-picked into our release until a suitable upstream replacement exists.

AustinSchuh commented 7 years ago

@philwo Any update? I'm still finding master slow, and I'm about to forward-port my patch again. It would be nice to get something upstreamed :)

abergmeier-dsfishlabs commented 7 years ago

As someone who sees more than a 100x slowdown with sandboxing (we have >25,000 input files), I appreciate any work on this. That patch seems a bit invasive to me, though I do support the general idea. Is there a reason to hardcode shm? Personally I always try to use tmpfs, since I don't have a single machine with sufficient memory to do everything in RAM.

philwo commented 7 years ago

Does the --experimental_sandbox_base flag work for you? (It's not in a released Bazel version yet.)

You can use it like this to put all sandboxes on /run/shm: bazel build --experimental_sandbox_base=/run/shm //my:target

It should give the same (or better) performance improvement as the more complex solution, but has the disadvantage of putting your entire sandbox on tmpfs when used like that, so you need enough RAM to hold all output artifacts that are in flight (they are moved to the "real" bazel-out on disk after each action executes).

AustinSchuh commented 7 years ago

@philwo I didn't know that existed yet. That works for me :) I'll backport it to 0.4.5 for our use, and we can then drop the patch going forward.

abergmeier-dsfishlabs commented 7 years ago

I'll backport it to 0.4.5

Good point. @damienmg when can we expect 0.5?

damienmg commented 7 years ago

Release candidate today if everything goes alright.


AustinSchuh commented 6 years ago

--experimental_sandbox_base has been working for us since May. Should we close this bug?

(Note: mapping the bazel-bin and bazel-genfiles folders to /dev/shm instead of the host filesystem has been resulting in higher memory usage for us when building 1 GB+ deploy images, but that's part of the price to pay for being fast.)

philwo commented 6 years ago

Glad to hear it's working for you, Austin :)

I'm fine with closing this bug, unless someone else who chimed in on this issue still wants work to be done?