Open tobiasgrosser opened 6 months ago
I don't see evidence that it's not doing buffered writes in that log. AFAIK it is buffering writes, which is why the writes there have sizes like 4096 or 8192.
As for redundant file ops, this seems likely, although (1) I'm not sure how much control rust actually gives over doing those operations more economically without significantly complicating things and (2) I'd like to confirm that these extra stat calls are actually causing the problem. Do you know if tar
performs significantly better in this situation than leantar?
Note that there is some known slowness, especially on Windows, which is due mainly to the way Windows handles file system operations and is equally bad in all programs that create many files. Two minutes is not out of the realm of possibility AFAIK. See "NTFS really isn't that bad" - Robert Collins (LCA 2020).
I made some modifications in bf72a45 which should decrease the number of calls to mkdir
and stat
on directories that already exist. On my machine the strace looks like this now:
openat(AT_FDCWD, "/home/mario/.cache/mathlib/01ab41acd110bb6b.ltar", O_RDONLY|O_CLOEXEC) = 3
read(3, "LTAR\375~\316\3767R\332\34./.lake/build/lib/Ma"..., 8192) = 2317
openat(AT_FDCWD, "././.lake/build/lib/Mathlib/Data/Rbtree/Insert.trace", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
mkdir("././.lake/build/lib/Mathlib/Data/Rbtree", 0777) = -1 ENOENT (No such file or directory)
mkdir("././.lake/build/lib/Mathlib/Data", 0777) = -1 ENOENT (No such file or directory)
mkdir("././.lake/build/lib/Mathlib", 0777) = -1 ENOENT (No such file or directory)
mkdir("././.lake/build/lib", 0777) = -1 ENOENT (No such file or directory)
mkdir("././.lake/build", 0777) = -1 ENOENT (No such file or directory)
mkdir("././.lake", 0777) = 0
mkdir("././.lake/build", 0777) = 0
mkdir("././.lake/build/lib", 0777) = 0
mkdir("././.lake/build/lib/Mathlib", 0777) = 0
mkdir("././.lake/build/lib/Mathlib/Data", 0777) = 0
mkdir("././.lake/build/lib/Mathlib/Data/Rbtree", 0777) = 0
openat(AT_FDCWD, "././.lake/build/lib/Mathlib/Data/Rbtree/Insert.trace", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
write(4, "2079064578436529917", 19) = 19
close(4) = 0
mmap(NULL, 487424, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x787001e7b000
brk(0x5bb5cbbda000) = 0x5bb5cbbda000
openat(AT_FDCWD, "././.lake/build/lib/Mathlib/Data/Rbtree/Insert.olean", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
mmap(NULL, 8785920, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x786ffb79f000
munmap(0x786ffb79f000, 8785920) = 0
write(4, "olean\1dcccfb73cb247e9478220375ab"..., 15112) = 15112
close(4) = 0
openat(AT_FDCWD, "././.lake/build/lib/Mathlib/Data/Rbtree/Insert.olean.hash", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
write(4, "5129390526386888840", 19) = 19
close(4) = 0
brk(0x5bb5cbbfe000) = 0x5bb5cbbfe000
openat(AT_FDCWD, "././.lake/build/lib/Mathlib/Data/Rbtree/Insert.ilean", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
brk(0x5bb5cc45e000) = 0x5bb5cc45e000
write(4, "{\"version\":3,\"references\":{},\"mo"..., 67) = 67
close(4) = 0
openat(AT_FDCWD, "././.lake/build/lib/Mathlib/Data/Rbtree/Insert.ilean.hash", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
write(4, "14815318170663550241", 20) = 20
close(4) = 0
openat(AT_FDCWD, "././.lake/build/lib/Mathlib/Data/Rbtree/Insert.log.json", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
write(4, "[{\"message\":\n \".> LEAN_PATH=./."..., 725) = 725
close(4) = 0
mkdir("././.lake/build/ir/Mathlib/Data/Rbtree", 0777) = -1 ENOENT (No such file or directory)
mkdir("././.lake/build/ir/Mathlib/Data", 0777) = -1 ENOENT (No such file or directory)
mkdir("././.lake/build/ir/Mathlib", 0777) = -1 ENOENT (No such file or directory)
mkdir("././.lake/build/ir", 0777) = 0
mkdir("././.lake/build/ir/Mathlib", 0777) = 0
mkdir("././.lake/build/ir/Mathlib/Data", 0777) = 0
mkdir("././.lake/build/ir/Mathlib/Data/Rbtree", 0777) = 0
openat(AT_FDCWD, "././.lake/build/ir/Mathlib/Data/Rbtree/Insert.c", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
write(4, "// Lean compiler output\n// Modul"..., 1209) = 1209
close(4) = 0
openat(AT_FDCWD, "././.lake/build/ir/Mathlib/Data/Rbtree/Insert.c.hash", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
write(4, "16678743707097875166", 20) = 20
close(4) = 0
read(3, "", 8192) = 0
munmap(0x787001e7b000, 487424) = 0
close(3) = 0
Note also that all of the (small) files here are written using single large writes, so I don't see how this can be improved unless you think another syscall besides write
would perform better.
leantar seems to be not very efficient in its filesystem usage, which slows it down when running on an NFS mount. Instead of 10 seconds, a mathlib
lake exe cache get
takes over two minutes. Some major inefficiencies seem to be:Here an strace snippet.