deplinenoise / tundra

Tundra is a code build system that tries to be accurate and fast for incremental builds
MIT License

crash with too many threads #346

Closed: questor closed this issue 1 month ago

questor commented 1 month ago

After upgrading my Linux Mint to the newest version (based on Ubuntu 24.04), Tundra, for unknown reasons, thinks I have a machine with 128 cores and crashes. I can see that the maximum number of threads is clamped to 64, and right afterwards it crashes. In the debug logs I can see that 128 threads are reported (I don't know why; I'm using a virtual machine configured with 2 or 4 cores), 128 threads are created, and then it crashes.

Manually setting the number of threads to a low number fixes the issue, but using more than 64 threads crashes Tundra after it has clamped the thread count to 64. So it is easy to reproduce: set the thread count to 65 and let Tundra run.
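A minimal C++ sketch of the clamping step described above, assuming a hard cap of 64 as in the log's "clamping to 64" warning. `kMaxBuildThreads` and `ClampThreadCount` are hypothetical names, not taken from Tundra's source, and the warning text is only modeled on the log line. What it illustrates is that a clamp only helps if every later consumer (thread creation, joining, per-thread buffers) reads the returned value rather than the original request.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>

// Hypothetical cap, mirroring the "clamping to 64" warning in the log.
constexpr int kMaxBuildThreads = 64;

// Returns a usable thread count in [1, kMaxBuildThreads], printing a warning
// to stderr when the request had to be reduced.
int ClampThreadCount(int requested) {
  if (requested > kMaxBuildThreads) {
    std::fprintf(stderr, "too many build threads (%d) - clamping to %d\n",
                 requested, kMaxBuildThreads);
  }
  const int clamped = std::max(1, std::min(requested, kMaxBuildThreads));
  assert(clamped >= 1 && clamped <= kMaxBuildThreads);
  return clamped;
}
```

With a helper like this, `ClampThreadCount(65)` and `ClampThreadCount(64)` both return 64, so `-j 65` and `-j 64` would be expected to behave identically once the clamped value is the only one used afterwards.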

The complete log:

[D] digest cache initialized -- 0 entries
[D] .tundra2.dag: successfully mapped at 0x74ac3765e000 (756736 bytes)
[D] checking file signatures for DAG data
[D] DAG signatures match - using existing data w/o Lua invocation
[D] .tundra2.state: successfully mapped at 0x74ac37acd000 (33920 bytes)
[D] .tundra2.scancache: successfully mapped at 0x74ac3764b000 (74384 bytes)
[D] Scan cache initialized from frozen data - 475 entries
[D] Node selection finished with 5 nodes to build
[D] Node remap: 657 src nodes, 217 active nodes, using 13888 bytes of node state buffer space
[D] Max # expensive jobs: 128
[W] too many build threads (128) - clamping to 64
[D] build queue initialized; ring buffer capacity = 256
[D] starting build thread 1
[D] starting build thread 2
[D] starting build thread 3
[D] starting build thread 4
[D] starting build thread 5
[D] starting build thread 6
[D] starting build thread 7
[D] starting build thread 8
[D] starting build thread 9
[D] starting build thread 10
[D] starting build thread 11
[D] starting build thread 12
[D] starting build thread 13
[D] starting build thread 14
[D] starting build thread 15
[D] starting build thread 16
[D] starting build thread 17
[D] starting build thread 18
[D] starting build thread 19
[D] starting build thread 20
[D] starting build thread 21
[D] starting build thread 22
[D] starting build thread 23
[D] starting build thread 24
[D] starting build thread 25
[D] starting build thread 26
[D] starting build thread 27
[D] starting build thread 28
[D] starting build thread 29
[D] starting build thread 30
[D] starting build thread 31
[D] starting build thread 32
[D] starting build thread 33
[D] starting build thread 34
[D] starting build thread 35
[D] starting build thread 36
[D] starting build thread 37
[D] starting build thread 38
[D] starting build thread 39
[D] starting build thread 40
[D] starting build thread 41
[D] starting build thread 42
[D] starting build thread 43
[D] starting build thread 44
[D] starting build thread 45
[D] starting build thread 46
[D] starting build thread 47
[D] starting build thread 48
[D] starting build thread 49
[D] starting build thread 50
[D] starting build thread 51
[D] starting build thread 52
[D] starting build thread 53
[D] starting build thread 54
[D] starting build thread 55
[D] starting build thread 56
[D] starting build thread 57
[D] starting build thread 58
[D] starting build thread 59
[D] starting build thread 60
[D] starting build thread 61
[D] starting build thread 62
[D] starting build thread 63
[D] starting build thread 64
[D] starting build thread 65
[D] starting build thread 66
[D] starting build thread 67
[D] starting build thread 68
[D] starting build thread 69
[D] starting build thread 70
[D] starting build thread 71
[D] starting build thread 72
[D] starting build thread 73
[D] starting build thread 74
[D] starting build thread 75
[D] starting build thread 76
[D] starting build thread 77
[D] starting build thread 78
[D] starting build thread 79
[D] starting build thread 80
[D] starting build thread 81
[D] starting build thread 82
[D] starting build thread 83
[D] starting build thread 84
[D] starting build thread 85
[D] starting build thread 86
[D] starting build thread 87
[D] starting build thread 88
[D] starting build thread 89
[D] starting build thread 90
[D] starting build thread 91
[D] starting build thread 92
[D] starting build thread 93
[D] starting build thread 94
[D] starting build thread 95
[D] starting build thread 96
[D] starting build thread 97
[D] starting build thread 98
[D] starting build thread 99
[D] starting build thread 100
[D] starting build thread 101
[D] starting build thread 102
[D] starting build thread 103
[D] starting build thread 104
[D] starting build thread 105
[D] starting build thread 106
[D] starting build thread 107
[D] starting build thread 108
[D] starting build thread 109
[D] starting build thread 110
[D] starting build thread 111
[D] starting build thread 112
[D] starting build thread 113
[D] starting build thread 114
[D] starting build thread 115
[D] starting build thread 116
[D] starting build thread 117
[D] starting build thread 118
[D] starting build thread 119
[D] starting build thread 120
[D] starting build thread 121
[D] starting build thread 122
[D] starting build thread 123
[D] starting build thread 124
[D] starting build thread 125
[D] starting build thread 126
[D] starting build thread 127
[I] begin pass Compile generator (nodes: 0 - 1214252735 (1214252736))
tundra2 -D -w terminated by signal SIGSEGV (Address boundary error)

And the call stack:

t2::BuildQueueBuildNodeRange(t2::BuildQueue, int, int, int)
t2::DriverBuild(t2::Driver)
??
??

questor commented 1 month ago

Update: after updating my codebase the crash is gone; I think it now detects the number of threads correctly. But starting Tundra with "-j 65" still crashes, so I will keep this ticket open even though it no longer happens for me by default.

questor commented 1 month ago

Hm, no idea what is going on. Here, https://github.com/deplinenoise/tundra/blob/e48e03bbd8a193889337d25b42cfbd64c64c1c33/src/BuildQueue.cpp#L742, the number of threads is clamped, but (as the log shows) more threads are still created than the clamped number, and it crashes because there is not enough space for all the threads, so memory gets overwritten. I compiled Tundra with clang. Could this be an error in the compiler's alias analysis, and does it work with gcc?
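One pattern that would produce this symptom is a clamp applied to a copy of the thread count while a later loop still reads the original, unclamped value and writes past a fixed-size array. The sketch below is hypothetical: the thread does not show that Tundra's code looks like this, `BuildConfig`, `ThreadSlots` and `StartSlots` are invented names, and the buggy variant is left in a comment because running it would be undefined behavior. The safer variant stores the clamped value once and uses it as the only loop bound, with `std::array::at` turning any remaining mismatch into an exception instead of silent memory corruption.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical stand-ins; not Tundra's actual types.
constexpr int kMaxThreads = 64;

struct BuildConfig {
  int thread_count;  // from -j or CPU detection; may exceed kMaxThreads
};

// Suspected bug pattern (illustration only, do not run: with a
// thread_count of 65 or more the loop writes past slots[63]):
//
//   std::array<int, kMaxThreads> slots;
//   int count = std::min(config.thread_count, kMaxThreads);  // clamped copy...
//   for (int i = 0; i < config.thread_count; ++i)            // ...never used
//     slots[i] = i + 1;

struct ThreadSlots {
  std::array<int, kMaxThreads> ids{};  // fixed per-thread storage
  int count = 0;                       // number of slots actually in use
};

// Safer pattern: clamp once, write the result back to the config, and use
// that single value as the loop bound. at() throws std::out_of_range if the
// bound and the capacity ever disagree, instead of overwriting memory.
ThreadSlots StartSlots(BuildConfig& config) {
  config.thread_count = std::max(1, std::min(config.thread_count, kMaxThreads));
  ThreadSlots slots;
  slots.count = config.thread_count;
  for (int i = 0; i < slots.count; ++i) {
    slots.ids.at(i) = i + 1;  // a real build queue would start thread i + 1 here
  }
  return slots;
}
```

With `BuildConfig{65}`, `StartSlots` leaves `thread_count` at 64 and fills slots 1 to 64, which is the same work `-j 64` would do.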

deplinenoise commented 1 month ago

No idea, but I'll try to repro this

deplinenoise commented 1 month ago

At first glance the code looks correct to me

deplinenoise commented 1 month ago

Yeah, I can sort of repro on this random Ubuntu machine I have, compiling with gcc. It doesn't create more than 64 threads even if I specify more on the command line, but it does crash with a stack-smashing protection failure when that happens:

[D] joining with build thread 57
[D] joining with build thread 58
[D] joining with build thread 59
[D] joining with build thread 60
[D] joining with build thread 61
[D] joining with build thread 62
[D] joining with build thread 63
*** stack smashing detected ***: terminated
Aborted (core dumped)

This doesn't happen with -j 64, so something is up.