zv-io opened 3 months ago
Thanks for your report! These symptoms are consistent with running out of virtual memory space. Scheduler threads eat up a considerable amount of that (double the number to account for their dirty sibling), and many test cases also scale according to the number of schedulers.
If the symptoms lessen when you scale down the number of dirty schedulers (`+SDcpu 1` off the top of my head), I'd chalk it up to running out of virtual memory.
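For reference, a minimal invocation that caps the scheduler counts might look like this (the values are illustrative, not a recommendation; check the `erl` man page for the exact semantics of `+S`, `+SDcpu`, and `+SDio` on your OTP version):

```shell
# Start the VM with 2 normal schedulers, 1 dirty CPU scheduler,
# and 1 dirty I/O scheduler (illustrative values):
erl +S 2:2 +SDcpu 1:1 +SDio 1
```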
I wonder if it only happens on the Elixir test suite because Elixir runs tests in parallel by default, using the number of cores multiplied by 2. So it will try to run 240 tests at once. This will reveal race conditions in the suite that didn't show up otherwise (like two tests trying to define the same module, which is a bug in the test suite). But this will also be problematic for tests that shell out. Some of our tests spawn new Erlang VM processes too, which would explain running out of virtual memory (especially if you are spawning dozens of them in parallel). @zv-io you can try `TRACE=1 make test`. That should disable concurrent testing. The module conflict I will fix on Elixir.
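If you want to dial the concurrency down rather than disable it entirely, ExUnit's parallelism can also be capped directly (sketch; as noted above, the default is the core count times 2):

```shell
# Cap ExUnit to 8 concurrent test cases instead of cores * 2
# (run from the Elixir checkout; 8 is an illustrative value):
mix test --max-cases 8
```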
@josevalim I amend my previous statement about Elixir passing tests. We had been running a smaller subset of tests for convenience, and these do show errors at higher thread counts but weren't aborting the test script on failure:
```shell
export ERL_TOP=$builddir
make release_tests
for _header in erl_fixed_size_int_types.h \
               ${CHOST}/erl_int_sizes_config.h; do
    cp erts/include/$_header erts/emulator/beam/
done
cd release/tests/test_server
$ERL_TOP/bin/erl -s ts install -s ts smoke_test batch -s init stop
```
With `TRACE=1 make test` (N=192 threads enabled on the machine, but no `-j` provided to `make`), the visible output appears to be single-threaded; however, all 192 threads are busy doing something ("running") -- see how high the load is:
```
adelie # uptime
22:44:38 up 2:38, 2 users, load average: 824.14, 792.87, 767.28
```
Is this the expected behavior on such a system?
Here are the full logs from `TRACE=1 make test`, all on the same hardware, same binaries, etc.:
N=192: erlang-ppc-musl-192t.log -- I killed it after 6+ hours when it reached `Testing tests.emulator_test: *** FAILED test case 1710 of 2194 ***`, and I had to `killall -9 epmd; killall -9 beam.smp` as those processes kept going...
N=96: erlang-ppc-musl-96t.zip -- I killed it after 6+ hours, and the log file is much larger with more/different errors.
N=64: (pending, will update shortly)
I am happy to provide access to this hardware (a spare test machine set up to debug this issue) if that would be helpful.
@zv-io can you reach me by email on my GitHub profile? Access to the machine would help find out what is going on. Although I will probably need help setting it all up. Thank you!
High CPU usage on systems with many schedulers is an issue we've encountered and are actively investigating - there seem to be some inefficiencies in the scheduling and task stealing algorithms that become visible with 100+ schedulers. This is on 64bit systems, but I would imagine some of that would translate to 32bit systems as well. That said, we don't see any hanging or crashes - just elevated CPU usage.
cc @RobinMorisset
As @jhogberg already said, you are very likely running out of virtual address space.
When I start a 32-bit runtime system (which does nothing besides starting) with 160 schedulers and 160 dirty CPU schedulers on my Linux machine, it maps about 750 MB of virtual memory, which is quite a lot of the maximum of 4 GB available to a 32-bit VM (with 8 schedulers and 8 dirty CPU schedulers it maps about 100 MB of virtual memory). The high amount of virtual memory mapped comes both from thread stacks (a normal scheduler reserves about 1/2 MB by default) and from the memory mappings made by ERTS internal scheduler-specific allocator instances. The ERTS internal allocators can be disabled, but that will likely cost you much in performance/scalability, and it will perhaps not even reduce memory usage if you truly are using all 160 schedulers.
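A back-of-envelope estimate (my arithmetic, based on the stack figure above) shows why the stacks alone already bite: 160 normal plus 160 dirty CPU scheduler threads at roughly 0.5 MB of reserved stack each comes to about 160 MB of address space before any allocator mappings are counted:

```shell
# Rough, illustrative estimate of address space reserved for
# scheduler thread stacks (160 normal + 160 dirty CPU, ~0.5 MB each):
awk 'BEGIN {
    threads  = 160 + 160
    stack_mb = 0.5
    printf "%d MB of address space reserved for stacks alone\n", threads * stack_mb
}'
# -> 160 MB of address space reserved for stacks alone
```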
When using this many schedulers, you typically don't want to use the 32-bit runtime system, but instead use the 64-bit runtime system which does not suffer from the very limited amount of virtual address space available to the 32-bit runtime system. It is possible to shrink stack sizes on schedulers and other threads, and disable ERTS internal allocators to get the system to survive a bit longer, but I'd say that you are just shooting yourself in the foot. I see only two real options; either use the 64-bit runtime system, or reduce the memory usage of your application.
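For completeness, the knobs mentioned above look roughly like this (values are illustrative; `+sss` and `+sssdcpu` set scheduler stack sizes in kilowords, and `+Mea min` disables the optional ERTS allocators -- verify against the `erl` and `erts_alloc` documentation for your release, and expect a performance cost):

```shell
# Shrink scheduler thread stacks and disable optional ERTS allocators
# to stretch the 32-bit address space a bit further (illustrative only):
erl +sss 64 +sssdcpu 16 +Mea min
```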
When looking at the `erl_crash-calls.dump` in the https://git.adelielinux.org/adelie/packages/-/issues/1194 issue that you pointed to, one can see that the runtime system has allocated this much memory:
```
total:          3380754648
processes:        10167892
processes_used:   10166320
system:         3370586756
atom:              1056953
atom_used:         1039189
binary:             339784
code:             13783237
ets:                620248
```
I'd say that it is more or less amazing that it was able to allocate and use 3.15 GB of memory with 160 schedulers and 160 dirty CPU schedulers in a 32-bit runtime system. Note that the 3.15 GB is dynamically allocated memory provided by the allocators, so they will need to map even more address space than that in order to provide those allocations. This also does not include thread stacks, dynamically linked libraries, or the code of the runtime system itself.
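The 3.15 GB figure follows directly from the dump, and the `system` line accounts for nearly all of it (my arithmetic, just confirming the quoted numbers):

```shell
# Convert the crash-dump totals to GiB and compute system's share of total:
awk 'BEGIN {
    total  = 3380754648
    system = 3370586756
    printf "total: %.2f GiB, system share: %.1f%%\n", total / 2^30, 100 * system / total
}'
# -> total: 3.15 GiB, system share: 99.7%
```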
While quickly hitting the 4 GB barrier with this many schedulers is hardly unusual, I find it a bit odd that `system` allocations dwarf the rest to such a degree. I think it's worth investigating further.
@jhogberg thanks for your support here; do you need access to a test machine?
That would be much appreciated, however, I most likely won't be able to look at this until mid-September due to other commitments. :-|
@jhogberg could I poke you about this please? Not urgent but if you do have some spare time to look at this we'd be grateful.
Sure, please send the details to my e-mail, you'll find it in the commit log.
Describe the bug
On 32-bit systems with high thread counts (N~120), transient instability in the Elixir test suite surfaces. At even higher thread counts (N~160), fatal errors are reliably reproduced.
Transient instability at N~120 includes assertion failures and timeouts, e.g.:
and
however at N~160, fatal errors are reliably reproduced:
At N~192, the OTP test suite shows some transient (not always reproducible) errors:
We have a couple of systems with high thread counts, namely 20- and 24-core SMT=8 ppc64 systems. We can vary the number of active cores/threads on these machines on demand, from 1 to 192 threads. For example:
During recent testing in our 32-bit PowerPC environments (this does not affect 64-bit environments), we discovered significant instability (transient test failures) at N~120, as well as reliably repeatable fatal errors at N~160, which are mitigated when lower thread counts are used. Our highest-thread x86_64 machine (running an i586 environment) caps out at 72 threads and is stable.
All tests are run with the exact same binaries and hardware; the only thing that changes is how many cores/threads are active.
Some examples:
To Reproduce
Expected behavior
The workload is handled successfully.
Affected versions
OTP 27.
Additional context
Additional remarks