DynamoRIO / drmemory

Memory Debugger for Windows, Linux, Mac, and Android
Other
2.43k stars 262 forks source link

drmemory 2.6.19984/cronbuild on Win10/Workstation on IBM 7945/XEON crashes with ASSERT FAILURE #2514

Open gisburn opened 4 weeks ago

gisburn commented 4 weeks ago

Describe the bug drmemory crashes with the following assertion: ---- snip ----

$ drmemory -debug -- myhorribleapp.exe
~~Dr.M~~ Dr. Memory version 2.6.19984
~~Dr.M~~ Running "C:\myhorribleapp.exe"
~~Dr.M~~ Using system call file C:\Users\roland_mainz\AppData\Roaming\Dr. Memory\symcache\syscalls_x64.txt
~~Dr.M~~ ERROR: Failed to find "main" for limiting memory dump
ASSERT FAILURE
D:\a\drmemory\drmemory\ext\drx\scatter_gather_x86.c:122: proc_avx_enabled() (Scatter/gather instrs not available)

---- snip ----

To Reproduce Steps to reproduce the behavior:

  1. Get a IBM 7945/XEON machine
  2. Install Win10/Workstation on it
  3. $ bcdedit.exe /set nx AlwaysOff ## (just to be sure)
  4. Run any app with $ drmemory -debug -- myhorribleapp.exe

Expected behaviour drmemory works

** Actual behaviour: Crash

Versions

gisburn commented 4 weeks ago

The ASSERT which gets triggered is https://github.com/DynamoRIO/dynamorio/blob/7f7544b6f593dcdbb2213fca3922055092b35af1/ext/drx/scatter_gather_x86.c#L122 - it seems it wants to use AVX on a machine which has no AVX.

@derekbruening Is AVX now mandatory for DrMemory on Windows 10/64bit AMD64 ?

derekbruening commented 3 weeks ago

The default builds are whatever the compiler defaults are on the Github Actions VMs: which are generally as old as Github supports (VS2019, Ubuntu20); I don't think it's reasonable to ask the small set of existing maintainers to support more than that as maintaining the existing multiple platforms is challenging enough. This is an open-source project so you could certainly do a custom build with flags telling the compiler not to use AVX. You could submit a pull request adding CMake options for that and a regression test.

cedricblancher commented 3 weeks ago

The default builds are whatever the compiler defaults are on the Github Actions VMs: which are generally as old as Github supports (VS2019, Ubuntu20);

Just my 0,03 francs:

  1. Most universities and scientific institutes get their servers replaced every 8-10 years. So far ALL of our XEON machines do NOT support the avx extension, and still have a lifetime of 6 years left. It is a instruction set extension after all!!

  2. Most industry computers, e.g. SIMATIC PCs, use CPUs which have long-term availability (>20 years) by the vendor (Intel). As far I have tested this morning none of the current and up to date SIMATIC and Rockwell PC lineup supports the avx extension. Not being able to run drmemory on such systems is a BIG DEAL, as this is industry which actually throws money at stuff. But maybe not in this case, as valgrind is running perfectly fine on such machines.

  3. Misconception: @gisburn is wrong, you don't have to compile drmemory without avx support. Just drmemory should not make avx support fatally mandatory if no avx extension support is available in the cpu. No avx support in the cpu means it is not needed to expect avx instructions and bail out with an error for that. if(has_avx_support()) {...} related code, and the bug is gone

derekbruening commented 3 weeks ago

Misconception: @gisburn is wrong, you don't have to compile drmemory without avx support. Just drmemory should not make avx support fatally mandatory if no avx extension support is available in the cpu. No avx support in the cpu means it is not needed to expect avx instructions and bail out with an error for that. if(has_avx_support()) {...} related code, and the bug is gone

It sounds like you know how to fix it and could send a pull request?

If the compilers are not emitting AVX then it should indeed work. Looking at that drx code: looks like an oversight with no deliberate intention; looks like it has just #ifdef PLATFORM_SUPPORTS_SCATTER_GATHER guarding the scatter-gather setup when it should also have the proc_avx_enabled() for x86 (and SVE check for AArch64). This code is from December 2021 DynamoRIO/dynamorio#5252 so it sounds like non-AVX usage is actually pretty rare for any DynamoRIO-based tool as no one had reported any issue in the last 3 years. Pinging the author to raise awareness: @abhinav92003

This would affect any DynamoRIO tool using the drx library, so I would suggest filing a companion bug in the DynamoRIO tracker. It is best for you to file it to A) show this is a real issue hit in the real world; B) make it easier for you to confirm it solves issues on your machines; C) you will get updates to that issue.

The most important and maybe most difficult part is: how to add a regression test to prevent accidental breakage of non-AVX in the future? Ideally it would be run on a machine that actually has no AVX: a donated self-hosted Github Actions runner from someone with one of these machines? Just a build and running a couple of tests would be enough. Halfway solutions would be flags to pretend cpuid returns no AVX (only tests certain paths); run DR under itself to look for AVX opcodes (a la drcpusim; but again only tests certain aspects such as compiler emitting AVX, though it can also fake cpuid).

derekbruening commented 3 weeks ago

I have not tested it on a non-AVX machine, but that code in question looks like it will run just fine in release build: so this is a DR-debug-build-only issue, is that right? That may explain why no one else has hit it. That also means there is a workaround when using DR debug build: -ignore_assert_list "*", or -dr_ops "-ignore_assert_list *" from drmemory.

abhinav92003 commented 3 weeks ago

The use of PLATFORM_SUPPORTS_SCATTER_GATHER without proc_avx_enabled to guard the scatter-gather setup existed even before https://github.com/DynamoRIO/dynamorio/pull/5252 (see https://github.com/DynamoRIO/dynamorio/pull/3955 where PLATFORM_SUPPORTS_SCATTER_GATHER was introduced). But looks like https://github.com/DynamoRIO/dynamorio/pull/5252 added init- and exit-time logic (the get_mov_scratch_mm_opcode_and_size calls in drx_scatter_gather_thread_init and drx_scatter_gather_thread_exit) that would be invoked without seeing an actual scatter-gather instruction and would fail if !proc_avx_enabled.

but that code in question looks like it will run just fine in release build

Yes it does look like that.