davidlattimore / wild

Apache License 2.0

Incorrect string-merge results when code is compiled with clang #180

Open davidlattimore opened 1 month ago

davidlattimore commented 1 month ago

When the test program_name_29___cpp_integration_cc__ is compiled with clang rather than gcc, it fails. Apparently all the merged strings end up as the string ELF. See https://github.com/davidlattimore/wild/pull/140. We should probably first add an option to override the compiler, i.e. #179.
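For readers unfamiliar with string merging, here is a minimal sketch of the general idea: NUL-terminated strings from several input sections are deduplicated into one output section, and every reference has to be remapped from its input offset to the new output offset. This is not wild's implementation; `MergedStrings` and `merge_strings` are hypothetical names made up for illustration. If that remapping goes wrong, every string reference can end up pointing at the same unrelated bytes, which is the kind of symptom described above.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical result type: the merged output bytes plus a map from
// (input section index, offset within that section) to output offset.
struct MergedStrings {
    std::string output;
    std::map<std::pair<std::size_t, std::size_t>, std::size_t> remap;
};

// Merge NUL-terminated strings from several input sections, keeping a single
// copy of each distinct string in the output.
MergedStrings merge_strings(const std::vector<std::string>& sections) {
    MergedStrings result;
    std::unordered_map<std::string, std::size_t> seen;  // string -> output offset
    for (std::size_t s = 0; s < sections.size(); ++s) {
        const std::string& data = sections[s];
        std::size_t start = 0;
        while (start < data.size()) {
            std::size_t end = data.find('\0', start);
            if (end == std::string::npos) end = data.size();  // tolerate a missing terminator
            std::string str = data.substr(start, end - start);
            // First occurrence gets appended; later ones reuse its offset.
            auto [it, inserted] = seen.try_emplace(str, result.output.size());
            if (inserted) {
                result.output += str;
                result.output.push_back('\0');
            }
            result.remap[std::make_pair(s, start)] = it->second;
            start = end + 1;
        }
    }
    return result;
}
```

A relocation that pointed at offset 4 of the second input section would then be resolved through `remap` rather than by using the input offset directly.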

marxin commented 4 weeks ago

I can't reproduce the wrong output. What I see instead is that /home/marxin/Programming/wild/wild/tests/build/cpp-integration.cc-model-large.wild crashes with:

❯ valgrind wild/tests/build/cpp-integration.cc-model-large.wild
==42077== Memcheck, a memory error detector
==42077== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==42077== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==42077== Command: /home/marxin/Programming/wild/wild/tests/build/cpp-integration.cc-model-large.wild
==42077== 
==42077== Invalid read of size 1
==42077==    at 0x48506A2: strlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==42077==    by 0x49DEF57: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator+=(char const*) (in /usr/lib64/libstdc++.so.6.0.33)
==42077==    by 0x4028F6: main (in /home/marxin/Programming/wild/wild/tests/build/cpp-integration.cc-model-large.wild)
==42077==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

Are we talking about the same issue?

davidlattimore commented 4 weeks ago

Interesting. I still get output even under valgrind:

valgrind wild/tests/build/cpp-integration.cc-model-large.wild
==13114== Memcheck, a memory error detector
==13114== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==13114== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==13114== Command: wild/tests/build/cpp-integration.cc-model-large.wild
==13114== 
ELFELFELFELFELFELFELF
==13114== 
==13114== HEAP SUMMARY:
==13114==     in use at exit: 0 bytes in 0 blocks
==13114==   total heap usage: 4 allocs, 4 frees, 73,820 bytes allocated
==13114== 
==13114== All heap blocks were freed -- no leaks are possible
==13114== 
==13114== For lists of detected and suppressed errors, rerun with: -s
==13114== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

I just tried running in the openSUSE Docker image, and the test there didn't pass linker-diff validation:

wild: /wild/wild/tests/build/cpp-integration.cc-model-large.wild
ld: /wild/wild/tests/build/cpp-integration.cc-model-large.ld
asm._start
  ORIG            `/usr/lib64/gcc/x86_64-suse-linux/14/../../../../lib64/crt1.o`
                  endbr64
                  xor %ebp,%ebp
                  mov %rdx,%r9
                  pop %rsi
                  mov %rsp,%rdx
                  and $0xFFFFFFFFFFFFFFF0,%rsp
                  push %rax
                  push %rsp
                  xor %r8d,%r8d
                  xor %ecx,%ecx

  wild 0x00402778 48 c7 c7 90 28 40 00 mov $0x402890,%rdi  // Mov_rm64_imm32(0x402890) main
  ld   0x00401038 48 8b 3d 49 2f 00 00 mov 0x2F68,%rdi  // Mov_r64_rm64(0x403f88) GOT(main)
  ORIG            48 8b 3d 00 00 00 00 mov 0x1F,%rdi  // R_X86_64_REX_GOTPCRELX -> `main` -4
  TRACE           relaxation.kind=RexMovIndirectToAbsolute value_flags=ADDRESS | CAN_BYPASS_GOT resolution_flags=DIRECT

                  callq *0  // 0xAAA=DYNAMIC(__libc_start_main@GLIBC_2.34)
                  hlt

section.ltext._ZNSt11char_traitsIcE7compareEPKcS2_m.flags
  wild AXG
  ld AX

section.ltext._ZStneIcSt11char_traitsIcESaIcEEbRKNSt7__cxx1112basic_stringIT_T0_T1_EEPKS5_.flags
  wild AXG
  ld AX

section.ltext._ZSteqIcSt11char_traitsIcESaIcEEbRKNSt7__cxx1112basic_stringIT_T0_T1_EEPKS5_.flags
  wild AXG
  ld AX

section.ltext._ZNSt11char_traitsIcE6lengthEPKc.flags
  wild AXG
  ld AX

We don't do anything special for .ltext sections and we probably should. Our support for large models is incomplete in other ways too.
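To make the `_start` diff above easier to read: the original instruction `48 8b 3d ...` loads the address of `main` from the GOT (R_X86_64_REX_GOTPCRELX), and wild rewrote it to `48 c7 c7 90 28 40 00`, a `mov` with a 32-bit immediate, while ld kept the GOT load. Below is a rough sketch of that byte-level rewrite. It is not wild's code; `relax_rex_mov_got_to_absolute` is a hypothetical name, and the range check reflects my assumption that the rewrite is only valid when the address fits in the sign-extended 32-bit immediate, which is not guaranteed for code placed by the large model.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Hypothetical helper, not wild's implementation.
// `insn` holds the REX prefix, opcode and ModRM byte that precede the 4-byte
// relocated field; `address` is the resolved absolute address of the symbol.
// Returns the rewritten 7-byte instruction, or nullopt if the GOT load
// should be kept as-is.
std::optional<std::array<std::uint8_t, 7>> relax_rex_mov_got_to_absolute(
    const std::array<std::uint8_t, 3>& insn, std::uint64_t address) {
    const std::uint8_t rex = insn[0];
    const std::uint8_t opcode = insn[1];
    const std::uint8_t modrm = insn[2];

    // Only handle `mov disp32(%rip), %reg64`: REX.W set (0x48 or 0x4c),
    // opcode 0x8b, ModRM with mod=00 and rm=101 (RIP-relative).
    if ((rex & 0xFB) != 0x48 || opcode != 0x8B || (modrm & 0xC7) != 0x05) {
        return std::nullopt;
    }
    // `mov $imm32, %reg64` sign-extends its immediate, so a non-negative
    // address must fit in 31 bits. (Assumption: addresses above this keep the GOT.)
    if (address > 0x7FFFFFFFu) {
        return std::nullopt;
    }

    const std::uint8_t reg = (modrm >> 3) & 0x07;
    // The destination moves from ModRM.reg to ModRM.rm, so REX.R becomes REX.B.
    const std::uint8_t new_rex = 0x48 | ((rex & 0x04) ? 0x01 : 0x00);
    const std::uint8_t new_modrm = 0xC0 | reg;  // mod=11, /0, rm=reg

    const std::uint32_t imm = static_cast<std::uint32_t>(address);
    return std::array<std::uint8_t, 7>{
        new_rex, 0xC7, new_modrm,
        static_cast<std::uint8_t>(imm & 0xFF),
        static_cast<std::uint8_t>((imm >> 8) & 0xFF),
        static_cast<std::uint8_t>((imm >> 16) & 0xFF),
        static_cast<std::uint8_t>((imm >> 24) & 0xFF)};
}
```

Called with the bytes `48 8b 3d` and the address 0x402890, this produces the same seven bytes that appear on the `wild` line of the diff.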