genodelabs / genode

Genode OS Framework
https://genode.org/

tool chain update for 21.05 release #4094

Closed cproc closed 3 years ago

cproc commented 3 years ago

Planned features:

cproc commented 3 years ago

An initial version of updated tools is available on my 'issue4094' branch [1]. It can be used to help fix the errors which occurred with the updates. I'm going to create individual github issues for those. The gdb update is currently in progress. The plan for the ARM hard-float and -O3 features is to try to enable them as soon as all version update related errors have been fixed.

[1] https://github.com/cproc/genode/tree/issue4094

cproc commented 3 years ago

I rebased my 'issue4094' branch to staging, so it should be easier to debug the remaining problems.

nfeske commented 3 years ago

Most fixes have entered the master branch now. @cproc, could you rebase your branch onto current staging again?

Can you suggest an appropriate time for us to collectively switch to the new tool chain, or do you see any showstoppers? I think it would be good to take this step this week so that we have enough time to settle the remaining issues (e.g., those uncovered by our nightly test) before the 21.05 release.

cproc commented 3 years ago

I rebased my 'issue4094' branch to staging and added commit messages. In principle, the commits should be ready for staging now. The temporary binutils commit is currently needed to build Fiasco.OC for ARM. The remaining known build issues should be relatively easy to work around with '-Wno-error=' options if needed until a real fix is available. The gdb update is currently in progress.

cproc commented 3 years ago

I rebased my 'issue4094' branch to staging and removed the temporary binutils commit, which is not needed anymore with the Fiasco.OC ARM link fix in issue #4118.

chelmuth commented 3 years ago

Please correct me, but with current staging (be5a2f65c619f9c04a87b06ae57e60336ca9c11f) there are only two open issues - #4126 and #4142 - blocking the switch to GCC 10. Let's squash these today, so we can switch the nightly to GCC 10 on Monday and also update our Sculpt installations soon.

cproc commented 3 years ago

The Fiasco.OC ARM link commit from issue #4118 still seems to be missing on staging (or I don't see it for some reason), and the core.ld commit from issue #4126 (or a better solution if anybody knows one) is also needed. And of course the ada-runtime commit needs to be unreverted when switching.

m-stein commented 3 years ago

@chelmuth I'm working on #4142.

ssumpf commented 3 years ago

I have added a fix for #4126 for core.ld that is a little less invasive (https://github.com/genodelabs/genode/commit/76bc0c0b7fd9f73cfa14f968d1c421b1ebfd7cc3).

chelmuth commented 3 years ago

@cproc could you please include 76bc0c0b7fd9f73cfa14f968d1c421b1ebfd7cc3 in your gcc10 branch, so it finally gets merged to staging? I suggest changing the commit message title to "binutils: augment equally-named sections again".

cproc commented 3 years ago

@chelmuth: done

chelmuth commented 3 years ago

Preliminary results from the depot package builds:

chelmuth commented 3 years ago

The depot autopilot failed on arm_v7a for test-libc_connect_vfs_server_lxip, which (as we discovered after some hours of digging) is caused by an unaligned access in lxip.lib.so. Fix is still pending.
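
For illustration, one common way to avoid this class of fault is to assemble multi-byte values byte by byte instead of dereferencing a pointer that may be misaligned. The following is a minimal sketch of that general technique, not the change that went into lxip; the helper name `load_le32` is hypothetical and does not appear in the Genode sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of lxip): read a 32-bit little-endian
 * value from a buffer position that may not be 4-byte aligned.
 *
 * Casting 'p' to 'uint32_t const *' and dereferencing it would be
 * undefined behavior for a misaligned 'p'. Reading single bytes has no
 * alignment requirement.
 */
uint32_t load_le32(uint8_t const *p)
{
    assert(p != NULL);

    return  (uint32_t)p[0]
         | ((uint32_t)p[1] <<  8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

A caller would pass a pointer into a packet or descriptor buffer, for example `load_le32(buf + 1)`, without having to care whether `buf + 1` is word-aligned.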

chelmuth commented 3 years ago

The autopilot revealed the following anomalies after the tool chain update.

chelmuth commented 3 years ago

Commit 6e9bea0e59a3eaa0cb5989dd59ea5d9237a98d4f fixes the lxip alignment issue. I'm not sure about the ADDITIONAL_HEADROOM though https://github.com/genodelabs/genode/blob/6e9bea0e59a3eaa0cb5989dd59ea5d9237a98d4f/repos/dde_linux/src/lib/lxip/driver.c#L118-L122

nfeske commented 3 years ago

On foc_x86_64, the switch to GCC-10 triggered the following problem in the thread.run test:

[2021-05-14 06:15:23] [init -> test-thread] running 'test_pause_resume'
[2021-05-14 06:15:23] [init -> test-thread] --- pausing ---
[2021-05-14 06:15:23] [init -> test-thread] --- paused ---
[2021-05-14 06:15:23] [init -> test-thread] --- reading thread state ---
[2021-05-14 06:15:23] [init -> test-thread] --- resuming thread ---
[2021-05-14 06:15:23] Warning: failed to acknowledge exception, l4_ipc_err=2
[2021-05-14 06:15:23] Warning: failed to acknowledge exception, l4_ipc_err=2
[2021-05-14 06:15:23] Warning: failed to acknowledge exception, l4_ipc_err=2
[2021-05-14 06:15:23] [init -> test-thread] --- thread resumed ---
[2021-05-14 06:15:23] Warning: failed to acknowledge exception, l4_ipc_err=2
[2021-05-14 06:15:23] Warning: failed to acknowledge exception, l4_ipc_err=2
[2021-05-14 06:15:23] Warning: failed to acknowledge exception, l4_ipc_err=2
[2021-05-14 06:15:23] Warning: failed to acknowledge exception, l4_ipc_err=2
[2021-05-14 06:15:23] Warning: failed to acknowledge exception, l4_ipc_err=2

The messages continue infinitely.

@cproc Could you take a look?

nfeske commented 3 years ago

The problem can be reproduced on Qemu.

chelmuth commented 3 years ago

I still wonder why foc is complaining about "KERNEL0: alignment error at 18003061 (PC: 0102ec44, SP: 408ff480, FSR: 90000001, PSR: 60000110)", because I made sure CONFIG_ARM_ALIGNMENT_CHECK is switched off in the kernel config.

cproc commented 3 years ago

Commit 02aa4ef fixes the thread.run test on foc.

cproc commented 3 years ago

I saw the foc alignment error with usb_hid_raw.run on arm_v6/rpi as well. According to git bisect it is somehow related to commit dc89ebf, but maybe this commit just triggered an existing bug. According to [1] some instructions always trigger alignment faults, but the instruction at the reported error PC did not look like one of those at first sight:

[init -> usb_drv] usb_parse_interface(): &alt->desc: 180030a4, d: 18003061, USB_DT_INTERFACE_SIZE: 9
KERNEL0: alignment error at 18003065 (PC: 01042350, SP: 408ff648, FSR: 90000001, PSR: 60000110)
/.../contrib/dde_linux-6f2e873679a79e8c85bdf62c2f81910b357f8abe/src/drivers/usb_host/drivers/usb/core/config.c:482
        memcpy(&alt->desc, d, USB_DT_INTERFACE_SIZE);
 1042350:       e5943004        ldr     r3, [r4, #4]

When I reverted the mapping commit, the memcpy() addresses did not change, but the alignment error did not occur anymore.

[1] https://developer.arm.com/documentation/ddi0406/b/Application-Level-Architecture/Application-Level-Memory-Model/Alignment-support/Unaligned-data-access
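
The addresses in the log above can be checked with a little arithmetic. In the sketch below, the three constants are copied from the log; the helper `misalignment` is hypothetical. It shows that the memcpy() source `d` lies one byte past a 4-byte boundary, that the destination is word-aligned, and that the reported fault address equals `d + 4`. The latter would fit the `ldr r3, [r4, #4]` instruction if `r4` held `d`, which is an assumption because the log does not show register contents.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of 'addr' past the previous 'align' boundary (0 if aligned) */
uint32_t misalignment(uint32_t addr, uint32_t align)
{
    /* 'align' must be a non-zero power of two */
    assert(align != 0 && (align & (align - 1)) == 0);

    return addr & (align - 1);
}

/* Values taken from the log output above */
enum {
    ALT_DESC_ADDR = 0x180030a4,  /* &alt->desc, memcpy destination  */
    D_ADDR        = 0x18003061,  /* d, memcpy source                */
    FAULT_ADDR    = 0x18003065   /* address reported by the kernel  */
};
```

With these definitions, `misalignment(D_ADDR, 4)` and `misalignment(FAULT_ADDR, 4)` both evaluate to 1, while `misalignment(ALT_DESC_ADDR, 4)` evaluates to 0.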

chelmuth commented 3 years ago

This pretty much looks like the same place I discovered in usb_host_drv on imx6q_sabrelite: A standard ldr instruction apparently leads to an alignment error of foc with DFSR=90000001. Does this value really make sense as bits [31:13] are UNP or SBZ? It seems 18003065 is a DMA buffer, which may be important if page-table attributes could cause this error.
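
To see which part of the reported value is meaningful, the low bits can be decoded separately from the upper bits. The sketch below assumes the ARMv7 short-descriptor DFSR layout (FS[3:0] in bits [3:0], FS[4] in bit 10, WnR in bit 11) and that the kernel's "FSR" is the raw DFSR value; all function names are hypothetical. Under these assumptions, the low bits of 90000001 decode to fault status 0b00001 (alignment fault) on a read access, while the set bits in [31:13] lie outside the decoded fields, so the sketch says nothing about where they come from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fault status FS[4:0]: FS[4] is bit 10, FS[3:0] are bits 3..0 */
uint32_t dfsr_fault_status(uint32_t dfsr)
{
    return (((dfsr >> 10) & 1u) << 4) | (dfsr & 0xfu);
}

/* FS == 0b00001 encodes an alignment fault */
bool dfsr_is_alignment_fault(uint32_t dfsr)
{
    return dfsr_fault_status(dfsr) == 0x01u;
}

/* WnR (bit 11): set if the faulting access was a write */
bool dfsr_is_write(uint32_t dfsr)
{
    return ((dfsr >> 11) & 1u) != 0;
}

/* Bits [31:13], which are not part of the fields decoded above */
uint32_t dfsr_upper_bits(uint32_t dfsr)
{
    return dfsr >> 13;
}

/* Usage example with the FSR value from the kernel message */
void check_reported_fsr(void)
{
    uint32_t const fsr = 0x90000001u;

    assert(dfsr_is_alignment_fault(fsr));  /* matches "alignment error" */
    assert(!dfsr_is_write(fsr));           /* a load, such as ldr       */
    assert(dfsr_upper_bits(fsr) != 0);     /* the bits in question      */
}
```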

chelmuth commented 3 years ago

Could it be that we see the behavior described in https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architecture/Application-Level-Memory-Model/Memory-types-and-attributes-and-the-memory-order-model/Memory-access-restrictions?lang=en?

The following restrictions apply to memory accesses: [...] Unaligned data access identifies the instructions that can make an unaligned memory access, and the required configuration setting. If such an access is to Device or Strongly-ordered memory then: [...]

  • if the implementation includes the Large Physical Address Extension, the access generates an Alignment fault.

alex-ab commented 3 years ago

I saw the foc alignment error with usb_hid_raw.run on arm_v6/rpi as well. According to git bisect it is somehow related to commit dc89ebf, but maybe this commit just triggered an existing bug.

Is the former case of the original commit

case UNCACHED:
        if (!_reply_mapping.iomem())
            l4_utcb_mr()->mr[0] |= L4_FPAGE_BUFFERABLE << 4;

still covered in the new commit? Looking at the code, I could not see it.

chelmuth commented 3 years ago

Well spotted! I'm going to fix and test the !_reply_mapping.io_mem case after the autopilot has finished.

cproc commented 3 years ago

Commit 9a8138e updates gdb to version 10.2

chelmuth commented 3 years ago

The builder reveals errors with gdb 10.2. Could you please look into this, as I'd expect more follow-up errors or issues on platforms besides x86_64?

      [gdb_arm]  /usr/local/genode/tool/21.05/bin/../lib/gcc/x86_64-pc-elf/10.3.0/../../../../x86_64-pc-elf/bin/ld: main.o: in function `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&)':
      [gdb_arm]  /data/genode/contrib/stdcxx-d2865c41fafbbf66051d38e7b742c4d5bc2f05a3/include/stdcxx/bits/basic_string.h:6123: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::capacity() const'
      [gdb_arm]  /usr/local/genode/tool/21.05/bin/../lib/gcc/x86_64-pc-elf/10.3.0/../../../../x86_64-pc-elf/bin/ld: /data/genode/contrib/stdcxx-d2865c41fafbbf66051d38e7b742c4d5bc2f05a3/include/stdcxx/bits/basic_string.h:6123: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::capacity() const'
      [gdb_arm]  /usr/local/genode/tool/21.05/bin/../lib/gcc/x86_64-pc-elf/10.3.0/../../../../x86_64-pc-elf/bin/ld: xml-syscall.o: in function `init_syscalls_info(gdbarch*)':
      [gdb_arm]  /data/genode/contrib/gdb-601cdd5711839f85cd2d151f51d989b678a02efc/src/noux-pkg/gdb/gdb/xml-syscall.c:376: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
      [gdb_arm]  collect2: error: ld returned 1 exit status

ssumpf commented 3 years ago

@chelmuth: 1745c6c enables building parts of the tool chain (needed for RISC-V), d899e98 fixes ldso for RISC-V

chelmuth commented 3 years ago

Can we document that ARM hard-float and -O3 are deferred, and then close this issue?

cproc commented 3 years ago

@chelmuth: sure. For reference, armhf was discussed in issue #3466.