YosysHQ / nextpnr

nextpnr portable FPGA place and route tool

nextpnr-ecp5 crashes packing SERDES I/O when running on macOS #1137

Closed: tpwrules closed this issue 1 year ago

tpwrules commented 1 year ago

The design comes from a simple LiteX demo built using the latest-ish commits. I have attached the generated gateware for convenience; running the build script should be sufficient to reproduce the crash. The crash doesn't seem to happen on Linux x86_64 with GCC 12.2.0, but it does happen on macOS with Clang 11.1.0.

I bisected the first bad commit to 86699b42f619960bfefd4d0b479dd44a90527ea4. Before that commit, nextpnr-ecp5 works fine on macOS: it does not crash, and the produced design functions on the board. Other designs that don't use the DCU or EXTREF blocks work on all versions.
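For anyone repeating the bisection, here is a minimal sketch of a log classifier that could be used with `git bisect run`. The function name `classify_nextpnr_log` and the choice of crash signature are illustrative assumptions, not part of nextpnr or of the attached build script; the signature string is taken from the log below.

```shell
#!/bin/sh
# Sketch of a classifier for use with `git bisect run`.
# The function name and the crash signature are illustrative assumptions.

# Reads a nextpnr log on stdin and returns a git-bisect style exit status:
#   0 -> good (crash signature absent)
#   1 -> bad  (crash signature present)
classify_nextpnr_log() {
    if grep -q 'uncaught exception of type std::out_of_range'; then
        return 1
    fi
    return 0
}
```

A script passed to `git bisect run` could rebuild nextpnr-ecp5 at each step, pipe the output of `sh build_lattice_versa_ecp5.sh 2>&1` into this function, and exit with its status. `git bisect run` treats exit status 0 as good, 125 as "skip this commit" (useful when a commit fails to build), and any other status from 1 to 127 as bad.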

I verified that this crash also occurs with the oss-cad-suite 2023-04-09 prepackaged binaries; the log from them is below. I can provide Nix expressions on request if helpful. I am not 100% sure whether this is a macOS vs Linux issue or a Clang vs GCC issue. Linux x86_64 with Clang 11.1.0 seemed to work fine, but I don't know what other environment or stdlib differences there might be.

$ PATH="/path/to/oss-cad-suite/bin:$PATH" sh build_lattice_versa_ecp5.sh
[...]
3.49. Executing CHECK pass (checking for obvious problems).
Checking module lattice_versa_ecp5...
Found and reported 0 problems.

4. Executing JSON backend.

End of script. Logfile hash: ff8a2584e8, CPU: user 1.96s system 0.06s
Yosys 0.27+30 (git sha1 101075611, aarch64-apple-darwin20.2-clang 10.0.0-4ubuntu1 -fPIC -Os)
Time spent: 18% 1x abc (0 sec), 15% 7x techmap (0 sec), ...
Info: constraining clock net 'clk100' to 100.00 MHz
Info: constraining clock net 'main_serdesecp5_txoutclk' to 125.00 MHz
Info: constraining clock net 'main_serdesecp5_rxoutclk' to 125.00 MHz
Info: constraining clock net 'clk100' to 100.00 MHz

Info: Logic utilisation before packing:
Info:     Total LUT4s:      1686/43848     3%
Info:         logic LUTs:   1228/43848     2%
Info:         carry LUTs:    458/43848     1%
Info:           RAM LUTs:      0/ 5481     0%
Info:          RAMW LUTs:      0/10962     0%

Info:      Total DFFs:      1038/43848     2%

Info: Packing IOs..
Info: pin 'user_led7$tr_io' constrained to Bel 'X90/Y11/PIOD'.
Info: pin 'user_led6$tr_io' constrained to Bel 'X90/Y14/PIOB'.
Info: pin 'user_led5$tr_io' constrained to Bel 'X90/Y14/PIOD'.
Info: pin 'user_led4$tr_io' constrained to Bel 'X90/Y17/PIOA'.
Info: pin 'user_led2$tr_io' constrained to Bel 'X90/Y14/PIOA'.
Info: pin 'user_led1$tr_io' constrained to Bel 'X90/Y11/PIOB'.
Info: pin 'user_led0$tr_io' constrained to Bel 'X90/Y11/PIOC'.
Info: pin 'serial_tx$tr_io' constrained to Bel 'X36/Y0/PIOB'.
Info: pin 'serial_rx$tr_io' constrained to Bel 'X38/Y0/PIOB'.
Info: pin 'rst_n$tr_io' constrained to Bel 'X4/Y71/PIOB'.
Info: pin 'refclk_rst_n$tr_io' constrained to Bel 'X4/Y71/PIOA'.
Info: pin 'refclk_en$tr_io' constrained to Bel 'X44/Y0/PIOB'.
Info: refclk1_p feeds EXTREFB EXTREFB.REFCLKP, removing $nextpnr_ibuf refclk1_p.
Info: refclk1_n feeds EXTREFB EXTREFB.REFCLKN, removing $nextpnr_ibuf refclk1_n.
Info: pcie_tx_p feeds DCUA DCUA.CH0_HDOUTP, removing $nextpnr_obuf pcie_tx_p.
libc++abi: terminating due to uncaught exception of type std::out_of_range: vector
build_lattice_versa_ecp5.sh: line 4: 42814 Abort trap: 6           nextpnr-ecp5 --json lattice_versa_ecp5.json --lpf lattice_versa_ecp5.lpf --textcfg lattice_versa_ecp5.config --um5g-45k --package CABGA381 --speed 8 --timing-allow-fail --seed 1

bug.tar.gz

gatecat commented 1 year ago

Thanks for reporting!

I can't actually reproduce a crash, as this is going to be quite dependent on memory layouts, etc., but playing around with valgrind let me hunt it down. The fix is in https://github.com/YosysHQ/nextpnr/pull/1139, after which I see no invalid accesses.
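As a rough sketch of that kind of check, the failing command can be run under valgrind's default memcheck tool. The exact valgrind options used here are not stated in this thread, so the wrapper below (and its name, `run_under_valgrind`) is an assumption; the nextpnr-ecp5 arguments are copied from the log in the report.

```shell
#!/bin/sh
# Sketch: run a command under valgrind (memcheck is the default tool).
# The wrapper name is an illustrative assumption.

run_under_valgrind() {
    # --error-exitcode=1 makes valgrind exit with status 1 if it reports
    # any errors, so the result can be checked from a script.
    valgrind --error-exitcode=1 "$@"
}

# Example invocation, commented out so that sourcing this file runs nothing:
# run_under_valgrind nextpnr-ecp5 --json lattice_versa_ecp5.json \
#     --lpf lattice_versa_ecp5.lpf --textcfg lattice_versa_ecp5.config \
#     --um5g-45k --package CABGA381 --speed 8 --timing-allow-fail --seed 1
```

Running under valgrind helps in cases like this one because an invalid access can be reported even on a platform where the program does not visibly crash.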

Can you confirm that actually fixes the crash for you as well?

tpwrules commented 1 year ago

It crashes on master, and applying that PR does indeed fix the crash. The design appears to work as well.

Thanks for the quick fix. I will let you close this issue when the PR is merged and/or released.