YosysHQ / apicula

Project Apicula 🐝: bitstream documentation for Gowin FPGAs
MIT License
465 stars 66 forks source link

YoWASP build is broken #142

Closed pepijndevos closed 1 year ago

pepijndevos commented 1 year ago

@whitequark informed me that YoWASP build recently broke and @gatecat said it might help to run it in valgrind to see if the error reproduces.

https://github.com/YoWASP/nextpnr/commit/461a69a492c2d0285bc5e7480e01f36566ddf2dc#commitcomment-91992429

Relatedly, our own YoWASP CI has also been broken for a while, which just one day a couple of months stopped working. This however seems like a different error, possibly just incompatible nextpnr and apicula versions. https://github.com/YosysHQ/apicula/actions/workflows/yowasp_examples.yml

Anyway I've run nextpnr in valgrind and this is the result. An invalid read in some json code that was allocated and freed in the timing analyzer? Haven't looked deeper into it, just want to dump the output somewhere before I go to bed. It took quite a while to run it under valgrind...

==36158== Invalid read of size 4
==36158==    at 0x29FCBD: UnknownInlinedFun (jsonwrite.cc:91)
==36158==    by 0x29FCBD: nextpnr_gowin::JsonWriter::write_module(std::ostream&, nextpnr_gowin::Context*) (jsonwrite.cc:132)
==36158==    by 0x337C55: UnknownInlinedFun (jsonwrite.cc:210)
==36158==    by 0x337C55: UnknownInlinedFun (jsonwrite.cc:223)
==36158==    by 0x337C55: nextpnr_gowin::CommandHandler::executeMain(std::unique_ptr<nextpnr_gowin::Context, std::default_delete<nextpnr_gowin::Context> >) [clone .constprop.0] (command.cc:475)
==36158==    by 0x15ACE4: UnknownInlinedFun (command.cc:533)
==36158==    by 0x15ACE4: main (main.cc:113)
==36158==  Address 0x44caac60 is 0 bytes inside a block of size 92 free'd
==36158==    at 0x4844B5F: operator delete(void*, unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==36158==    by 0x237F2A: UnknownInlinedFun (new_allocator.h:158)
==36158==    by 0x237F2A: UnknownInlinedFun (alloc_traits.h:496)
==36158==    by 0x237F2A: UnknownInlinedFun (stl_vector.h:387)
==36158==    by 0x237F2A: UnknownInlinedFun (stl_vector.h:366)
==36158==    by 0x237F2A: UnknownInlinedFun (stl_vector.h:733)
==36158==    by 0x237F2A: UnknownInlinedFun (hashlib.h:212)
==36158==    by 0x237F2A: UnknownInlinedFun (timing.h:175)
==36158==    by 0x237F2A: UnknownInlinedFun (stl_pair.h:185)
==36158==    by 0x237F2A: UnknownInlinedFun (hashlib.h:214)
==36158==    by 0x237F2A: UnknownInlinedFun (stl_construct.h:151)
==36158==    by 0x237F2A: UnknownInlinedFun (stl_construct.h:163)
==36158==    by 0x237F2A: UnknownInlinedFun (stl_construct.h:196)
==36158==    by 0x237F2A: UnknownInlinedFun (alloc_traits.h:850)
==36158==    by 0x237F2A: UnknownInlinedFun (stl_vector.h:730)
==36158==    by 0x237F2A: UnknownInlinedFun (hashlib.h:212)
==36158==    by 0x237F2A: nextpnr_gowin::TimingAnalyser::~TimingAnalyser() (timing.h:73)
==36158==    by 0x283315: UnknownInlinedFun (router1.cc:101)
==36158==    by 0x283315: nextpnr_gowin::router1(nextpnr_gowin::Context*, nextpnr_gowin::Router1Cfg const&) (router1.cc:984)
==36158==    by 0x2C8DFC: nextpnr_gowin::Arch::route() (arch.cc:2079)
==36158==    by 0x338D2A: nextpnr_gowin::CommandHandler::executeMain(std::unique_ptr<nextpnr_gowin::Context, std::default_delete<nextpnr_gowin::Context> >) [clone .constprop.0] (command.cc:461)
==36158==    by 0x15ACE4: UnknownInlinedFun (command.cc:533)
==36158==    by 0x15ACE4: main (main.cc:113)
==36158==  Block was alloc'd at
==36158==    at 0x4842003: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==36158==    by 0x33E416: UnknownInlinedFun (new_allocator.h:137)
==36158==    by 0x33E416: UnknownInlinedFun (alloc_traits.h:464)
==36158==    by 0x33E416: UnknownInlinedFun (stl_vector.h:378)
==36158==    by 0x33E416: std::vector<int, std::allocator<int> >::_M_fill_insert(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, unsigned long, int const&) [clone .constprop.0] (vector.tcc:575)
==36158==    by 0x320E9D: UnknownInlinedFun (hashlib.h:246)
==36158==    by 0x320E9D: UnknownInlinedFun (hashlib.h:355)
==36158==    by 0x320E9D: UnknownInlinedFun (hashlib.h:512)
==36158==    by 0x320E9D: nextpnr_gowin::TimingAnalyser::copy_domains(nextpnr_gowin::CellPortKey const&, nextpnr_gowin::CellPortKey const&, bool) [clone .constprop.0] (timing.cc:767)
==36158==    by 0x22DC1A: nextpnr_gowin::TimingAnalyser::setup_port_domains() (timing.cc:238)
==36158==    by 0x22F377: nextpnr_gowin::TimingAnalyser::setup() (timing.cc:41)
==36158==    by 0x281D26: UnknownInlinedFun (router1.cc:127)
==36158==    by 0x281D26: nextpnr_gowin::router1(nextpnr_gowin::Context*, nextpnr_gowin::Router1Cfg const&) (router1.cc:871)
==36158==    by 0x2C8DFC: nextpnr_gowin::Arch::route() (arch.cc:2079)
==36158==    by 0x338D2A: nextpnr_gowin::CommandHandler::executeMain(std::unique_ptr<nextpnr_gowin::Context, std::default_delete<nextpnr_gowin::Context> >) [clone .constprop.0] (command.cc:461)
==36158==    by 0x15ACE4: UnknownInlinedFun (command.cc:533)
==36158==    by 0x15ACE4: main (main.cc:113)
==36158== 

==36158== Warning: set address range perms: large range [0x59c8e028, 0x7dc8e058) (noaccess)
Info: Program finished normally.
==36158== 
==36158== HEAP SUMMARY:
==36158==     in use at exit: 141,445 bytes in 590 blocks
==36158==   total heap usage: 31,851,461 allocs, 31,850,871 frees, 78,812,928,388 bytes allocated
==36158== 
==36158== 8 bytes in 1 blocks are definitely lost in loss record 79 of 563
==36158==    at 0x4842003: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==36158==    by 0x1E2119: UnknownInlinedFun (internals.h:505)
==36158==    by 0x1E2119: pybind11::detail::get_local_internals() (internals.h:515)
==36158==    by 0x33F37B: UnknownInlinedFun (type_caster_base.h:197)
==36158==    by 0x33F37B: pybind11::detail::get_type_info(std::type_index const&, bool) [clone .constprop.0] (type_caster_base.h:218)
==36158==    by 0x1DFC63: pybind11::cpp_function::initialize_generic(std::unique_ptr<pybind11::detail::function_record, pybind11::cpp_function::InitializingFunctionRecordDeleter>&&, char const*, std::type_info const* const*, unsigned long) (pybind11.h:436)
==36158==    by 0x328557: UnknownInlinedFun (pybind11.h:297)
==36158==    by 0x328557: UnknownInlinedFun (pybind11.h:100)
==36158==    by 0x328557: pybind11::detail::enum_base::init(bool, bool) [clone .constprop.0] (pybind11.h:1933)
==36158==    by 0x328ED7: pybind11::enum_<nextpnr_gowin::GraphicElement::type_t>::enum_<>(pybind11::handle const&, char const*) [clone .constprop.0] (pybind11.h:2153)
==36158==    by 0x202F1C: nextpnr_gowin::pybind11_init_nextpnrpy_gowin(pybind11::module_&) (pybindings.cc:101)
==36158==    by 0x20601F: UnknownInlinedFun (pybindings.cc:88)
==36158==    by 0x20601F: pybind11_init_impl_nextpnrpy_gowin (pybindings.cc:88)
==36158==    by 0x4BD0FF5: create_builtin (import.c:1012)
==36158==    by 0x4BD1CD5: _imp_create_builtin (import.c:1063)
==36158==    by 0x4B33D75: cfunction_vectorcall_O (methodobject.c:516)
==36158==    by 0x4B297AF: UnknownInlinedFun (ceval.c:5915)
==36158==    by 0x4B297AF: _PyEval_EvalFrameDefault (ceval.c:4277)
==36158== 
==36158== 152 bytes in 1 blocks are definitely lost in loss record 428 of 563
==36158==    at 0x4842003: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==36158==    by 0x201A66: pybind11::detail::generic_type::initialize(pybind11::detail::type_record const&) (pybind11.h:1304)
==36158==    by 0x328EB7: UnknownInlinedFun (pybind11.h:1536)
==36158==    by 0x328EB7: pybind11::enum_<nextpnr_gowin::GraphicElement::type_t>::enum_<>(pybind11::handle const&, char const*) [clone .constprop.0] (pybind11.h:2150)
==36158==    by 0x202F1C: nextpnr_gowin::pybind11_init_nextpnrpy_gowin(pybind11::module_&) (pybindings.cc:101)
==36158==    by 0x20601F: UnknownInlinedFun (pybindings.cc:88)
==36158==    by 0x20601F: pybind11_init_impl_nextpnrpy_gowin (pybindings.cc:88)
==36158==    by 0x4BD0FF5: create_builtin (import.c:1012)
==36158==    by 0x4BD1CD5: _imp_create_builtin (import.c:1063)
==36158==    by 0x4B33D75: cfunction_vectorcall_O (methodobject.c:516)
==36158==    by 0x4B297AF: UnknownInlinedFun (ceval.c:5915)
==36158==    by 0x4B297AF: _PyEval_EvalFrameDefault (ceval.c:4277)
==36158==    by 0x4B34958: UnknownInlinedFun (pycore_ceval.h:46)
==36158==    by 0x4B34958: UnknownInlinedFun (ceval.c:5065)
==36158==    by 0x4B34958: _PyFunction_Vectorcall (call.c:342)
==36158==    by 0x4B23B15: UnknownInlinedFun (abstract.h:114)
==36158==    by 0x4B23B15: UnknownInlinedFun (abstract.h:123)
==36158==    by 0x4B23B15: UnknownInlinedFun (ceval.c:5891)
==36158==    by 0x4B23B15: _PyEval_EvalFrameDefault (ceval.c:4213)
==36158==    by 0x4B34958: UnknownInlinedFun (pycore_ceval.h:46)
==36158==    by 0x4B34958: UnknownInlinedFun (ceval.c:5065)
==36158==    by 0x4B34958: _PyFunction_Vectorcall (call.c:342)
==36158== 
==36158== 152 bytes in 1 blocks are definitely lost in loss record 429 of 563
==36158==    at 0x4842003: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==36158==    by 0x201A66: pybind11::detail::generic_type::initialize(pybind11::detail::type_record const&) (pybind11.h:1304)
==36158==    by 0x3294E7: UnknownInlinedFun (pybind11.h:1536)
==36158==    by 0x3294E7: pybind11::enum_<nextpnr_gowin::GraphicElement::style_t>::enum_<>(pybind11::handle const&, char const*) [clone .constprop.0] (pybind11.h:2150)
==36158==    by 0x202FAD: nextpnr_gowin::pybind11_init_nextpnrpy_gowin(pybind11::module_&) (pybindings.cc:110)
==36158==    by 0x20601F: UnknownInlinedFun (pybindings.cc:88)
==36158==    by 0x20601F: pybind11_init_impl_nextpnrpy_gowin (pybindings.cc:88)
==36158==    by 0x4BD0FF5: create_builtin (import.c:1012)
==36158==    by 0x4BD1CD5: _imp_create_builtin (import.c:1063)
==36158==    by 0x4B33D75: cfunction_vectorcall_O (methodobject.c:516)
==36158==    by 0x4B297AF: UnknownInlinedFun (ceval.c:5915)
==36158==    by 0x4B297AF: _PyEval_EvalFrameDefault (ceval.c:4277)
==36158==    by 0x4B34958: UnknownInlinedFun (pycore_ceval.h:46)
==36158==    by 0x4B34958: UnknownInlinedFun (ceval.c:5065)
==36158==    by 0x4B34958: _PyFunction_Vectorcall (call.c:342)
==36158==    by 0x4B23B15: UnknownInlinedFun (abstract.h:114)
==36158==    by 0x4B23B15: UnknownInlinedFun (abstract.h:123)
==36158==    by 0x4B23B15: UnknownInlinedFun (ceval.c:5891)
==36158==    by 0x4B23B15: _PyEval_EvalFrameDefault (ceval.c:4213)
==36158==    by 0x4B34958: UnknownInlinedFun (pycore_ceval.h:46)
==36158==    by 0x4B34958: UnknownInlinedFun (ceval.c:5065)
==36158==    by 0x4B34958: _PyFunction_Vectorcall (call.c:342)
==36158== 
==36158== 152 bytes in 1 blocks are definitely lost in loss record 430 of 563
==36158==    at 0x4842003: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==36158==    by 0x201A66: pybind11::detail::generic_type::initialize(pybind11::detail::type_record const&) (pybind11.h:1304)
==36158==    by 0x329B17: UnknownInlinedFun (pybind11.h:1536)
==36158==    by 0x329B17: pybind11::enum_<nextpnr_gowin::PortType>::enum_<>(pybind11::handle const&, char const*) [clone .constprop.0] (pybind11.h:2150)
==36158==    by 0x203414: nextpnr_gowin::pybind11_init_nextpnrpy_gowin(pybind11::module_&) (pybindings.cc:129)
==36158==    by 0x20601F: UnknownInlinedFun (pybindings.cc:88)
==36158==    by 0x20601F: pybind11_init_impl_nextpnrpy_gowin (pybindings.cc:88)
==36158==    by 0x4BD0FF5: create_builtin (import.c:1012)
==36158==    by 0x4BD1CD5: _imp_create_builtin (import.c:1063)
==36158==    by 0x4B33D75: cfunction_vectorcall_O (methodobject.c:516)
==36158==    by 0x4B297AF: UnknownInlinedFun (ceval.c:5915)
==36158==    by 0x4B297AF: _PyEval_EvalFrameDefault (ceval.c:4277)
==36158==    by 0x4B34958: UnknownInlinedFun (pycore_ceval.h:46)
==36158==    by 0x4B34958: UnknownInlinedFun (ceval.c:5065)
==36158==    by 0x4B34958: _PyFunction_Vectorcall (call.c:342)
==36158==    by 0x4B23B15: UnknownInlinedFun (abstract.h:114)
==36158==    by 0x4B23B15: UnknownInlinedFun (abstract.h:123)
==36158==    by 0x4B23B15: UnknownInlinedFun (ceval.c:5891)
==36158==    by 0x4B23B15: _PyEval_EvalFrameDefault (ceval.c:4213)
==36158==    by 0x4B34958: UnknownInlinedFun (pycore_ceval.h:46)
==36158==    by 0x4B34958: UnknownInlinedFun (ceval.c:5065)
==36158==    by 0x4B34958: _PyFunction_Vectorcall (call.c:342)
==36158== 
==36158== 152 bytes in 1 blocks are definitely lost in loss record 431 of 563
==36158==    at 0x4842003: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==36158==    by 0x201A66: pybind11::detail::generic_type::initialize(pybind11::detail::type_record const&) (pybind11.h:1304)
==36158==    by 0x32A147: UnknownInlinedFun (pybind11.h:1536)
==36158==    by 0x32A147: pybind11::enum_<nextpnr_gowin::PlaceStrength>::enum_<>(pybind11::handle const&, char const*) [clone .constprop.0] (pybind11.h:2150)
==36158==    by 0x20346B: nextpnr_gowin::pybind11_init_nextpnrpy_gowin(pybind11::module_&) (pybindings.cc:135)
==36158==    by 0x20601F: UnknownInlinedFun (pybindings.cc:88)
==36158==    by 0x20601F: pybind11_init_impl_nextpnrpy_gowin (pybindings.cc:88)
==36158==    by 0x4BD0FF5: create_builtin (import.c:1012)
==36158==    by 0x4BD1CD5: _imp_create_builtin (import.c:1063)
==36158==    by 0x4B33D75: cfunction_vectorcall_O (methodobject.c:516)
==36158==    by 0x4B297AF: UnknownInlinedFun (ceval.c:5915)
==36158==    by 0x4B297AF: _PyEval_EvalFrameDefault (ceval.c:4277)
==36158==    by 0x4B34958: UnknownInlinedFun (pycore_ceval.h:46)
==36158==    by 0x4B34958: UnknownInlinedFun (ceval.c:5065)
==36158==    by 0x4B34958: _PyFunction_Vectorcall (call.c:342)
==36158==    by 0x4B23B15: UnknownInlinedFun (abstract.h:114)
==36158==    by 0x4B23B15: UnknownInlinedFun (abstract.h:123)
==36158==    by 0x4B23B15: UnknownInlinedFun (ceval.c:5891)
==36158==    by 0x4B23B15: _PyEval_EvalFrameDefault (ceval.c:4213)
==36158==    by 0x4B34958: UnknownInlinedFun (pycore_ceval.h:46)
==36158==    by 0x4B34958: UnknownInlinedFun (ceval.c:5065)
==36158==    by 0x4B34958: _PyFunction_Vectorcall (call.c:342)
==36158== 
==36158== LEAK SUMMARY:
==36158==    definitely lost: 616 bytes in 5 blocks
==36158==    indirectly lost: 0 bytes in 0 blocks
==36158==      possibly lost: 0 bytes in 0 blocks
==36158==    still reachable: 140,829 bytes in 585 blocks
==36158==         suppressed: 0 bytes in 0 blocks
==36158== Reachable blocks (those to which a pointer was found) are not shown.
==36158== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==36158== 
==36158== For lists of detected and suppressed errors, rerun with: -s
pepijndevos commented 1 year ago

So this is where the invalid access apparently happens: https://github.com/YosysHQ/nextpnr/blob/78926b31db6dbfb465e2585ce0050bf93cb3e35c/json/jsonwrite.cc#L91 which was allocated here: https://github.com/YosysHQ/nextpnr/blob/78926b31db6dbfb465e2585ce0050bf93cb3e35c/common/kernel/timing.cc#L767 Does that make sense?

Also not sure it's actually related, since the bug wq mentioned seems to refer to the chipdb version change, which this doesn't seem to be immediately related to.

pepijndevos commented 1 year ago

Huh upstream build works but our local build still fails with Exception: Unknown bel:R6C20_RAMW

Seems like there is some mismatch in the bells as understood by nextpnr and apicula. Maybe some off-by-one error or something like #141?

yrabbit commented 1 year ago

Nah, it looks like some ancient version of aicula is being used for the tests, which knows nothing about these Bels.

https://github.com/YosysHQ/apicula/actions/runs/3773584601/jobs/6435571904#step:5:27

I have no idea where it says to use this version yet

pepijndevos commented 1 year ago

@whitequark it looks like the latest build of yowasp-nextpnr-gowin on pypi uses a really old Apycula==0.4 release, while yowasp_nextpnr_gowin-0.4.post79.dev377 from CI does contain 0.6.1. AIUI Pypi releases are a manual process.

So I guess question 1 is, could you make a release? And question 2 is, should our CI be using CI artifacts or sandbox releases to test the latest changes?

whitequark commented 1 year ago

So I guess question 1 is, could you make a release?

In general I do these in sync with nextpnr release. I've pushed one just for you now, though.

And question 2 is, should our CI be using CI artifacts or sandbox releases to test the latest changes?

I'm not sure I understand the two options proposed, can you elaborate?

pepijndevos commented 1 year ago

If I understand your setup correctly, yowasp CI builds wheels and uploads then to testing pypi so I'm wondering if we should run our yowasp ci directly on your ci artifacts or on testing pypi rather than the manually released production pypi

whitequark commented 1 year ago

Oh yeah, that seems quite reasonable.