MRPT / mrpt

:zap: The Mobile Robot Programming Toolkit (MRPT)
https://docs.mrpt.org/reference/latest/
BSD 3-Clause "New" or "Revised" License
1.89k stars 627 forks source link

Test cases OpenGL.CFBORender_camera_intrinsics/fov failed on riscv64 #1287

Closed yuzibo closed 9 months ago

yuzibo commented 9 months ago

Hi,

I noticed the mrpt was fail to build on riscv64 due to:

make[5]: Leaving directory '/<<PKGBUILDDIR>>/obj-riscv64-linux-gnu'
rgb_diff=3294.99
depth_diff=3718.28
./libs/opengl/src/CFBORender_unittest.cpp:247: Failure
Expected: (depth_diff) < (3000.0f), actual: 3718.28076 vs 3000

[  FAILED  ] OpenGL.CFBORender_camera_intrinsics (2662 ms)
[ RUN      ] OpenGL.CFBORender_camera_fov
[ 90%] Built target run_tests_mrpt_obs
make  -f tests/CMakeFiles/run_tests_mrpt_vision.dir/build.make tests/CMakeFiles/run_tests_mrpt_vision.dir/depend
make[5]: Entering directory '/<<PKGBUILDDIR>>/obj-riscv64-linux-gnu'
cd /<<PKGBUILDDIR>>/obj-riscv64-linux-gnu && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /<<PKGBUILDDIR>> /<<PKGBUILDDIR>>/tests /<<PK
GBUILDDIR>>/obj-riscv64-linux-gnu /<<PKGBUILDDIR>>/obj-riscv64-linux-gnu/tests /<<PKGBUILDDIR>>/obj-riscv64-linux-gnu/tests/CMakeFiles/run_
tests_mrpt_vision.dir/DependInfo.cmake "--color="
make[5]: Leaving directory '/<<PKGBUILDDIR>>/obj-riscv64-linux-gnu'

make[5]: Entering directory '/<<PKGBUILDDIR>>/obj-riscv64-linux-gnu'
cd /<<PKGBUILDDIR>>/obj-riscv64-linux-gnu && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /<<PKGBUILDDIR>> /<<PKGBUILDDIR>>/libs/apps /
<<PKGBUILDDIR>>/obj-riscv64-linux-gnu /<<PKGBUILDDIR>>/obj-riscv64-linux-gnu/libs/apps /<<PKGBUILDDIR>>/obj-riscv64-linux-gnu/libs/apps/CMa
keFiles/apps.dir/DependInfo.cmake "--color="
make[5]: Leaving directory '/<<PKGBUILDDIR>>/obj-riscv64-linux-gnu'
make  -f libs/apps/CMakeFiles/apps.dir/build.make libs/apps/CMakeFiles/apps.dir/build
make[5]: Entering directory '/<<PKGBUILDDIR>>/obj-riscv64-linux-gnu'
make[5]: Nothing to be done for 'libs/apps/CMakeFiles/apps.dir/build'.
rgb_diff=1730.11
depth_diff=6079.06
./libs/opengl/src/CFBORender_unittest.cpp:247: Failure
Expected: (depth_diff) < (3000.0f), actual: 6079.05566 vs 3000

The full log is here: https://buildd.debian.org/status/fetch.php?pkg=mrpt&arch=riscv64&ver=1%3A2.10.0%2Bds-3&stamp=1694429578&raw=0

It seems the depth value is unexpected on riscv64, but I am not sure why it happened. Should I dig into deeply? TIA

jlblancoc commented 9 months ago

Thanks for reporting, @yuzibo ! I missed that one, but there are other Debian targeted changes on the way towards a new Debian release asap, so it's good to fix this one upstream now too.

Those particular unit tests are a bit "risky"... they render a 3D scene using whatever hardware GPU or software-emulated version of OpenGL is present in each architecture, and compare it, numerically pixel by pixel, with a reference one generated on a desktop PC with GPU. So much uncertainty in the middle!!

Without a copy of the resulting images we can't tell if the error is catastrophic or just a matter of differences in the OpenGL emulation.

But with some numbers: there are 200e3 pixels, so an error of 3000 means an average error of 0.015 (over 1.0, not 255), so it's an average 1.5% error.... The error you copied:

rgb_diff=1730.11 depth_diff=6079.06 ./libs/opengl/src/CFBORender_unittest.cpp:247: Failure Expected: (depth_diff) < (3000.0f), actual: 6079.05566 vs 3000

could then be "fixed" by just increasing the tolerance of the test.

However... looking at the latest failed log for riscv64 (the one you actually link to), the error is different:

[ RUN      ] OpenGL.CFBORender_camera_fov
make  -f tests/CMakeFiles/run_tests_mrpt_topography.dir/build.make tests/CMakeFiles/run_tests_mrpt_topography.dir/depend
Dependencies file "tests/CMakeFiles/test_mrpt_slam.dir/__/libs/slam/src/slam/CIncrementalMapPartitioner_unittest.cpp.o.d" is newer than depends file "/<<PKGBUILDDIR>>/obj-riscv64-linux-gnu/tests/CMakeFiles/test_mrpt_slam.dir/compiler_depend.internal".
Dependencies file "tests/CMakeFiles/test_mrpt_slam.dir/__/libs/slam/src/slam/CMonteCarloLocalization2D_unittest.cpp.o.d" is newer than depends file "/<<PKGBUILDDIR>>/obj-riscv64-linux-gnu/tests/CMakeFiles/test_mrpt_slam.dir/compiler_depend.internal".
unknown file: Failure
C++ exception with description "==== MRPT exception ====
Message:  Assert condition failed: m_eglContext != EGL_NO_CONTEXT
Location: ./libs/opengl/src/CFBORender.cpp:177: [mrpt::opengl::CFBORender::CFBORender(const Parameters&)
Call stack backtrace:
[0 ] (unknown file) mrpt::opengl::CFBORender::CFBORender(mrpt::opengl::CFBORender::Parameters const&)
[1 ]       0x2ab26bf540 

which seems to imply that the system is not capable of rendering off-screen (... or a bug in mrpt for whatever combination of constraints in that arch).

I think the easiest workaround is to add riscv to the blacklisted archs for these particular tests here:

https://github.com/MRPT/mrpt/blob/0165ac55785f3643b63a087416205c57ca733cec/libs/opengl/src/CFBORender_unittest.cpp#L30-L37

Do you agree?

yuzibo commented 9 months ago

Thanks for the quick reply!

yes. The log I pasted above:

rgb_diff=1730.11 depth_diff=6079.06 ./libs/opengl/src/CFBORender_unittest.cpp:247: Failure Expected: (depth_diff) < (3000.0f), actual: 6079.05566 vs 3000

from my local Unmatched with a AMD Radeon HD 5000/6000/7350/8350 Series graphic card.

But the error from the official Debian buildd log generated from a Unmatched without any graphic card AFAIK.

Sorry for the confusion.

More serious issues are even on my local Unmatched, the egl instance was still fail to create:

...
EGLContext context = eglCreateContext(display, config, EGL_NO_CONTEXT, NULL);
    if (context == EGL_NO_CONTEXT) {
        std::cerr << "Failed to create EGL context" << std::endl;
        EGLint error = eglGetError(); // 获取EGL错误代码
        std::cerr << "Failed to create EGL context. Error code: " << error << std::endl;
        return -1;
    }
...

The output:

Failed to create EGL context. Error code: 12291

It seems this is EGL_BAD_ALLOC issue from here

I think the easiest workaround is to add riscv to the blacklisted archs for these particular tests here:

This is my intent to report it here. If upstream agrees to do so, the Debian package maintainer should agree to skip the test on riscv64 also.:)

Could you tell me what the possible causes of this problem(EGL_BAD_ALLOC)? I hope to report it to other upstream also. (BTW, I heard there is not GPU support on riscv64, especially from imaginations)

Thanks.

jlblancoc commented 9 months ago

Hi @yuzibo !

In theory, I think e0f4ccdf695e788c2539027d5bce18d5239d7b18 "solves" the problem, once it's released as the next stable release.

However, I'm dissatisfied with not being able to find a reason for your EGL_BAD_ALLOC ... it might seem the render buffer is too large, but it shouldn't be the case for any modern graphic card, so...