lepalom commented 4 years ago

We are trying to incorporate 0.6.1 to Debian and we have found several build problems in some platforms. You can check it here: https://buildd.debian.org/status/package.php?p=fcl&suite=experimental

For instance,

In arm64: 92% tests passed, 3 tests failed out of 37 The following tests FAILED: 9 - test_fcl_capsule_capsule (Failed) 32 - test_gjk_libccd-inl_epa (Failed) 34 - test_gjk_libccd-inl_gjk_doSimplex2 (Failed)
In i386: 81% tests passed, 7 tests failed out of 37 The following tests FAILED: 9 - test_fcl_capsule_capsule (Failed) 21 - test_fcl_signed_distance (Failed) 30 - test_convex (Failed) 31 - test_capsule (Failed) 32 - test_gjk_libccd-inl_epa (Failed) 35 - test_sphere_box (Failed) 36 - test_sphere_cylinder (Failed)
In mips64: Does not make test.
In mipsel, Does not build because memory problems.
In ppc64el, 92% tests passed, 3 tests failed out of 37 The following tests FAILED: 9 - test_fcl_capsule_capsule (Failed) 32 - test_gjk_libccd-inl_epa (Failed)
In s390x, 95% tests passed, 2 tests failed out of 37 The following tests FAILED: 9 - test_fcl_capsule_capsule (Failed) 32 - test_gjk_libccd-inl_epa (Failed)

Please, could you take one eye in the build logs and help us to make run fcl in that platforms?

sherm1 commented 4 years ago

Random observation: lots of warnings like this in the logs:

/include/fcl/geometry/bvh/BVH_model-inl.h:141:11: 
warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type 
‘struct fcl::BVNode<fcl::RSS<double> >’ with no trivial copy-assignment; 
use copy-assignment or copy-initialization instead [-Wclass-memaccess]
  141 |     memcpy(bvs, other.bvs, sizeof(BVNode<BV>) * num_bvs);

Not clear whether those might be causing the test failures.

simonschmeisser commented 3 years ago

Two months till the feature freeze for the next Debian version, would be nice to get 0.6 in!

sherm1 commented 3 years ago

@jamiesnape @SeanCurtis-TRI any thoughts on diagnosing these problems?

SeanCurtis-TRI commented 3 years ago

Sorry for the long delay in responding to this. You can consider that delay a reflection of the current level of maintainer's resource availability. In principle, we like the idea of broad platform support, in practice that can be problematic. As such, we propose the following, and invite further insights and arguments from the community.

Currently supported

amd64 (currently under test in CI)

Probably will explicitly support

arm64

Probably won't explicitly support -- available on Travis-CI but not worth the CI "cost"

ppc64el
s390x

Almost definitely will not provide formal support

i386
mips64
mipsel

For arm64, we're willing to extend the CI to cover that architecture (and deal with whatever failures arise). We're loathe to claim support for any platform that we don't have legitimate CI support for. We're unsure about the utility of the power pc platforms (even though TravisCI does support them) and are reluctant to force every PR to run through them without some good reasons for explicit support.

This is our first pass on "resolving" this issue and, again, we're open to other views. But the message that I really hope to make clear is that we're going to have to rely on the community's help (which has been great as recent build PRs attest) if we want to extend support across more platforms.

jamiesnape commented 3 years ago

There was historically some utility in running i386 builds to flush out a few obscure bugs that assume 64-bits, but now ARM is mostly at 64-bits, maybe the assumption is fine regardless. Looks (unsurprisingly) like a lot of tolerance issues, but anything mentioning memcpy raises red flags for me with multi-architecture support.

lepalom commented 3 years ago

FYI. @roehling did a good job patching fcl to be build in several platforms. Check:

https://buildd.debian.org/status/package.php?p=fcl&suite=experimental

jamiesnape commented 3 years ago

So these three patches? https://sources.debian.org/src/fcl/0.6.1-3/debian/patches/ (the last one being the most relevant)

lepalom commented 3 years ago

Yes. If you want, please use them.

jamiesnape commented 3 years ago

They all look good to me. I can PR them if @SeanCurtis-TRI and/or @sherm1 like the idea.

SeanCurtis-TRI commented 3 years ago

Seems like a definite improvement. I'm in favor of it.

jamiesnape commented 3 years ago

Seems like it should be easy enough, so I will put up a PR shortly.

jamiesnape commented 3 years ago

https://github.com/flexible-collision-library/fcl/pull/517

roehling commented 3 years ago

I'm glad you find my work useful. Still, I want to point out that those patches do not really eliminate the test failures. I have to confess I "fixed" the failing builds by ignoring the failing test results on non-amd64 architectures…

I believe that the test failures are largely due to differences in hardware floating point formats which lead to unexpected loss of precision. You can eliminate all test failures on i386 with -mfpmath=sse -msse2, so the infamous x87 FPU does not mess up intermediate results with gratuitous conversions between double and extended precision. The s390x architecture has a non-IEEE floating point register format as well.

The most interesting test failures are those of arm64, both because of your stated intention to support this architecture, and because it seems to be the common denominator in all test failures, so if you fix the problem there, there is a good chance that the improved numerical stability will also fix the other problematic architectures:

The following tests FAILED:
      9 - test_fcl_capsule_capsule (Failed)
     32 - test_gjk_libccd-inl_epa (Failed)
     34 - test_gjk_libccd-inl_gjk_doSimplex2 (Failed)

For reference, I'll post the outputs of the failing tests here. Maybe someone with a better grasp of the codebase has a good idea on how to tackle those:

[ RUN      ] CapsuleCapsuleSegmentTest/0.OverlappingCenterLines
./test/test_fcl_capsule_capsule.cpp:755: Failure
Value of: this->RunTestConfiguration( CapsuleCapsuleSegmentTest<TypeParam>::kMultipleOverlap)
  Actual: false (Values at (0, 0) exceed tolerance: 11.862063780292655 vs. 12.178490743795633, diff = 0.31642696350297861, tolerance = 2.0097183471152322e-14
m1 =
 11.862063780292655
 13.293042338207796
-2.8160494855694154
m2 =
 12.178490743795633
 13.201144346700222
-1.6102598123986924
delta=
-0.31642696350297861
0.091897991507574162
 -1.2057896731707229)
Expected: true
[  FAILED  ] CapsuleCapsuleSegmentTest/0.OverlappingCenterLines, where TypeParam = double (0 ms)

[ RUN      ] FCL_GJK_EPA.ComputeVisiblePatchColinearNewVertex
./test/narrowphase/detail/convexity_based_algorithm/test_gjk_libccd-inl_epa.cpp:731: Failure
Value of: std::unordered_set<ccd_pt_edge_t*>( {&(tet.e(1)), &(tet.e(4)), &(tet.e(5))})
  Actual: { 0xaaaaedb29090, 0xaaaaedb29110, 0xaaaaedb28090 }
Expected: border_edges
Which is: { 0xaaaaedb28110, 0xaaaaedb29110, 0xaaaaedb27320 }
./test/narrowphase/detail/convexity_based_algorithm/test_gjk_libccd-inl_epa.cpp:733: Failure
Value of: 3u
  Actual: 3
Expected: visible_faces.size()
Which is: 1
./test/narrowphase/detail/convexity_based_algorithm/test_gjk_libccd-inl_epa.cpp:734: Failure
Value of: std::unordered_set<ccd_pt_face_t*>( {&(tet.f(0)), &(tet.f(1)), &(tet.f(3))})
  Actual: { 0xaaaaedb27c10, 0xaaaaedb27560, 0xaaaaedb27500 }
Expected: visible_faces
Which is: { 0xaaaaedb27c10 }
./test/narrowphase/detail/convexity_based_algorithm/test_gjk_libccd-inl_epa.cpp:736: Failure
Value of: 3u
  Actual: 3
Expected: internal_edges.size()
Which is: 0
./test/narrowphase/detail/convexity_based_algorithm/test_gjk_libccd-inl_epa.cpp:737: Failure
Value of: std::unordered_set<ccd_pt_edge_t*>( {&(tet.e(0)), &(tet.e(2)), &(tet.e(3))})
  Actual: { 0xaaaaedb27320, 0xaaaaedb28110, 0xaaaaedb27c70 }
Expected: internal_edges
Which is: {}
[  FAILED  ] FCL_GJK_EPA.ComputeVisiblePatchColinearNewVertex (1 ms)

[ RUN      ] DoSimplex2Test.OriginInSimplex
./test/narrowphase/detail/convexity_based_algorithm/test_gjk_libccd-inl_gjk_doSimplex2.cpp:72: Failure
Value of: 0
Expected: norm_OB_.dot(phat_OB_)
Which is: 7.5711e-18
[  FAILED  ] DoSimplex2Test.OriginInSimplex (0 ms)
[ RUN      ] DoSimplex2Test.NeedMoreComputing
./test/narrowphase/detail/convexity_based_algorithm/test_gjk_libccd-inl_gjk_doSimplex2.cpp:72: Failure
Value of: 0
Expected: norm_OB_.dot(phat_OB_)
Which is: 7.5711e-18
[  FAILED  ] DoSimplex2Test.NeedMoreComputing (0 ms)
[ RUN      ] DoSimplex2Test.Region1Boundary1
./test/narrowphase/detail/convexity_based_algorithm/test_gjk_libccd-inl_gjk_doSimplex2.cpp:72: Failure
Value of: 0
Expected: norm_OB_.dot(phat_OB_)
Which is: 7.5711e-18
[  FAILED  ] DoSimplex2Test.Region1Boundary1 (5 ms)
[ RUN      ] DoSimplex2Test.Region1Boundary2
./test/narrowphase/detail/convexity_based_algorithm/test_gjk_libccd-inl_gjk_doSimplex2.cpp:72: Failure
Value of: 0
Expected: norm_OB_.dot(phat_OB_)
Which is: 7.5711e-18
[  FAILED  ] DoSimplex2Test.Region1Boundary2 (5 ms)
[ RUN      ] DoSimplex2Test.Region1Boundary3
./test/narrowphase/detail/convexity_based_algorithm/test_gjk_libccd-inl_gjk_doSimplex2.cpp:72: Failure
Value of: 0
Expected: norm_OB_.dot(phat_OB_)
Which is: 7.5711e-18
[  FAILED  ] DoSimplex2Test.Region1Boundary3 (4 ms)

SeanCurtis-TRI commented 3 years ago

@roehling Thanks for the elaboration. While it doesn't get us to the targeted support, at the very least, it's a monotonic improvement of the code. :) So, I'll take the little victory today even if we still haven't won the battle.

jamiesnape commented 3 years ago

We can automatically pass -mfpmath=sse -msse2 on i386 only, that is pretty trivial, and won't affect anyone else, but it won't be exercised by CI here, so if anything else i386 breaks it is not going to get caught.

sancelot commented 3 years ago

Regarding my issues trying to compiling for MSYS2/Mingw64 with gcc 10.2 (https://github.com/flexible-collision-library/fcl/issues/516) , this patch solved many problems . But some dll imports/exports definition problems are remaining

mahiuchun commented 3 years ago

The test result on my Apple M1 box:

95% tests passed, 2 tests failed out of 41 The following tests FAILED: 21 - test_fcl_signed_distance (Failed) 38 - test_sphere_box (Failed)

We can make the tests pass with a slight increase of tolerances.

jamiesnape commented 3 years ago

@mahiuchun We would welcome a PR with the exact tolerance changes.

mahiuchun commented 3 years ago

flexible-collision-library / fcl

Build problems for several processor architectures #474

Currently supported

Probably will explicitly support

Probably won't explicitly support -- available on Travis-CI but not worth the CI "cost"

Almost definitely will not provide formal support

522