Open dkogan opened 1 year ago
I need to spin up an armhf box first, thanks for reporting! Also, do you also build the Python wrapper? Currently I think they need to be built together.
Hi. Thanks for replying
I need to spin up an armhf box first, thanks for reporting!
I see the same behavior on i686 and armhf, and since you probably have an amd64 box, you can more easily reproduce the i686 failure. If you have an amd64 install of debian, install the i686 cross-compiler:
sudo apt install g++-i686-linux-gnu
And then you can compile the offending file as noted in the bug report. But tell it to use the i686 compiler. I see this:
@.***:~/debianstuff/gtsam$ i686-linux-gnu-g++ \
-DBOOST_ALL_NO_LIB \
-DBOOST_ATOMIC_DYN_LINK \
-DBOOST_CHRONO_DYN_LINK \
-DBOOST_DATE_TIME_DYN_LINK \
-DBOOST_FILESYSTEM_DYN_LINK \
-DBOOST_PROGRAM_OPTIONS_DYN_LINK \
-DBOOST_REGEX_DYN_LINK \
-DBOOST_SERIALIZATION_DYN_LINK \
-DBOOST_SYSTEM_DYN_LINK \
-DBOOST_THREAD_DYN_LINK \
-DBOOST_TIMER_DYN_LINK \
-DNDEBUG \
-I"." \
-I"CppUnitLite" \
-isystem /usr/include/eigen3 \
-fstack-protector-strong \
-Wformat \
-Werror=format-security \
-Wno-deprecated-declarations \
-Wdate-time \
-D_FORTIFY_SOURCE=2 \
-DNDEBUG \
-Wall \
-fPIC \
-Wreturn-local-addr \
-Werror=return-local-addr \
-Wreturn-type \
-Werror=return-type \
-Wformat \
-Werror=format-security \
-Wsuggest-override \
-Wno-unused-local-typedefs \
-o /tmp/tst.o \
-c "gtsam/linear/tests/testSparseEigen.cpp"
In file included from ./gtsam/base/FastSet.h:28,
from ./gtsam/inference/Key.h:22,
from ./gtsam/inference/Ordering.h:23,
from ./gtsam/inference/EliminateableFactorGraph.h:26,
from ./gtsam/linear/GaussianFactorGraph.h:24,
from gtsam/linear/tests/testSparseEigen.cpp:21:
./gtsam/base/Testable.h: In instantiation of 'bool gtsam::assert_equal(const V&, const V&, double) [with V = int]':
gtsam/linear/tests/testSparseEigen.cpp:44:3: required from here
./gtsam/base/Testable.h:99:26: error: incomplete type 'gtsam::traits~~~^~~~~~~~
./gtsam/base/Testable.h:102:21: error: incomplete type 'gtsam::traits~~~~^~~~~~~~
./gtsam/base/Testable.h:103:21: error: incomplete type 'gtsam::traits~~~~^~~~~~~~
Also, do you also build the Python wrapper? Currently I think they need to be built together.
I do, yes. This creates other problems, but I have workarounds. Once all the things build on all the arches, I'll submit more bug reports to hopefully ingest the fixes into your tree.
@dkogan I think you only need to change the EXPECT(assert_equal())
to EXPECT_LONGS_EQUAL()
Thanks much. Appears to work. Doing another upload; let's see how it does...
OK, this issue appears to be fixed, but now the tests fail on armhf: https://buildd.debian.org/status/fetch.php?pkg=gtsam&arch=armhf&ver=4.2%7E9%2Bdfsg-5&stamp=1692186176&raw=0
The other arches still have problems too: https://buildd.debian.org/status/package.php?p=gtsam
I'm going to have to come back to this later. If you want to work on some of these issues in the meantime, that would be great. The package tree is here: https://buildd.debian.org/status/package.php?p=gtsam
@dkogan I think we currently only support 64-bit platforms, is this supported on Debian?
Fan Jiang @.***> writes:
@dkogan I think we currently only support 64-bit platforms, is this supported on Debian?
What is "support"? There's no reason gtsam should be arch-specific: it's a tool to solve a math problem.
The 10 arches on top of the buildd page are the ones that must be functional for this to be included in Debian:
https://buildd.debian.org/status/package.php?p=gtsam
The build issues are varied. armel needs wants to be linked with -latomic. I asked for that in the build, but apparently only some parts of the build respected that setting. This is almost certainly a bug in the gtsam build system:
armhf has failing tests:
mips64el has some linker problem I haven't debugged yet:
mipsel also wants -latomic and apparently the debug symbols are too large to fit into its ELF segments:
And so on. None of these are inherent limitations in gtsam. It's all fixable, but will take some time. I'll get back to it eventually, but if you want to work on it in the meantime, you can.
Hi. It looks like the failure mode on i686 and armhf is the same one: the build completes, but the tests fail: https://buildd.debian.org/status/fetch.php?pkg=gtsam&arch=i386&ver=4.2%7E9%2Bdfsg-5&stamp=1692266835&raw=0
Since you don't need any new hardware on i686, can I get you to look into that? You'll be far more effective at debugging this than I. Thanks.
@dkogan I can look at it this week
Tagging @jlblancoc for this is a Debian-related issue
First, thanks a lot @dkogan for finally being brave enough to move this forward! :-)
All these errors are caused by assumptions and/or never-happened-yet situations about the different sizes of "int", "long", "size_t" in those "uncommon" archs, it's not related directly with Debian packaging or flags per se...
PS: The comment above was for build errors.
For failing tests, I cannot see a clear reason from the buildd logs. In the past, reasons for tests to fail in other projects on "uncommon" archs have been:
make testFoo.valgrind
to help debugging that. Also, temporarily, @dkogan , it would be great to add valgrind to Build-Depends and running make check_valgrind
instead of make check
to see all the valgrind reports from the build farm. Or, if you have Debian Developer access to the build boxes (I don't), try that locally without involving a complete DD upload...#if
for those archs. It's still a long track ahead, but it's worth it! :+1:
Hi. Yes. You could run valgrind or use the Debian build infrastructure and so on. But let's start with i686. This doesn't require anything that we all don't have already, and should be very simple to reproduce. I don't have the cycles currently. Can one of you please look at it?
Thanks.
I think I know what is the issue, it's because we expect Key to be the same as Eigen::Index which is 32 bit in some archs. I would say now we have a failure case for #1522
Fan Jiang @.***> writes:
I think I know what is the issue, it's because we expect Key to be the same as Eigen::Index which is 32 bit in some archs. I would say now we have a failure case for #1522
That looks like a good candidate. Are the patches in that PR more or less done? Should I try applying them to the Debian builds?
Fan Jiang @.***> writes: I think I know what is the issue, it's because we expect Key to be the same as Eigen::Index which is 32 bit in some archs. I would say now we have a failure case for #1522 That looks like a good candidate. Are the patches in that PR more or less done? Should I try applying them to the Debian builds?
@dkogan Yes that would be a good test, let's apply and see what happens
The patches in #1522 fixed most of the test failures but not all. I just tried on armhf, and there's one failure still:
69/221 Test #69: testSymbolicFactorGraph ............***Failed 0.03 sec
Not equal:
expected:
: cliques: 4, variables: 6
expected:
- P( e0 l0 b0)
expected:
| - P( s0 | b0 l0)
expected:
| - P( t0 | e0 l0)
expected:
| - P( x0 | e0)
actual:
: cliques: 4, variables: 6
actual:
- P( e0 l0 b0)
actual:
| - P( t0 | e0 l0)
actual:
| - P( x0 | e0)
actual:
| - P( s0 | b0 l0)
./gtsam/symbolic/tests/testSymbolicFactorGraph.cpp:93: Failure: "assert_equal(asiaBayesTree, actual2)"
There were 1 failures
Yall should test this all yourselves I think. I would start with i686. I suggest using a Debian install on your native arch (presumably amd64), crossing to i686. This is trivial to set up, and works without emulation. Let me know if you need help.
If you can root-cause it and suggest a fix/PR that I think would be the most helpful. Might be some cross-platform non-determinism that we have not yet encountered.
This has a lot of gtsam-specific complexity, and you can debug and fix this much faster than I can. Building on i686 is trivial for you to do (as described above), and you should start there. Let me know if you need help setting this up.
This seems to be just an order issue in the comparison of Bayes tree, these two trees are equivalent. @dkogan I think you can mask this test in i386 first and we can work on a fix to the comparison
Fan Jiang @.***> writes:
This seems to be just an order issue in the comparison of Bayes tree, these two trees are equivalent. @dkogan I think you can mask this test in i386 first and we can work on a fix to the comparison
OK. So to be clear I should:
Yes? I'm certain there will be other issues, but this should get us closer.
Yes :)
Alright, I tried it. Build logs are here: https://buildd.debian.org/status/package.php?p=gtsam
Click on the red "Build-Attempted" to get the log for each particular arch. Notes:
The other arches either succeeded, or haven't been built yet.
Can somebody please take a look? Thanks
binder
in this case, but just perhaps due to the use of int
or other non-portable types (but perhaps I'm wrong, not familiar enough with gtsam python wrapper).Yep, seems like abuse of size_t
when it should be uint64_t
OK. Once some form of #1625 is merged, I'll pull that into my patch stack.
And now that all the builds are done, I can see that some other arches failed in different ways. I think I fixed what mips64el is complaining about. Can somebody please look at the test failures for hppa and x32:
@dkogan It's near conference deadline recently so no bandwidth for me, but after ICRA sub deadline I can look into the Python failures.
Fan Jiang @.***> writes:
@dkogan It's near conference deadline recently so no bandwidth for me, but after ICRA sub deadline I can look into the Python failures.
I'm not in a rush at all. Is the ICRA deadline coming? I should do something too...
Today our automated packaging broke because of the introduction of libcephes-gtsam.so into the development branch. As the name suggests, it's a private build of libcephes since no debian package exists. Didn't we get rid of a privately built libmetis because it would preclude gtsam from inclusion into debian?
Hi. If you're including more dependencies you didn't write, it would be great if you had a clear switch in the code to avoid using them.
Hi. I pushed updated the Debian package to the new 4.2.0 release. And the 32-bit architectures still fail tests. For i386: https://buildd.debian.org/status/fetch.php?pkg=gtsam&arch=i386&ver=4.2.0%2Bdfsg-1&stamp=1704797861&raw=0
The first failure:
======================================================================
ERROR: test_VisualISAMExample (testEssentialMatrixConstraint.TestVisualISAMExample.test_VisualISAMExample)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/<<PKGBUILDDIR>>/obj-i686-linux-gnu/python/gtsam/tests/testEssentialMatrixConstraint.py", line 32, in test_VisualISAMExample
factor = EssentialMatrixConstraint(poseKey1, poseKey2, E, model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. gtsam.gtsam.EssentialMatrixConstraint(key1: int, key2: int, measuredE: gtsam.gtsam.EssentialMatrix, model: gtsam.gtsam.noiseModel.Base)
Invoked with: 8646911284551352321, 8646911284551352322, R:
[
0.969061, 0.218325, 0.115126;
-0.196438, 0.964625, -0.175815;
-0.149438, 0.14776, 0.977668
]
d: : 0.333333
-0.666667
0.666667
, isotropic dim=5 sigma=0.25
This looks to me like it couldn't run the test code, not that it failed some threshold. Can one of you please take a look?
Thanks.
@varunagrawal Is libcephes a header-only library? If so maybe it does not need to be built at all. We can make all function inline and remove the lib.
@dkogan You are right, this error is potentially a wrong dynamic_cast. There is a recent issue #1685 about similar stuff. I can look into this error this week.
@varunagrawal Is libcephes a header-only library? If so maybe it does not need to be built at all. We can make all function inline and remove the lib.
Unfortunately it isn't and I don't have the bandwidth to go deep into updating it to be header-only (especially since there is no test suite).
@varunagrawal
/**
* @brief Compute the quantile function of the Chi-Squared distribution.
*
* The quantile function of the Chi-squared distribution is the quantile of
* the specific (inverse) incomplete Gamma distribution.
*
* @param dofs Degrees of freedom
* @param alpha Quantile value
* @return double
*/
double chi_squared_quantile(const double dofs, const double alpha) {
return 2 * igami(dofs / 2, alpha);
}
From a search this is the only place using cephes, is it correct?
Yes, at least for now.
I am not sure I understand what the issue with Cephes is here. @dkogan's latest comment seems to be referring to something else.
Just thinking about how can we make packaging easier :)
Sorry I think I somewhat hijacked this issue, which was about gtsam not building on i386/armhf. The point I'm raising about the introduction of libcephes is in so far relevant that it may go counter to @dkogan 's efforts to make gtsam an acceptable Debian package (which the present issue is also concerned with). It was my understanding that shipping binaries of other libraries along with gtsam is less than ideal, see the discussion related to libmetis. But again, sorry for taking this thread off-topic.
Just thinking about how can we make packaging easier :)
The two seem orthogonal? The reasoning for including cephes is to enable functionality (specifically of the GNC optimizer), so sacrificing functionality for the sake of packaging seems to be akin to getting a poor gift but really expensive wrapping paper. There is a single function call but behind the scenes, multiple files and functions are required to make it work.
That being said, if there was an easier way to enable GNC without Cephes, I would do it. Unfortunately, re-implementing code provided by Cephes in a manner that keeps up with the quality of GTSAM is out of the question for me since I have been working on it by myself without any help for many months, and I have other things on my plate. Unless someone is willing to step up and help with development, it feels a bit disingenuous to ask for changes to make others lives easier while simultaneously making mine harder.
Hi. I pushed gtsam into Debian, and I see (among other things) that it doesn't build on armhf and i386. Can one of you please take a look? It'll take yall much less time to detangle this than it would take me.
Here's the i386 build log: https://buildd.debian.org/status/fetch.php?pkg=gtsam&arch=i386&ver=4.2%7E9%2Bdfsg-4&stamp=1692035308&raw=0
The punchline is that when building
gtsam/linear/tests/testSparseEigen.cpp
this happens:It's reproducible standalone, without cmake like this:
Thanks much