borglab / GTDynamics

Full kinodynamics constraints for arbitrary robot configurations with factor graphs.
BSD 2-Clause "Simplified" License
39 stars 10 forks source link

Cannot construct `JacobianFactor` in Python #248

Open gchenfc opened 3 years ago

gchenfc commented 3 years ago

Description

I believe something is wrong with constructing JacobianFactors from the result of a linearize() operation when using the python wrapper. Tagging @varunagrawal and @ProfFan due to the fact this is python related.

Specifically, the RTTI dynamic casts used in JacobianFactor constructors do not seem to be working properly. Here are two example usages:

https://github.com/borglab/gtsam/blob/ff7ddf48bd8bb2f993437f01cd32a4727d6f023b/gtsam/linear/JacobianFactor.cpp#L58-L67

https://github.com/borglab/gtsam/blob/ff7ddf48bd8bb2f993437f01cd32a4727d6f023b/gtsam/linear/JacobianFactor.cpp#L196-L211

For whatever reason, nonlinear factors constructed in python, which are then called with .linearize() to create a JacobianFactor, cannot be passed into the JacobianFactor(factor) constructors because boost::dynamic_pointer_cast<JacobianFactor>(factor) returns a null pointer.

Steps to reproduce

This example python script fails:

import gtsam
import gtdynamics as gtd

def main():
    f = gtd.MinTorqueFactor(0, gtsam.noiseModel.Unit.Create(1))

    init = gtsam.Values()
    init.insert(0, 0.0)

    flin = f.linearize(init)
    flin.print("Linearized Factor: ")
    print("\nLinearized factor has type: ", type(flin), '\n')
    # output: Linearized factor has type:  <class 'gtsam.gtsam.JacobianFactor'>

    gtsam.JacobianFactor(flin)  # this throws an error

if __name__ == '__main__':
    main()

Which throws the error:

In JacobianFactor(const GaussianFactor& rhs), rhs is neither a JacobianFactor nor a HessianFactor

Expected behavior

The python result should match that of this example C++ script succeeds:

#include <gtdynamics/factors/MinTorqueFactor.h>

#include <gtsam/nonlinear/NonlinearFactorGraph.h>

#include <iostream>

using namespace gtsam;
using namespace gtdynamics;

int main() {
  auto f = MinTorqueFactor(0, noiseModel::Unit::Create(1));

  auto init = Values();
  init.insert(0, 0.0);

  auto flin = f.linearize(init);
  flin->print("Linearized Factor: ");
  std::cout << "\nLinearized factor has type: " << typeid(*flin).name() << std::endl<< std::endl;
  // Output: Linearized factor has type: N5gtsam14JacobianFactorE

  auto jf = JacobianFactor(*flin);
}

Environment

MacOS 11.3.1, MacBook Air (M1, 2020), AppleClang 12.0.5.12050022 gtsam hash b99bf4e92912f4ad0020f79d6b222a3d6593514f

I also experienced similar errors when I ran on an ubuntu intel lab machine, but I didn't rigorously test these particular test files.

Additional information

This only occurs for nonlinear factors defined outside of the GTSAM repo. So, for example, PriorFactor, BetweenFactor, etc all still work as expected.

This causes noticeable failures on optimizations with Constrained noise models, may be causing silent correctness errors on other optimizations, and is likely causing all non-GTSAM factors to be calling the HessianFactor constructor instead of the JacobianFactor constructor (performance penalty).

ProfFan commented 3 years ago

I hate C++ RTTI, we should adopt a LLVM-styled RTTI some day...

*No capacity to work on this for now, need to get ICRA ddl done

gchenfc commented 3 years ago

Temporarily using this workaround to use name-based type matching:

JacobianFactor::JacobianFactor(const GaussianFactor& gf) {
  // Copy the matrix data depending on what type of factor we're copying from
  if (!strcmp(typeid(gf).name(), typeid(JacobianFactor).name()))
    *this = JacobianFactor(*static_cast<const JacobianFactor*>(&gf));
  else if (!strcmp(typeid(gf).name(), typeid(HessianFactor).name()))
    *this = JacobianFactor(*static_cast<const HessianFactor*>(&gf));
  else
    throw std::invalid_argument(
        "In JacobianFactor(const GaussianFactor& rhs), rhs is neither a JacobianFactor nor a HessianFactor");
}

And do a similar thing in _convertOrCastToJacobians. See https://github.com/borglab/gtsam/commit/bb393460d936eb69a1bf6d4eb3c2531416df4717

Note: for constrained optimization, you still need to manually set the linearSolverType param to QR - for some reason GTSAM is also not properly recognizing that the constrained noise models and auto-switching to QR. e.g.

params = gtsam.LevenbergMarquardtParams()
params.setLinearSolverType("MULTIFRONTAL_QR")
result = gtsam.LevenbergMarquardtOptimizer(graph, init, params).optimize()

I'm leaving this open because it needs to eventually be fixed, but this workaround works for now.

varunagrawal commented 3 years ago

Yeah we'll figure it out after ICRA :)

gchenfc commented 2 years ago

Just checking, does anyone else get this error? For me just compiling everything standard and running make python-test in gtdynamics produces this issue in the Simulator. Alternatively, minimum failing example (copy-pasted from above):

import gtsam
import gtdynamics as gtd

def main():
    f = gtd.MinTorqueFactor(0, gtsam.noiseModel.Unit.Create(1))

    init = gtsam.Values()
    init.insert(0, 0.0)

    flin = f.linearize(init)
    flin.print("Linearized Factor: ")
    print("\nLinearized factor has type: ", type(flin), '\n')
    # output: Linearized factor has type:  <class 'gtsam.gtsam.JacobianFactor'>

    gtsam.JacobianFactor(flin)  # this throws an error

if __name__ == '__main__':
    main()

@varunagrawal @yetongumich @ProfFan

ProfFan commented 2 years ago

I think I already fixed this in GTSAM, maybe the GTD wrapper does not have such changes applied.

Search for RTTI in the GTSAM Issues

gchenfc commented 2 years ago

Are you referring to borglab/gtsam#218 ?

If so, it looks like -fvisibility-ms-compat flag is already set inside function pybind_wrap

But this definitely sounds like the right issue (Mac dynamic linking related)

ProfFan commented 2 years ago

AFAIK I am suspecting that we have some violation of ODR such that multiple copies of the same class is generated. Have no proof, the only way to be sure is adding prints in the JacobianFactor constructor and print out the typeinfo struct.

gchenfc commented 2 years ago

If that were the case then wouldn't this be affecting all systems and not just mac?

ProfFan commented 2 years ago

No, this linking fiasco only happens with macOS... I don't exactly know why, but the visibility of the symbols of dynamic libs on macOS seems to be the culprit.

I also think it would be really helpful if you can check if the GTD lib and GTSAM is both generating typeinfo for JacobianFactor.

gchenfc commented 2 years ago

Sorry for forgetting about this for a long time.

Adding debug prints

In JacobianFactor constructor, I added:

std::cout << "Constructing a Jacobian factor from a Gaussian Factor!!\n"  //
          << "typeid of JacobianFactor is " << typeid(JacobianFactor).name()
          << '\t' << typeid(JacobianFactor).hash_code() << "\n"   //
          << "typeid of input argument is " << typeid(gf).name()  //
          << '\t' << typeid(gf).hash_code() << std::endl;

Then in GTDynamics MinTorqueFactor construtor, I added

std::cout << "Constructing a MinTorqueFactor..." << std::endl;
std::cout << "typeid of gtdynamics' JacobianFactor is "
          << typeid(gtsam::JacobianFactor).name() << '\t'
          << typeid(gtsam::JacobianFactor).hash_code() << std::endl;

Test code

The test code in c++:

auto values = gtsam::Values();
values.insert(0, 0.0);
{
  auto f = gtsam::PriorFactor(0, 0.0, gtsam::noiseModel::Unit::Create(1));
  auto gaussian_factor = f.linearize(values);
  gtsam::JacobianFactor jacobian_factor(*gaussian_factor);
}
{
  auto f = gtdynamics::MinTorqueFactor(0, gtsam::noiseModel::Unit::Create(1));
  auto gaussian_factor = f.linearize(values);
  gtsam::JacobianFactor jacobian_factor(*gaussian_factor);
}

and in python:

values = gtsam.Values()
values.insert(0, 0.0)

f = gtsam.PriorFactorDouble(0, 0.0, gtsam.noiseModel.Unit.Create(1))
gaussian_factor = f.linearize(values)
jacobian_factor = gtsam.JacobianFactor(gaussian_factor)

f = gtd.MinTorqueFactor(0, gtsam.noiseModel.Unit.Create(1))
gaussian_factor = f.linearize(values)
jacobian_factor = gtsam.JacobianFactor(gaussian_factor)  # this throws an error

Results

c++ results:

Constructing a Jacobian factor from a Gaussian Factor!!
typeid of JacobianFactor is N5gtsam14JacobianFactorE    4351243596
typeid of input argument is N5gtsam14JacobianFactorE    4351243596
Constructing a MinTorqueFactor...
typeid of gtdynamics' JacobianFactor is N5gtsam14JacobianFactorE        4351243596
Constructing a Jacobian factor from a Gaussian Factor!!
typeid of JacobianFactor is N5gtsam14JacobianFactorE    4351243596
typeid of input argument is N5gtsam14JacobianFactorE    4351243596

python results:

Constructing a Jacobian factor from a Gaussian Factor!!
typeid of JacobianFactor is N5gtsam14JacobianFactorE    4398406401
typeid of input argument is N5gtsam14JacobianFactorE    4398406401
Constructing a MinTorqueFactor...
typeid of gtdynamics' JacobianFactor is N5gtsam14JacobianFactorE        4796888396
Constructing a Jacobian factor from a Gaussian Factor!!
typeid of JacobianFactor is N5gtsam14JacobianFactorE    4398406401
typeid of input argument is N5gtsam14JacobianFactorE    4796888396
Traceback (most recent call last):
  File "/Users/gerry/GIT_REPOS/GTDynamics/gtdynamics/cablerobot/src/test.py", line 22, in <module>
    main()
  File "/Users/gerry/GIT_REPOS/GTDynamics/gtdynamics/cablerobot/src/test.py", line 14, in main
    jacobian_factor = gtsam.JacobianFactor(gaussian_factor)  # this throws an error
ValueError: In JacobianFactor(const GaussianFactor& rhs), rhs is neither a JacobianFactor nor a HessianFactor

So I think this is an indication that the python wrapper is using a different version of gtdynamics, so maybe we just need GTSAM_EXPORT in gtdynamics?

Going to continue working on this.

gchenfc commented 2 years ago

I figured out the problem: my gtdynamics.cpython-39-darwin.so and gtsam.cpython-39-darwin.so files are linking to different places. Specifically:

(found using otool -l)

Note that also gtdynamics.cpython-39-darwin.so is using @rpath/libgtdynamicsDebug.dylib, but rpath is set to:

Load command 26
          cmd LC_RPATH
      cmdsize 64
         path /Users/gerry/GIT_REPOS/GTDynamics/build/gtdynamics (offset 12)
Load command 27
          cmd LC_RPATH
      cmdsize 32
         path /Users/gerry/lib (offset 12)

so actually I think it's also going to be linking to the "wrong" file.

(the -fvisibility-ms-compat was not the issue - linking to the correct executable fixes the problem)


I don't know if this is specific to my machine or not. Would someone else mind pasting their output of:

otool -l /Users/gerry/miniforge3/lib/python3.9/site-packages/gtdynamics-1.0.0-py3.9.egg/gtdynamics/gtdynamics.cpython-39-darwin.so
otool -l /Users/gerry/miniforge3/lib/python3.9/site-packages/gtsam-4.1.0-py3.9.egg/gtsam/gtsam.cpython-39-darwin.so

(replacing /Users/gerry/miniforge3 with wherever your python packages are; the folder can be found using python -c 'import gtsam; print(gtsam.__path__)')

??? @varunagrawal @ProfFan