EnzymeAD / Enzyme

High-performance automatic differentiation of LLVM and MLIR.
https://enzyme.mit.edu
Other
1.26k stars 106 forks source link

Incorrect/Non-existent Gradients when Differentiating OpenLB #737

Closed ZuseZ4 closed 2 years ago

ZuseZ4 commented 2 years ago

I tried to differentiate any of the following two functions using fwd and reverse mode. The solver is marked as inactive and I try to get the gradients of the control variable.

void Callable(std::shared_ptr<TestFlowSolverOptiAD<double>> solver,
              const std::vector<double> &control, double &res) {
  solver->parameters(names::Opti()).applyControl(control);
  solver->solve();
  res = solver->parameters(names::Results()).objective;
}

double Callable2(std::shared_ptr<TestFlowSolverOptiAD<double>> solver,
            const std::vector<double> &control) {
  solver->parameters(names::Opti()).applyControl(control);
  solver->solve();
  double res = solver->parameters(names::Results()).objective;
  return res;
}

For a given control vector [2.5, 2.5, 2.5] the function will return 0.204527, when initiating it to [3.5, 3.5, 3.5] it will return 0.552186. However, in all cases the shadow of control will not be incremented in reverse-mode / zero in fwd-mode.

If I now modify the last line in Callable2:

 return res * control[0];

and init the shaddows to zero, then it will increment the first entry by exactly 0.204527 or 0.552186 respectively. Did I miss anything or is it a bug (incorrect activity propagation maybe?).

ZuseZ4 commented 2 years ago

I am unable to construct the solver inside of the function as Enzyme will create a segfaulting Binary (even without any getline usage). I am unable to mark the solver as duplicated and provide an identical shadow (with somewhat unclear meaning), as it fails at binary runtime with Attempting to call an indirect active function whose runtime value is inactive

wsmoses commented 2 years ago

What is the class hierarchy of the solver? If there's a virtual method there I think you do need to mark the solver as duplicated and potentially use something like __enzyme_virtualreverse (for reverse mode) to generate the corresponding shadow virtual method table: https://github.com/EnzymeAD/Enzyme/blob/main/enzyme/test/Integration/ReverseMode/virtualshadow3.cpp

ZuseZ4 commented 2 years ago

Yes, the solver is a derived class, the complete hierarchy is ~5 levels deep. Functions are often virtual and overriden.

So applyControl is override. The solve call self is not virtual and defined at the highest level, however it calls multiple functions which are virtual. However I seem to only need to handle the solver object itself, not every single virtual function inside, right? So in my case marking the solver as dup was indeed correct, but rather passing the second solver directly I have to pass it trough __enzyme_virtualreverse before initializing it and passing it to the __enzyme_autodiff?

wsmoses commented 2 years ago

Yeah you need to initialize the shadow virtual method table, otherwise it will segfault or get the runtime error you see above, since the shadow will not have the corresponding shadow function pointer.

wsmoses commented 2 years ago

Relatedly, can you get a standalone version of this code on the compiler explorer?

ZuseZ4 commented 2 years ago

Ok, I was asking because I wasn't sure at which level I have to create vTables. Writing an explicit wrapper for the Callable function or for the (virtual) applyControl is easy, writing wrappers for each virtual function inside the solver->solve() call is not. If there is a reasonable way to get Enzyme + olb working I can reduce this example afterwards, but it will probably take me a day or two. However, given your the 3rd virtualRev testcase I wrote the following code:

struct Wrapper {
  std::shared_ptr<TestFlowSolverOptiAD<S>> solver;
  std::vector<S> control;
};
template <typename T> T __enzyme_virtualreverse(T);

extern S __enzyme_autodiff3(void *, int, Wrapper &, Wrapper &)
S Callable3(Wrapper &o) {
  o.solver->parameters(names::Opti()).applyControl(o.control);
  o.solver->solve();
  return o.solver->parameters(names::Results()).objective;
}

main() {
/// ....
      struct Wrapper m;
      m.solver = solver;
      m.control = control;
      //res = Callable3(m);

      struct Wrapper dm;
      dm.solver = solver2;
      dm.control = derivative1;
      *((void **)&dm) = __enzyme_virtualreverse(*(void **)&dm);
      dres = __enzyme_autodiff3((void *)Callable3, enzyme_dup, m, dm);
      // res = Callable3(dm);
}

it gives me

ld.lld: warning: <unknown>:0:0: Cannot create virtual version of non-constant value   %379 = call noundef i8* @_Z23__enzyme_virtualreverseIPvET_S1_(i8* noundef %378) #44  %378 = load i8*, i8** %377, align 8, !tbaa !5049
ld.lld: warning: <unknown>:0:0: Cannot create virtual version of non-constant value   %375 = call noundef i8* @_Z23__enzyme_virtualreverseIPvET_S1_(i8* noundef %374) #76  %374 = load i8*, i8** %373, align 8, !tbaa !5049
ld.lld: error: undefined symbol: void* __enzyme_virtualreverse<void*>(void*)
>>> referenced by ld-temp.o
>>>               lto.tmp:(main)
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [Makefile:67: test] Error 1
make test  98.91s user 0.68s system 100% cpu 1:39.59 total

I get the same error when trying to virtualreverse the solver directly: *((void **)&solver2) = __enzyme_virtualreverse(*(void **)&solver2);

wsmoses commented 2 years ago

So the virtualreverse needs to apply to the actual data inside the pointer, and ideally where construction is visible so it can see the global table. So something like:

void Callable(TestFlowSolverOptiAD<double>& solver,
              const std::vector<double> &control, double &res) {
  solver.parameters(names::Opti()).applyControl(control);
  solver.solve();
  res = solver->parameters(names::Results()).objective;
}

main() {
     TestFlowSolverOptiAD<double> dsolver;
      *((void **)&dsolver) = __enzyme_virtualreverse(*(void **)&dsolver);
      dres = __enzyme_autodiff3((void *)Callable, enzyme_dup, solver, dsolver, control, dcontrol, res, dres);
      // res = Callable3(dm);
}
wsmoses commented 2 years ago

Also constructing the solver inside the function should be possible, and segfaulting in that case is weird and is presumably a bug.

ZuseZ4 commented 2 years ago

Yes, it's definitely a bug, most likely related to IO (the solver is created based on an xml config file). Unfortunately it does not seem to be the getline segfault bug, though I guess they will be related?

Also just adding a new constructor to OpenLB to see if the virtualreserve application directly helps, thanks!

ZuseZ4 commented 2 years ago
template <class SOLVER>
std::shared_ptr<SOLVER> createEnzymeLBSolver(XMLreader const &xml) {
  utilities::TypeIndexedTuple<typename SOLVER::BaseSolver::Parameters_t>
      paramsTuple;
  *((void **)&paramsTuple) = __enzyme_virtualreverse(*(void **)&paramsTuple);
  meta::tuple_for_each(paramsTuple.tuple, [&](auto &params) {
    using parameter_type =
        typename std::remove_reference_t<decltype(params)>::element_type;
    //*((void **)&params) = __enzyme_virtualreverse(*(void **)&params);
    params = createParameters<parameter_type>(xml);
  });
  auto result = std::make_shared<SOLVER>(paramsTuple);
  return result;
}

Using the code above to create the dsolver (and the unmodified version without virtualreverse for the solver) does compile without warning, but fails at runtime: Attempting to call an indirect active function whose runtime value is inactive

I also tried to add two new constructer which return the solver and the virtualreverse dsolver directly, instead of the shared_ptr.

    1 template <class SOLVER>
    2 SOLVER createEnzymeLBSolverDirect(XMLreader const &xml) {
    3   utilities::TypeIndexedTuple<typename SOLVER::BaseSolver::Parameters_t>
    4       paramsTuple;
    5   *((void **)&paramsTuple) = __enzyme_virtualreverse(*(void **)&paramsTuple);
    6   meta::tuple_for_each(paramsTuple.tuple, [&](auto &params) {
    7     using parameter_type =
    8         typename std::remove_reference_t<decltype(params)>::element_type;
    9     //*((void **)&params) = __enzyme_virtualreverse(*(void **)&params);
   10     params = createParameters<parameter_type>(xml);
   11   });
   12   return paramsTuple;
   13 }
   // ...

    1     TestFlowSolverOptiAD<S> solver =
    2         createLBSolverDirect<TestFlowSolverOptiAD<S>>(config);
    3     TestFlowSolverOptiAD<S> dsolver =
    4         createEnzymeLBSolverDirect<TestFlowSolverOptiAD<S>>(config);
    5     dres = __enzyme_autodiff4((void *)Callable4, enzyme_dup, solver, dsolver, enzyme_dup, control, derivatives1);

unfortunately it gives me the same error.

Shall I try to move the virtualreverse up into the TypedIndexTuple or even higher?

template <typename MAP>
struct TypeIndexedTuple {
  template <typename... TYPES>
  using SharedPointerTuple = std::tuple<std::shared_ptr<TYPES>...>;

  /// An std::tuple with shared_ptrs to the values of MAP as types.
  using tuple_t = typename MAP::values_t::template decompose_into<SharedPointerTuple>;
  tuple_t tuple;

  /// Access by KEYS of MAP
  template <typename KEY>
  constexpr auto& get() {
    return std::get<(MAP::keys_t::template index<KEY>())>(tuple);
  }
  /// Access by KEYS of MAP
  template <typename KEY>
  constexpr auto const& get() const {
    return std::get<(MAP::keys_t::template index<KEY>())>(tuple);
  }
};

I'll also try to pre-read the xml outside of the Callable function but construct the rest of the object inside and just add a setter, so hopefully the IO segfault will not be triggered and Enzyme can handle the virtual functions automatically.

ZuseZ4 commented 2 years ago

I managed to move the IO out of the constructor, so the segfaulting behaviour shouldn't occur. This time I also construct the solver inside of the function being differentiated, so Enzyme should take care of virtual functions. Unfortunately it now again fails to compile:


/home/drehwald/prog/llvm-14/build/bin/clang++ -O3 -Wall -march=native -mtune=native         -std=c++17 -DPLATFORM_CPU_SISD -fuse-ld=lld -flto testFlowSolver.o -L../../../external/lib -lz -ltinyxml -fuse-ld=lld -flto -o tFS_t -Wl,-lz -Wl,-ltinyxml -Wl,--lto-legacy-pass-manager  -Wl,-mllvm=-load=/home/drehwald/prog/Enzyme/enzyme/build/Enzyme/LLDEnzyme-14.so
ld.lld: warning: <unknown>:0:0: Cannot cast __enzyme_autodiff primal argument 6, found   %326 = phi i32 [ %309, %307 ], [ %309, %319 ], [ %324, %322 ], type i32 - to arg 2 %"class.std::shared_ptr.185"*
ld.lld: error: <unknown>:0:0: EnzymeFailure when replacing __enzyme_autodiff calls in _Z19testSensitivitiesADv
ld.lld: warning: <unknown>:0:0: Cannot cast __enzyme_autodiff primal argument 6, found   %394 = phi i32 [ %377, %375 ], [ %377, %387 ], [ %392, %390 ], type i32 - to arg 2 %"class.std::shared_ptr.185"*
ld.lld: error: <unknown>:0:0: EnzymeFailure when replacing __enzyme_autodiff calls in main
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [Makefile:68: test] Error 1
ZuseZ4 commented 2 years ago

#include "testFlowSolver.h"

using namespace olb;
using namespace olb::parameters;

const double eps(1.e-7);
using S = double;
constexpr unsigned numberOfDerivatives (3);
using T = util::ADf<S,numberOfDerivatives>;

using std::cout;
extern int enzyme_const;
extern int enzyme_dup;

using OUT = std::shared_ptr<parameters::OutputBase<S>>;
using OPTI = std::shared_ptr<TestFlowDirectOpti<S>>;
using RES = std::shared_ptr<parameters::DirectOptiResults<S>>;
using SIM = std::shared_ptr<parameters::XmlSimulation<S, Lattices>>;

std::shared_ptr<TestFlowSolverOptiAD<S>>
createLBSolver3(const OPTI opti, const OUT out, const RES res, const SIM sim) {
  utilities::TypeIndexedTuple<Params_TfOptiAd<S>> params;
  params.template get<Opti>() = opti;
  params.template get<Output>() = out;
  params.template get<Results>() = res;
  params.template get<Simulation>() = sim;
  auto result = std::make_shared<TestFlowSolverOptiAD<S>>(params);

  return result;
}

S __enzyme_autodiff(void *, int, const std::vector<S> &, std::vector<S> &, int,
                    const OPTI, int, const OUT, int, RES, int, const SIM);

S Callable5(const std::vector<S> &control, const OPTI opti, const OUT out,
            const RES res, const SIM sim) {
  auto solver = createLBSolver3(opti, out, res, sim);
  solver->parameters(names::Opti()).applyControl(control);
  solver->solve();
  S returnVal = solver->parameters(names::Results()).objective;
  return returnVal;
}

  void testSensitivitiesAD() {
    OstreamManager clout(std::cout, "sensitivitiesAD");

    std::vector<S> derivatives1{0.0, 0.0, 0.0};
    std::vector<S> derivatives2(numberOfDerivatives, 0.);

    XMLreader config("parameter.xml");
    OPTI opti = createParameters<TestFlowDirectOpti<S>>(config);
    OUT out = createParameters<parameters::OutputBase<S>>(config);
    RES res = createParameters<parameters::DirectOptiResults<S>>(config);
    SIM sim = createParameters<parameters::XmlSimulation<S, Lattices>>(config);
    {
      // 2.5 gives: 0.204527
      // 3.5 gives: 0.552186
      std::vector<S> control(numberOfDerivatives, 2.5);
      Callable5(control, opti, out, res, sim);
      S dres = 0;
      dres = __enzyme_autodiff((void *)Callable5, enzyme_dup, control,
                               derivatives1, enzyme_const, opti, enzyme_const,
                               out, enzyme_const, res, enzyme_const, sim);
      cout << "res: " << res << " dres: " << dres
           << " derivatives = " << derivatives1[0] << ", " << derivatives1[1]
           << ", " << derivatives1[2] << std::endl
           << "done running Enzyme now" << std::endl;
    }
   /// ...
}
wsmoses commented 2 years ago

Can you make a version of this without external file dependencies to take a look at?

Regardless that error means you are calling autodiff incorrectly (and is saying at parameter X it expected a value of type shared ptr but found one of i64)

ZuseZ4 commented 2 years ago

I think rather than calling it incorrectly, Enzyme here has some incorrect expectations? Afaik user selections (enzyme_const, enzyme_dup, ...) should allways overwrite default selections (or possibly panic at compile time if they don't make sense). This error looks as if this wouldn't happen and Enzyme isn't treating one of the args as const (thus expecting a shadow instead of the next integer). By shuffling arg orders around I also managed to have Enzyme complain about a vector vs. int mismatch in the first args. Will start to minimize it.

wsmoses commented 2 years ago

Trying to keep each issue to a distinct bug, and considering the final calling convention piece was closed by a PR from you, I'm going to close this. Please open a new issue with a new bug if odd behavior persists.