Encountered failed assertion: static_var.h:117

Hi, I got a failed assertion recently with the following error message: buildit/include/builder/static_var.h:117: builder::static_var<T>::~static_var() [with T = double]: Assertion builder_context::current_builder_context->static_var_tuples.back().ptr == (unsigned char *)&val' failed.

I probably made a mistake somewhere in my code; is there a particular situation in which this assertion is triggered? Would be super helpful to know what to look for when I go bug hunting.

Thank you!

Seems this assertion is triggered when static_var objects are not destructed in the reverse order as they were created. In my case, this happened because I created a static_var object that went out of scope / ended up getting created only as a temporary, unnamed rvalue.

Please correct me if I've misunderstood something!

Diagnosed exactly why this is happening. I don't have a solution unfortunately, so I'd still appreciate the help!

I'm trying to get BuildIt to play well with Eigen, a standard linalg library for roboticists and graphics people. Long story short, destructor ordering for static_var<T> seems to be important. A bad copy constructor for my custom type is constructing dangling temporary static_vars, that don't destruct in the order required by BuildIt.

My question: how do I replace the copy constructor with just a move? Or write it better? This copy is generated during a return statement, and should ideally be elided but doesn't for some reason...

Some background on my usecase and Eigen internals follows.

I'm trying to generate code from this minimal input code snippet:

void foo() {
  // cscalar is our custom scalar type using BuildIt types, more on this later 
  // 2x2 matrices with cscalar<double> entries
  Eigen::Matrix<cscalar<double>, 2, 2> a, b, c;
  a >>
    1, 0,
    0, 5;
  b >>
    2, 3, 
    4, 7;
  c = a + b;
}

Here's the intended codegen'd output:

void foo() {
  // unrolled matrix a
  double var0 = 1, var1 = 0, var2 = 0, var3 = 5; 
  // unrolled matrix b
  double var4 = 2, var5 = 4, var6 = 3, var7 = 7;
  // unrolled matrix c
  double var8, var9, var10, var11;
  // c = a + b
  var8 = 1 + 2;
  var9 = 0 + 4;
  var10 = 0 + 3;
  var11 = 5 + 7;
}

This is what cscalar looks like:

template <typename T>
struct cscalar {
  dyn_var<T> m_value;
  static_var<T> is_constant;
  static_var<T> constant_val;

  // constructors
  cscalar() {...}
  cscalar(T value) {...}
  cscalar(dyn_var<T>& dyn_val) {...}
  cscalar(const cscalar& other) {...} // copy constructor: this is the problematic bit

  // Eigen requires the inputs and outputs for overloaded arithmetic ops to be the same scalar type.
  const cscalar& operator+(const cscalar& other) const {...} // Line 1, a new cscalar is constructed inside this and returned

  // assignment operators
  void operator=(const T& value) {...}
  void operator=(const cscalar& other) {...}
};

Here's pseudocode on roughly how arith ops are performed in Eigen (Scalar = cscalar<double> for us); I cannot modify this code:

Scalar add_op(Scalar& a, Scalar& b) { return a + b; } // Line 2

void assign_op(Scalar&a, Scalar& b) { a = b; } // Line 3

void assignCoeff(Scalar& dst, Scalar&a, Scalar& b) {
  assign_op(dst, add_op(a, b)); // Line 4
}

Here's the order of operations:

Line 4: assignCoeff -> Line 2: add_op
Line 1: new cscalar tmp1 constructed inside cscalar::operator+
Line 2: new cscalar tmp2 generated from copy constructor during return (no copy elision happened!)
Line 4 -> Line 3: assign_op
Line 4 concludes and add_op(a, b), i.e., tmp2, goes out of scope and is destructed.
tmp1 was never destructed! The assert inside BuildIt is then triggered when any static_var constructed before tmp1, is destructed.

What's the best way to make sure tmp1 gets destructed before constructing tmp2? Or ideally avoid the copy at all by using a move constructor, and moving the dyn_var and static_var member variables correctly into the copy? If I delete cscalar& other inside the copy constructor after constructing tmp2, it violates BuildIt's destructor order as well.

Hi Radhika, Thanks for creating this issue. Apologies about the fact that I didn't notice it before. This is definitely an interesting case and I am aware that the design of static_var's behavior causes the bug assertion to fail here. The high-level answer is the specific design is to optimize the implementation. We are very easily relax the destructor order requirement, but I would like to make sure that is definitely necessary for your use case.

Let's start with why the destructor order is important. The main reason static_var template is required is to keep track of mutations to first stage variables to determine whether two executions can be merged. Every time an operation happens on dyn_var BuildIt takes a snapshot of all alive static_var's and creates a string to record the state. This string is later compared to identify merging opportunities. To keep track of alive static_var's BuildIt pushes the address of the variable to a vector (attached to the context) and pops it later when the static var is destroyed. This simple design allows us to keep track of which static var is which without maintaining a key value pair. This not only makes the storage efficient, it also makes the comparisons of snapshots easier.

Now, like I said, we can easily relax this design if necessary by replacing the vector with a map and selectively removing elements from the map. But let us make sure that is necessary.

If I understand correctly, in your example, a temporary cscalar is created which is copied into another cscalar. But the temporary is never destroyed. BuildIt aside this most likely happens when rule of (3/5/0)[https://en.cppreference.com/w/cpp/language/rule_of_three] is violated. This might be the case in your example because you have a copy constructor and a default constructor but no destructor. Unless, some of these operators allocate the object on the heap with new and doesn't explicitly delete it? I do see some issues with the code snippet that you have shared in that your + operator returns a const cscalar& which is almost always invalid unless you heap allocate a new object and return it or return one of the operands updating them in place.

There might be several ways to approach from here, but let us start from the simplest option -

Do is_constant and constant_val really need to be static? -- this is something I briefly mentioned during the tutorial but didn't go too deep into, which is not everything needs to be marked static, esp. if the value doesn't change during the execution. Do you anticipate these members to start out being false and then becoming true during the execution? If not, we can just make them usual c++ variables and it would be fine.
The second option is to modify the constructors/destructors for your cscalar class such that static_var destructor order is properly managed. I tried to spin up a POC for the code snippet you shared but I am running into compilation issues with the block you mentioned you cannot modify. Specifically (snippet)[https://buildit.so/tryit/?sample=shared&pid=ada8e2ad55897451e016ffe250677b07] -

Scalar add_op(Scalar& a, Scalar& b) { return a + b; } // Line 2

void assign_op(Scalar&a, Scalar& b) { a = b; } // Line 3

void assignCoeff(Scalar& dst, Scalar&a, Scalar& b) {
  assign_op(dst, add_op(a, b)); // Line 4
}

Wouldn't compile because add_op returns a Scalar (which in this case is a prvalue) and assign_op expects an lvalue as its second parameter. Is this exactly the code you are trying or is this a simplified example. If you can share a working example that repros the static_var assertion I can take a stab at fixing the order of cosntructor/destructors. You don't necessarily have to share a "buildit.so" link, if you have a repo, you can link/share it, I can build and figure it out.

The third option is to copy the static_var when a cscalar object is moved from. Currently static_vars don't have a move constructor. This is specifically for the reason that efficiently moving the state would also require us to reorder the vector of static variables which makes thing complicate. The easiest solution might be to just have a move constructor for cscalar that doesn't actually move from the static_var inside but just copies it like a normal scalar/int. Once again, we would have to ensure that all variables are indeed destroyed.
Relax BuildIt's order of destructor requirement. - This shouldn't be that difficult. The easiest way would be to allow out of order destructors and when a static var is destroyed out of order, just set it's size to be 0 instead of removing it and popping it later when everything after it is removed. I can make a change to BuildIt (maybe under a flag) to implement this behavior if we need to fallback to it.

I think we should consider each of the above options in order. Maybe the most useful starting point right now would be to answer 1 and share a working example that compiles -> runs -> crashes on assert. I can then debug and figure out a way to fix the order of constructors/destructors.

I hope that makes sense. I am really curious to see how this experiment with Eigen goes. It should definitely be a very useful application for BuildIt.

Thanks

Hi Ajay, no worries, and thanks for your explanation of why destructor order for static_vars matters! Here's the repo I've been working in, please use the radhika-dev branch.

Relevant files are:

To repro the assertion:

$ make -j$(nproc)
$ ./build/sample1
I realized even the basic addition example is broken with a dangling (a+b) rvalue: $ ./build/sample4. This leads me to believe my constructors/destructors are off.

Addressing some of your comments/concerns:

Do is_constant and constant_val really need to be static?

Now that I think about it, in this minimal case, is_constant and constant_val don't need to be static_vars because their values do not change during execution. This is because we generate a new cscalar every time we call an arith operator, and no existing cscalar is modified in place. I have left the non-static_var version commented in the code. When I use the non-static_var version, BuildIt terminates early for sample1.cpp with an empty while loop.

...unless you heap allocate a new object and return it

Yep this is what I'm doing! This is also what is causing the dangling pointer, since this object is never free'd.

You might ask why am I returning a cscalar instead of just returning an expression, much like in the barray tutorial language.

I do have the alternate version implemented which returns exprs instead (you can compile it by setting EXPR_RETURN=1 in Makefile.inc). I have a typecast operator defined for converting cscalar_exprs to cscalars, so Eigen doesn't get unhappy. (since Eigen insists all input and output types be the same scalar)

Question 2: If you run ./build/sample1 using the alternate implementation, you'll find that BuildIt terminates early after hitting a LoopBackException.

After some hunting I saw array.h in BuildIt's source, and figured there must be some quirk in handling arrays inside BuildIt. (link to Eigen's internal array allocator). Could you tell me a bit about arrays of BuildIt types? I am quite puzzled about this too... (and I'd use this implementation had the early termination situation not happened)

Is this exactly the code you are trying or is this a simplified example

This is a simplified snippet of what Eigen does (links from Eigen's codebase: Line 2, Line 3, Line 4). Seems the simplified example is missing important details! I can dig up the exact template instantiation and get back to you.

The second option is to modify the constructors/destructors for your cscalar class

Seeing your explanation, pretty sure this is the problem; I likely got sth wrong with the constructors/destructors for my class. Will try to poke at it; getting sample4 running is a good first step.

Thanks a lot for your help! (and for hand-holding me through the basics of C++ lol)

Could you tell me a bit about when/why do LoopBack exceptions happen in BuildIt, specifically when using BuildIt types in arrays?

Hey Radhika, I am taking a closer look at the eigen implementation that you have. Let me get back to you about both these points by tomorrow. Thanks!

Hi Radhika, I took a closer look at the repo. Let me get through each point one by one.

Now that I think about it, in this minimal case, is_constant and constant_val don't need to be static_vars because their values do not change during execution. This is because we generate a new cscalar every time we call an arith operator, and no existing cscalar is modified in place.

If you plan is to never modify a scalar, I think this would be the best solution going forward. Even if we get static_var to work how you want, I am guessing this implementation would produce really large number of static vars (one per element in the matrix?). static_vars do come with a little overhead because each operation on the dynamic variables have to snapshot all static vars and compare them with other states. This can be fairly expensive for large matrices. One way to ensure you are never actually modifying any cscalars might be to delete the operator= explicitly. So the only way to create new values would be through the copy constructor and other operations, which can always create new is_constant/constant_val instead of modifying it in place.

I have left the non-static_var version commented in the code. When I use the non-static_var version, BuildIt terminates early for sample1.cpp with an empty while loop.

So this is because of the way static_vars interact with dyn_vars. Let us first understand what a LoopBackException is (your follow up question). You can see that LoopBackException is thrown exactly in one place in BuildIt here. Basically whenever BuildIt creates a new statement in the generated code, it checks if an exact same statement with the same static tag has been created before. If it has been created, it means the execution it going to be exactly the same. Remember static tags are a concatenation of where you are in the program and the state of all the static vars. When this exception is thrown, the execution of the program stops and BuildIt simply inserts a goto (back edge) in the generated code. So now, if a static var that should be tracked is not tracked, it will lead BuildIt to believe that it is safe to go back and it will insert a backedge which essentially becomes a while(1). This is a 100% symptom of missing something that should be static but isn't marked static.

...unless you heap allocate a new object and return it

Yep this is what I'm doing! This is also what is causing the dangling pointer, since this object is never free'd.

This is generally not a good idea for C++ since you would later have no way to free the returned value. The preferred way is to usually return by value and let the compiler either do ROV or aggressive inlining to optimize it. Anyway, returning *new is not going to be great for performance anyway. What does Eigen usually do in such cases for the usual scalars?

You might ask why am I returning a cscalar instead of just returning an expression, much like in the barray tutorial language.

I do have the alternate version implemented which returns exprs instead (you can compile it by setting EXPR_RETURN=1 in Makefile.inc). I have a typecast operator defined for converting cscalar_exprs to cscalars, so Eigen doesn't get unhappy. (since Eigen insists all input and output types be the same scalar)

This might be the easiest approach since this way you can also defer the computation to later further optimizing it. The way to define a cast operator is actually the ideal way to convert an expr into a cscalar. You could also define a constructor for cscalar from cscalar_epxr which should have the same effect.

Question 2: If you run ./build/sample1 using the alternate implementation, you'll find that BuildIt terminates early after hitting a LoopBackException.

After some hunting I saw array.h in BuildIt's source, and figured there must be some quirk in handling arrays inside BuildIt. (link to Eigen's internal array allocator). Could you tell me a bit about arrays of BuildIt types? I am quite puzzled about this too... (and I'd use this implementation had the early termination situation not happened)

Arrays inside dyn_var like dyn_var<int[4]> or just dyn_var<int[]> should behave just like any other dyn_var. The key idea is that these produce arrays in the generated code. There is a separate dyn_arr which actually creates an array of dynamic variables which can be indexed by first stage values. This doesn't really produce an array but a bunch of dyn_vars and indexing into it picks up the appropriate dyn_var where used. You can look at sample43 which might help understand it better.

I have a higher level suggestion here. For Einsum do you really need each cscalar to be a separate dyn_var? Can you not represent the entire matrix as a single dyn_var? Maybe as a 2-d array? dyn_var<int[][]>? Furthermore, do the static_vars is_constant and constant_val need to be tracked for each cscalar? Can they be tracked in an aggregate way for either the entire matrix or parts of it (like say a row?). Are there optimizations that benefit from tracking the constness of each value?

Is this exactly the code you are trying or is this a simplified example

This is a simplified snippet of what Eigen does (links from Eigen's codebase: Line 2, Line 3, Line 4). Seems the simplified example is missing important details! I can dig up the exact template instantiation and get back to you.

The second option is to modify the constructors/destructors for your cscalar class

Seeing your explanation, pretty sure this is the problem; I likely got sth wrong with the constructors/destructors for my class. Will try to poke at it; getting sample4 running is a good first step.

Thanks a lot for your help! (and for hand-holding me through the basics of C++ lol)

I think in general if you are just always creating new objects of type cscalar instead of modifying them in place, you should be able to use is_constant and constant_val without static. The LoopBackException followed by while(1) should also go away if you make sure you are indeed not modifying the cscalars anywhere (I would start by looking for loops where they might be updated). One easy way like I suggested before might be to delete the operator= to find out where it might have been modified.

Happy to help debug this more. This really seems like a cool use case for BuildIt and I curious to see how it works.

Finally, if you do need out of order destroyable static_vars, let me know. I can enable/add them under a flag in builder_context. There might be a modest overhead but not much compared to the overhead of tracking static_vars anyway.

Thanks for your notes on BuildIt's internals and implementation tips! I'll take a stab at implementing your suggestions, will post more if/when I get stuck again!

I am guessing this implementation would produce really large number of static vars[...] this can be fairly expensive for large matrices.

I have a higher level suggestion here. For Einsum do you really need each cscalar to be a separate dyn_var? [...] Are there optimizations that benefit from tracking the constness of each value?

Some details about our usecase (happy to share more, we're trying to codegen robot-specific kinematics and dynamics math using the robot type as static_var input):

Our matrices are small; 40x40 is likely the largest size we're considering.
- Our largest matrices scale with robot complexity, the largest robot we're considering has ~40 joints.
Many of our matrices are 6x6 or 4x4 (coordinate transformation matrices).
- A lot of our math involves going back and forth different coordinate frames, so there's many matmuls of small 6x6/4x4 matrices, etc.
At design time (the first stage?), we know each matrix entry will be [zero | populated in the second stage | some multiple of pi for rotation values].
- Knowing if an entry is some multiple of pi can help us simplify sines/cosines of those entries in advance. sin/cos ops are expensive, so it's worthwhile to optimize that in the first stage itself.
- Knowing if an entry is zero is helpful for skipping +,-,* operations, etc.
- There is an obvious link to prior work on sparse tensor algebra, but our matrices are small and ~40-60% sparse.
Our target platforms are CPUs and FPGAs for now. Given the small sizes of these matrices, GPUs are not helpful here.
We plan on implementing lowerings for the usual suspects (matmuls, cross products, matrix exponentials/logarithms, etc.) but we were interested in keeping Eigen as an escape hatch for our end-users.

There is a separate dyn_arr [...] This doesn't really produce an array but a bunch of dyn_vars and indexing into it picks up the appropriate dyn_var where used.

This is exactly what I was looking for! I was somehow expecting a dyn_var<T> array[SIZE] to magically behave like this lol (and indeed Eigen's internal storage is implemented as a T array[Size], with T = [dyn_var | cscalar])

So I assume BuildIt dyn_vars (and dyn_var-like structs such as cscalars) won't play well with Eigen's internal storage array then? (I tried to use a plain Eigen::Matrix<dyn_var<double>> and BuildIt generated an empty while-loop, which I assume was because vanilla C++ arrays and dyn_vars don't work well together)

I can try to get Eigen to talk to a dyn_arr, and see what happens...

Last question: I imagine you know about the sparse matmul example from the Shonan Challenge. Do you have a version of this implemented in BuildIt by any chance? This is quite close to what I'm trying to do; was curious if you have sth like this lying around that I can look at!

~~(please don't spend your time implementing this lol just asking if one of your many examples already covers this)~~

Edit: okay turns out implementing the sparse matmul example was really easy in BuildIt lol

Hey Radhika,

Some details about our usecase (happy to share more, we're trying to codegen robot-specific kinematics and dynamics math using the robot type as static_var input):

Okay these details are helpful to understand! So 40x40 isn't that big that it should cause a huge problem. I was worried you are planning to use arrays of sizes in the 100s or thousands or millions. Creating a separate static variable for each might slow everything down to a crawl.

This is exactly what I was looking for! I was somehow expecting a dyn_var array[SIZE] to magically behave like this lol (and indeed Eigen's internal storage is implemented as a T array[Size], with T = [dyn_var | cscalar])

dyn_var array[SIZE] is our design intention. However, the only problem is that the loop that calls the constructor for each element would have to be static_var. That is essentially what dyn_arr does. It allocates space for a dyn_var array[size] but manually loops over and calls constructors on each in a static loop.

The empty while loops you got are probably because of the same issue. Because Eigen::Matrix<dyn_var> creates an array and tries to initialize them with a regular loop. You essentially have a variable (the index) in the first stage that is being mutated without being marked static. I think the easiest fix if you want to continue using Eigen::Matrix<dyn_var> would be to write a partial specialization for Eigen::Matrix<dyn_var> that uses dyn_arr instead of array of dyn_var. That way the rest of the code would remain exactly the same.

Also, glad that you got sparse matmul working. It should be exactly the same as writing a regular matmul with matrix as static. Infact we also have a "roll" operation in BuildIt that was written to optimize stuff like this. You don't have to use it, but it might make the generated code easier. You can find it under samples/sample29.cpp. Again, not very essential but it rolls up similar statements into a loop and makes arrays to fill up values. Basic mechanical stuff if you want to convert back unrolled loops into loops.

Let me know if you would like to hop on a zoom call sometime to debug Eigen. I am really curious about how this go. I might also be able to come to Harvard if you would like to work on it in person.

Thanks

Hey @RadhikaG were you able to get this working by any chance?

Would any support from us help with it? I am really curious about how the whole Eigen project goes because it's an interesting usecase for BuildIt.

Thanks

Hi @AjayBrahmakshatriya, thank you for following up! I actually had to pause the Eigen integration work for a little bit to make progress on some other features in our compiler. I will almost certainly have to come back to this soon though, so I will be pinging here again if/when I get stuck!

BuildIt-lang / buildit

Encountered failed assertion: static_var.h:117 #76