arximboldi / immer

Postmodern immutable and persistent data structures for C++ — value semantics at scale
https://sinusoid.es/immer
Boost Software License 1.0
2.5k stars 182 forks source link

Very odd constructor call with transients #177

Closed jeaye closed 3 years ago

jeaye commented 3 years ago

Hello! I've been trying to pin down a very odd issue I've been seeing with my larger code base and I've managed to make a trivial repro for it. It seems that calling persistent on a vector_transient is trying to instantiate a ctor call to my object using some transient impl type, but I've no idea why. This doesn't seem like expected behavior to me, so hopefully you can provide some insight.

#include <immer/vector.hpp>
#include <immer/vector_transient.hpp>
#include <immer/box.hpp>

struct object;

template <typename T>
struct is_valid_type
{ static bool constexpr value{ false }; };
template <>
struct is_valid_type<object>
{ static bool constexpr value{ true }; };

struct object
{
  using vector_type = immer::vector<immer::box<object>>;

  template <typename T>
  object(T &&)
  { static_assert(is_valid_type<T>::value, "invalid type"); }
};

int main()
{
  object::vector_type::transient_type t;
  // If this next line is commented out, the code compiles.
  (void)t.persistent();
}

Here's the error I'm getting from GCC 10.2.0 on Arch.

❯ c++ -I lib/immer main.cpp 
main.cpp: In instantiation of ‘object::object(T&&) [with T = immer::detail::rbts::rbtree<immer::box<object>, immer::memory_policy<immer::free_list_heap_policy<immer::cpp_heap>, immer::refcount_policy, immer::spinlock_policy>, 5, 5>&]’:
lib/immer/immer/box.hpp:47:48:   required from ‘immer::box<T, MemoryPolicy>::holder::holder(Args&& ...) [with Args = {immer::detail::rbts::rbtree<immer::box<object, immer::memory_policy<immer::free_list_heap_policy<immer::cpp_heap, 1024>, immer::refcount_policy, immer::spinlock_policy, immer::no_transience_policy, false, true> >, immer::memory_policy<immer::free_list_heap_policy<immer::cpp_heap, 1024>, immer::refcount_policy, immer::spinlock_policy, immer::no_transience_policy, false, true>, 5, 5>&}; T = object; MemoryPolicy = immer::memory_policy<immer::free_list_heap_policy<immer::cpp_heap>, immer::refcount_policy, immer::spinlock_policy>]’
lib/immer/immer/detail/util.hpp:69:16:   required from ‘T* immer::detail::make(Args&& ...) [with Heap = immer::debug_size_heap<immer::cpp_heap>; T = immer::box<object>::holder; Args = {immer::detail::rbts::rbtree<immer::box<object, immer::memory_policy<immer::free_list_heap_policy<immer::cpp_heap, 1024>, immer::refcount_policy, immer::spinlock_policy, immer::no_transience_policy, false, true> >, immer::memory_policy<immer::free_list_heap_policy<immer::cpp_heap, 1024>, immer::refcount_policy, immer::spinlock_policy, immer::no_transience_policy, false, true>, 5, 5>&}]’
lib/immer/immer/box.hpp:78:43:   required from ‘immer::box<T, MemoryPolicy>::box(Arg&&) [with Arg = immer::detail::rbts::rbtree<immer::box<object>, immer::memory_policy<immer::free_list_heap_policy<immer::cpp_heap>, immer::refcount_policy, immer::spinlock_policy>, 5, 5>&; Enable = void; T = object; MemoryPolicy = immer::memory_policy<immer::free_list_heap_policy<immer::cpp_heap>, immer::refcount_policy, immer::spinlock_policy>]’
lib/immer/immer/vector_transient.hpp:185:37:   required from ‘immer::vector_transient<T, MemoryPolicy, B, BL>::persistent_type immer::vector_transient<T, MemoryPolicy, B, BL>::persistent() & [with T = immer::box<object>; MemoryPolicy = immer::memory_policy<immer::free_list_heap_policy<immer::cpp_heap>, immer::refcount_policy, immer::spinlock_policy>; unsigned int B = 5; unsigned int BL = 5; immer::vector_transient<T, MemoryPolicy, B, BL>::persistent_type = immer::vector<immer::box<object> >]’
main.cpp:26:22:   required from here
main.cpp:20:37: error: static assertion failed: invalid type
   20 |   { static_assert(is_valid_type<T>::value, "invalid type"); }
      |                                     ^~~~~

The immer::detail::rbts::rbtree type looks like the vector_transient::impl_t, but I'm not sure. I don't think it should be trying to hit my object ctor with that though. :confused:

Thank you for your time!

arximboldi commented 3 years ago

Hmmm, this is strange... :thinking:

And yes, rbtree is vector_transient::impl_t.

arximboldi commented 3 years ago

Got it. A fix is on the way. It was calling the initializer_list constructor of the vector. Fun times.

jeaye commented 3 years ago

Thanks so much for the quick fix!

arximboldi commented 3 years ago

No worries! I'm also curious about what this "larger codebase" where you are using this is, if you ware willing to talk about it :smile:

jeaye commented 3 years ago

😃 The larger codebase is a toy lang called jank, which has been a pet project of mine for several years. These days, it's basically a Clojure dialect which compiles to C++. I have two primary goals for the language:

  1. Easy, optimized native code generation with minimal run-time overhead (as opposed to graal)
  2. Gradual types, with Hindley-Milner-style type inference

For the second point, my goal is to be able to write dynamically typed code by default, like one would with Clojure normally. However, one can specify contracts/types, similar to clojure.spec. The compiler then uses those for static checks, inference, and improved codegen (to remove boxing and generate monomorphic function instances where possible).

The way this would work, in practice, is that we get to write our code first, thinking only about the data and the shapes. Once it's all working, then type annotations can be added to lock things down. So we get this kind of two-phase programming cycle, one with quick iterations and one with more rigor.

Where immer comes in is that I'm using it to back the run-time's data structures. In my code example above, I referenced object, which is my recursive dynamically typed object which can be a string, integer, real, vector, map, set, keyword, function, etc. object is a tagged union of sorts, but much more specialized than std::variant or boost::variant.

You can see the code for object here, though it's in rough shape. I'm in the middle of the third jank rewrite over the past several years. Each time gets better and applies all the learnings I've had in that time. jank was originally written in C++, then in Clojure (when I didn't know Clojure at all), and now rewritten in Clojure after having worked in Clojure for a handful of years.

I'm add an odd intersection of people who have a systems programming background who're also impassioned by what Clojure brings to the table. Thanks for asking. :)

arximboldi commented 3 years ago

Nice! That niche of systems programmers that dig Clojure :heart:

As you can imagine, my experience with Clojure was my motivation to write this library. In fact, I think that it's choice of default data-structures is Clojure's biggest contribution and what really allowed shaping "value-oriented design" as a paradigm (I'd argue that even code in Haskell and other lisps used to not be "value-oriented" in the way that Clojure's is... this days the landscape is more mixed).

I see that you are using the default memory policy in your code. I think that this could be an area of experimentation down the line. You could maybe try using LibGC.

I'll follow your project, it sounds quite interesting and something that definitelly could be useful for me if it becomes somewhat stable.

This summer I wanna book some time to some focused work on Immer again. Finally implement map transients and maybe some forms of diffing and ordered collections... Let's see.

jeaye commented 3 years ago

As you can imagine, my experience with Clojure was my motivation to write this library.

I did imagine so; we very much share the same overlap, but you made a great lib out of it.

I see that you are using the default memory policy in your code. I think that this could be an area of experimentation down the line. You could maybe try using LibGC.

Thanks for this suggestion. Right now, performance with my run-time is abysmal, so this may help. Profiling with perf + flamegraph shows that the majority of the time (in a simple ray tracer) is spent:

  1. Creating immer maps
  2. Getting data out of maps
  3. Hashing data for immer map lookups

Half of the problem is that my object is a stack-allocated tagged union. If I have a object const &o parameter and I need to pull a key from it, I have to copy, since I don't know its lifetime. So every (get m k) copies the value out of m, which is not cheap. I've tried lifting everything to use object_ptr, which is a shared_ptr<object>, to remove the copying problem, but that slowed things down even more, since now everything from numbers to seqs needs another heap allocation.

The problem with compiling to C++ is inheriting the static typing, since I'm currently compiling dynamically typed code to a statically typed lang. This makes it harder to, for example, have numbers not be boxed in object and to allow some things to be object and others object_ptr. Even with the template metaprogramming I'm already doing, that type juggling would be tough to swallow, not to mention what it'd do to the binary size.

This summer I wanna book some time to some focused work on Immer again. Finally implement map transients and maybe some forms of diffing and ordered collections... Let's see.

Please do! One thing that might be worth considering is lazy seqs in immer. Or maybe that'd just be useful for me and nobody else. 😁

absassi commented 2 years ago

Hi @arximboldi, this issue is still present in array and array_transient. It would be nice to have that fixed too (although it doesn't affect my code anymore, as I was actually using array where I should have been using vector).