Exponential compiletime blowup, with nested classes

amnaredo commented 3 years ago

The code below kills the scala-compiler. It will lead to an exponential number of implicit-lookups. This can be quite annoying when defining complex protocols. It also leads to the same amount of classfiles.

Workaround: For noncyclic datatypes, it's possible to define the "later Writers/Readers" first, to break down the exponential growth.

/* workaround:
implicit val ww15 = implicitly[upickle.Writer[A15]]
implicit val ww10 = implicitly[upickle.Writer[A10]]
implicit val ww5 = implicitly[upickle.Writer[A5]]
*/
implicit val ww1 = implicitly[upickle.Writer[A1]]
  // kill compiler
  case class A1 (x: A2 , y: A2 )
  case class A2 (x: A3 , y: A3 )
  case class A3 (x: A4 , y: A4 )
  case class A4 (x: A5 , y: A5 )
  case class A5 (x: A6 , y: A6 )
  case class A6 (x: A7 , y: A7 )
  case class A7 (x: A8 , y: A8 )
  case class A8 (x: A9 , y: A9 )
  case class A9 (x: A10, y: A10)
  case class A10(x: A11, y: A11)
  case class A11(x: A12, y: A12)
  case class A12(x: A13, y: A13)
  case class A13(x: A14, y: A14)
  case class A14(x: A15, y: A15)
  case class A15(x: A16, y: A16)
  case class A16(x: A17, y: A17)
  case class A17(x: A18, y: A18)
  case class A18()

ID: 61 Original Author: FlorianKirmaier

amnaredo commented 3 years ago

This is not surprising at all. The test suite already takes quite a while to compile just because I have a pretty large/deep test case inside of it, so I suppose blowing up is expected =P

Cool that you got a test case for it though. Perhaps we'll be able to find some solution

Original Author: lihaoyi

amnaredo commented 3 years ago

So Fixing your particular use case was pretty easy. It's trickier to fix cases like

  case class A1 (x: Option[A2] , y: A2 )

Since that requires that we manually flatten out the implicits from all the nested calls, which is pretty complex logic.

It's even trickier to fix cases like

  type A1 = Tuple2[A2, A2]

Since that means we can't use any "normal" implicits anymore: they all have to be macros, in order for the flattening to work!

I doubt I'll be able to fix this by the next release (with scalajs 0.6.0). Maybe @xeno-by would have some tricks we could try? I remember I sent an email to scala-internals once but didn't find any solution at the time.

Original Author: lihaoyi

amnaredo commented 3 years ago

I think it is possible to avoid exponential blowup with clever sorting of all involved types.

Each type from a closed set of types (case classes without non-sealed polymorphic fields and possibly extending sealed traits, tuples, collections, primitives - well, everything defined in Implicits and definable with macros) eventually corresponds to a graph of types:

diag

If we sort these graphs topologically in the reverse order of dependencies we will be able to emit generated macros in the correct order so they are not instantiated multiple times.

There could be a problem with cycles, but such cases are rare, and I think it is possible to come up with some simple algorithm for breaking these cycles (finding the minimal feedback arc set is NP-hard, but we don't need it; I think that just some feedback arc set will suffice).

This won't work for tuples, however, because then tuple readers and writers have to be macro-generated as well. The top-level type has to be a case class. But in that case I think it is possible to write an auxiliary macro which will generate implicit vals only:

Macros.generateOrderedWriters[(A1, A2, A3, ..., A18)]
val ww1 = implicitly[upickle.Writer[(A1, A2, A3, ..., A18)]]

expands to something like

implicit val w1 = Writer.macroW[A18]
implicit val w2 = Writer.macroW[A17]
implicit val w3 = Writer.macroW[A16]
...
implicit val w18 = Writer.macroW[A1]
val ww1 = implicitly[Writer[(A1, A2, A3, ..., A18)]]

Or we can just make tuple readers/writers macro-generated as well, I doubt it will be hard.

Original Author: netvl

amnaredo commented 3 years ago

@netvl Your solution is fine, but maybe it could be even easier. Correct me if I'm wrong.

We don't have to Sort the Types before declaring them. When we use implicit def or implicit lazy val instead of implicit val, the ordering of the declarations won't matter. The only thing we have to do, is to find out all relevant types, and then declare them all in one namespace. (this is currently not the case, which is the main problem) And by the way, we get the "perfect behaviour" for all cyclic types. The only difficulty part is to get all relevant types, but I think it is quite easy.

Original Author: FlorianKirmaier

amnaredo commented 3 years ago

@FlorianKirmaier @netvl yeah that's gonna be pretty hard to implement. At the very least, it'll be much harder than the naive use-implicit-search-everywhere technique. You'll need to manually de-structure (deeply!) and order all the implicits correctly. Probably doable, but I'm not sure I have the skills to pull it off!

I think even implicit defs and lazy vals need to be sorted in the correct order for them to apply, but not sure.

And then we have to make sure that this works for recursive types! I'm doing this now with the knot functions, which works for the recursive-implicits way of doing things, but not sure how that would translate to the manually-flatten-everything-out solutions

Original Author: lihaoyi

amnaredo commented 3 years ago

@lihaoyi, why do you think that collecting all types in the hierarchy is difficult? I'm pretty sure it won't be hard with macros. Ordering might be more difficult, but this is an already solved problem - it is just a toposort. The only difficulties I see are indeed with recursive types, but I guess since they do work now, it will be possible to make them work under the new scheme too.

Original Author: netvl

amnaredo commented 3 years ago

I wouldn't be too optimistic. We're currently facing SI-7046 which makes it a lot more difficult to iterate over class hierarchies.

Original Author: tindzk

amnaredo commented 3 years ago

@tindzk This doesn't even need class hierarchies though; even composition is enough to cause this problem.

The main thing getting in my way of solving this is that the uPickle macros are hairy enough as is, and doing this would make it even hairier D=

Original Author: lihaoyi

amnaredo commented 3 years ago

@netvl You mentioned cycles. I also don't know how your proposed technique would apply to cycles =/ Unlike you, I think that cycles are ridiculously common. Any recursive type such a List[T], Tree[T], Vector[T] has a cycle. Granted List and Vector have hardcoded picklers being collections, but similar user-defined types will definitely bump into this problem...

Original Author: lihaoyi

amnaredo commented 3 years ago

@FlorianKirmaier I don't really understand. Are you proposing a transform like this?

case class A1(x: Option[A2] , y: A2)
case class A2(x: Option[A3] , y: A3)
case class A3()

implicitly[Reader[A1]] == {
  lazy val a = Case2RW(A1.apply, Seq("x", "y"), Seq(null, null))(implicitly[Option[A2]], implicitly[A2])
  lazy val b = Case2RW(A2.apply, Seq("x", "y"), Seq(null, null))(implicitly[Option[A3]], implicitly[A3])
  lazy val c = Case0RW(A3.apply, Nil, Nil)
  a
}

This seems like it might work in this case. But what about cycles? Maybe adding a knot around each one is enough?

implicitly[Reader[A1]] == {
  lazy val a = R.Knot{implicit a => Case2RW(A1.apply, Seq("x", "y"), Seq(null, null))(implicitly[Option[A2]], implicitly[A2]) }
  lazy val b = R.Knot{implicit b => Case2RW(A2.apply, Seq("x", "y"), Seq(null, null))(implicitly[Option[A3]], implicitly[A3]) }
  lazy val c = R.Knot{implicit c => Case0RW(A3.apply, Nil, Nil) }
  a
}

Original Author: lihaoyi

amnaredo commented 3 years ago

One question: how would you know when to stop? e.g. if someone defined his own implicit i: Reader[A2], we want to bail out early and generate

implicitly[Reader[A1]] == {
  lazy val a = R.Knot{implicit a => Case2RW(A1.apply, Seq("x", "y"), Seq(null, null))(implicitly[Option[A2]], implicitly[A2]) }
  a
}

and not generate readers for those already-defined implicit readers. That is necessary because those types may not be macro serializable.

Original Author: lihaoyi

amnaredo commented 3 years ago

Yes, the code should look like something you posted. But you have to move the implicit from the function argument to the lazy val. Otherwise we wouldn't use any lazy val except a.

Whether Cycles work by default depends on how the Readers are implemented. When the depending implicits are used at creation time, then we will get an error on Cycles. But when the depending Readers are only touched when we request pickling, then it should work. I'm not sure what a Knot is, but i suppose it provides exactly this property.

This should also improve the performance for macro generated picklers, because each Reader is generated only once!

In my head the code looks something like this:

implicitly[Reader[A1]] == {
// A list of all types for which we dont have Readers at the current Scope
  implicit lazy val a = implicitly[Reader[Option[A2]]]
  implicit lazy val b = implicitly[Reader[A2]]]
  implicit lazy val c = implicitly[Reader[Option[A3]]]]
  implicit lazy val d = implicitly[Reader[A3]]

  <macro generated code for A1>
}

It's worth to mention, that most of the projects using macros have the same problem, so it would be really nice to have a pattern or a general solution ...

But it's probably not easy to implement it ...

Original Author: FlorianKirmaier

amnaredo commented 3 years ago

@FlorianKirmaier @netvl I've gotten pretty far with the proposed approach in this branch

https://github.com/lihaoyi/upickle/compare/cached-macros?expand=1

I've disabled all the non-macro tests for now, but I have almost all the macro tests passing. The algorithm is basically

Start from the root type
Recurse into any members if it's a class, or into any subclasses if it's a sealed trait
Gather up all the types which don't already have a non-macro implicit in scope for them
Create the lazy vals, mostly as described, except that their LHS isn't implicitly but instead Knot(Case2NR(...)) to provide a lazy, recursion-friendly wrapper around the actual implementation

I'm stuck, though, on fixing the remaining tests, all of which involve generics. The problem is that given a type

Option[Int]
List[Int]
List[IntTree]
A[B]

The problems are:

There is no way for me to know, in the general case, that materializing a pickler for A[B] will requite a pickler for B. Sure it does for some generics (esp. containers), but certainly not all of them, and I cannot figure out how to distinguish them.
Even if I did know how to distinguish them, I cannot ask whether there is an implicit in scope for A[B] without also materializing the implicit for B, and then we're back to exponential materialization again by simply wrapping the original failure case with Options

This flaw also exists in @netvl's proposal. I don't know what the way forward is, but if we can figure out a better way of doing this it shouldn't be hard to implement, re-enable the last few tests and publish

Original Author: lihaoyi

amnaredo commented 3 years ago

I hate to mention this since its so simplistic, but doesn't play-json get around this whole problem by basically requiring you to declare the format class (aka the knot) somewhere explicitly, then either importing it or passing it in directly? I've always thought that explicitly defining (more boilerplate) or a macro annotation is better for this kind of stuff, after working with macros enough.

Original Author: dispalt

amnaredo commented 3 years ago

Mostly fixed in master. I bet there are cases where you can still get exponential behavior, but it's not gonna be so easy anymore!

Original Author: lihaoyi

amnaredo commented 3 years ago

New release with the fix soon?

Original Author: ngbinh

amnaredo commented 3 years ago

Not before I fix https://github.com/lihaoyi/upickle/issues/55 and maybe https://github.com/lihaoyi/upickle/issues/40

Original Author: lihaoyi

amnaredo / test

Exponential compiletime blowup, with nested classes #107