estatico / scala-newtype

NewTypes for Scala with no runtime overhead
Apache License 2.0

WIP - name-based extractors for unapply to drop allocation of Some #47

Open lbialy opened 5 years ago

lbialy commented 5 years ago

Initial attempt at implementing #46. I have checked that this works as it should, but sadly using another value class to provide an allocation-free name-based extractor introduces a usage limitation for the macro annotation: a value class cannot be declared inside another class, and test suite classes are one of the potential use cases affected by exactly that. Frankly, I have no idea whether or how this can be circumvented, but maybe some other soul can find a way to solve this riddle.

Tests fail, obviously, as the macro annotation now won't compile with unapply = true when used inside a class.

lbialy commented 5 years ago

The solution using a shared instance of the name-based extractor fails on 2.10 only. Will look into this.

carymrobbins commented 5 years ago

First, thanks for the PR and I appreciate your patience waiting on me to finally get around to commenting on it! :smiley:

While I like the spirit of this change, I wonder if the Some ever really gets constructed here after the JVM is warmed up. I played around with this idea a while back (see https://github.com/estatico/scala-newtype/issues/18#issuecomment-383207925).

Here are the benchmarks via sbt clean 'jmh:run TestBenchmark' -

Benchmark                     Mode  Cnt          Score         Error  Units
testCaseClass                thrpt  200  402211203.871 ±  861246.654  ops/s
testManualSimpleUnapply      thrpt  200  393148121.470 ± 4173139.812  ops/s
testManualUnapplyValueClass  thrpt  200  384478663.022 ±  870967.771  ops/s
testNewTypeSimpleUnapply     thrpt  200  110200664.917 ±  283007.861  ops/s

The testManual benchmarks are handwritten newtypes, whereas testNewType is the macro-generated one (which also contains ClassTag instances).

To expand on this, testManualSimpleUnapply uses the Some extractor whereas testManualUnapplyValueClass uses a value class (the same strategy as this PR) -

https://github.com/oleg-py/newtype-unapply/compare/master...carymrobbins:unapply-class

// type alias assumed by the snippet, as in the standard newtype encoding
type Manual = Manual.Type

object Manual {
  type Type <: Base with Tag
  trait Tag
  type Base <: Any
  def apply(x: String): Manual = x.asInstanceOf[Manual]
  def unapply(x: Manual): Some[String] = Some(x.asInstanceOf[String])
}

// again, the companion type alias the snippet relies on
type ManualUnapplyValueClass = ManualUnapplyValueClass.Type

object ManualUnapplyValueClass {
  type Type <: Base with Tag
  trait Tag
  type Base <: Any
  def apply(x: String): ManualUnapplyValueClass = x.asInstanceOf[ManualUnapplyValueClass]
  def unapply(x: ManualUnapplyValueClass): Unapply = new Unapply(x.asInstanceOf[String])
  final class Unapply(val get: String) extends AnyVal {
    def isEmpty: Boolean = false
  }
}
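As an aside, a minimal standalone sketch (not code from either branch) of what makes this work: name-based extraction only requires that the result of unapply expose isEmpty and get, so any type can stand in for Option, including a value class:

```scala
// Minimal name-based extractor: the result of unapply only needs
// `isEmpty: Boolean` and `get: A`; no Option/Some is involved.
final class Wrapped(val get: String) extends AnyVal {
  def isEmpty: Boolean = false
}

object Word {
  def unapply(s: String): Wrapped = new Wrapped(s.toUpperCase)
}

// The pattern desugars to Word.unapply("hello"), then isEmpty/get.
val Word(w) = "hello"
assert(w == "HELLO")
```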

Ideally, these handwritten versions should perform better than any generic solution (unless I've horribly messed something up) and their performance is very close (with the Some version actually being slightly faster, but that could just be due to solar flares).

If you do indeed want to pursue this avenue (although I know it's been a couple of months now since this PR's inception), I'd recommend forking my benchmark and seeing how well it does. But if the performance of using Some is essentially identical to using AnyVal (which is probably mostly due to JIT magic), it might make sense to leave it as Some. Definitely open to ideas here, though.
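If you do fork the benchmark, one suggestion (assuming the same sbt-jmh setup as the runs above): JMH's GC profiler reports normalized allocation per operation (the gc.alloc.rate.norm rows), which is the number the value-class extractor should drive to zero regardless of throughput noise:

```shell
# Same benchmark run as above, plus JMH's GC profiler; look for the
# gc.alloc.rate.norm rows (bytes allocated per operation).
sbt clean 'jmh:run TestBenchmark -prof gc'
```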

lbialy commented 5 years ago

To be perfectly honest, I wouldn't expect the allocation-less unapply to be significantly faster than the Some-based version. One reason, off the top of my head, is that TLAB allocations are very fast, so any benchmark measuring the real impact of this change would have to measure the impact of garbage pressure on the collector. With semi-concurrent and fully concurrent GCs available on the JVM, that's a hard task. One way to measure it would probably be to use the serial collector with a prolonged benchmark, which should show both the cost of allocation and the elongated GC pauses.

There's another relevant optimization: escape analysis, which is available to some extent in C2 (at least that's what I believe, I'm not sure) and greatly improved in GraalVM. If I understand it correctly, the JVM can choose to allocate the Some in the stack frame and then deallocate everything by simply dropping that frame.
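A rough way to observe the allocation behavior directly, outside JMH (my sketch; it assumes a HotSpot JVM, and the delta depends heavily on JIT state and escape analysis, so treat it as illustrative only):

```scala
import java.lang.management.ManagementFactory
import com.sun.management.{ThreadMXBean => HotSpotThreadMXBean}

// HotSpot-specific API: this cast fails on JVMs whose ThreadMXBean
// does not implement the com.sun.management interface.
val tmx = ManagementFactory.getThreadMXBean.asInstanceOf[HotSpotThreadMXBean]

def allocatedBytes(): Long =
  tmx.getThreadAllocatedBytes(Thread.currentThread.getId)

def someLoop(n: Int): Int = {
  var i = 0
  var hits = 0
  while (i < n) {
    // Each iteration constructs a Some before matching on it,
    // unless escape analysis eliminates the allocation.
    Some("x") match { case Some(s) => hits += s.length }
    i += 1
  }
  hits
}

val before = allocatedBytes()
someLoop(100000)
val after = allocatedBytes()
// Before the JIT kicks in you should see a sizable delta; once C2
// scalarizes the Some, it can drop dramatically.
println(s"allocated ~${(after - before) / 1024} KiB")
```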

Now, the question you might be asking is: why would it be beneficial to have this in scala-newtype if the JVM is capable of such feats? My answer is that if this project strives to provide sensible zero-cost abstractions, it should do so consistently, and if it's possible to avoid an allocation using a clever compiler trick, that's the right thing to do.

I'm also a tad bit pedantic and like working on such intricacies :smile: