NICTA / scoobi

A Scala productivity framework for Hadoop.
http://nicta.github.com/scoobi/
482 stars 97 forks source link

Rearrange implicits to speedup implicit search #324

Closed retronym closed 10 years ago

retronym commented 10 years ago

Improves the compile times of client code using implicitly[WireFormat[...]] and implicitly[AvroSchema[...].

Exploits and optimization in implicit search: if an eligible result has already been found, the compiler does not need to evaluate if another candidate also satisfies the search if it is less specific than the first result. "less specific" is the parlance of static overload resolution. All else being equal based on the type signatures, a method defined in a subclass of another is more specific.

The general pattern for organization can be seen here independent form Scoobi:

import collection.generic.CanBuildFrom

object Slow {
  trait OtherTC[_]

  trait TC[A]

  object TC {
    implicit def genericTc[T: OtherTC]: TC[T] = ???
    implicit def traversableTC[CC[X] <: Traversable[X], T](implicit cb: CanBuildFrom[_, T, CC[T]]): TC[CC[T]] = ???
    implicit def mapTC[M[K, V] <: Map[K, V], K, V]: TC[M[K, V]] = ???
    implicit def tcOption[T: TC]: TC[Option[T]] = ???
    implicit def tcDouble: TC[Double] = ???
  }
}

object Fast {
  trait OtherTC[_]

  trait TC[A]

  trait LowPriorityTC {
    implicit def genericTc[T: OtherTC]: TC[T] = ???
    implicit def traversableTC[CC[X] <: Traversable[X], T](implicit cb: CanBuildFrom[_, T, CC[T]]): TC[CC[T]] = ???
    implicit def mapTC[M[K, V] <: Map[K, V], K, V]: TC[M[K, V]] = ???
  }
  object TC extends LowPriorityTC {
    implicit def listTC[T]: TC[List[T]] = traversableTC[List, T]
    implicit def immutableMapTC[K, V]: TC[Map[K, V]] = mapTC[Map, K, V]
    implicit def tcOption[T: TC]: TC[Option[T]] = ???
    implicit def tcDouble: TC[Double] = ???
  }
}

object Test {
  implicitly[Slow.TC[Option[Option[Option[Option[List[Double]]]]]]]
  implicitly[Fast.TC[Option[Option[Option[Option[List[Double]]]]]]]
}

/*
scalac-hash v2.10.3 -Xlog-implicits test.scala 2>&1 | grep "not a valid"
test.scala:37: Slow.this.TC.mapTC is not a valid implicit value for Slow.TC[List[Double]] because:
test.scala:37: Slow.this.TC.mapTC is not a valid implicit value for Slow.TC[Option[List[Double]]] because:
test.scala:37: Slow.this.TC.traversableTC is not a valid implicit value for Slow.TC[Option[List[Double]]] because:
test.scala:37: Slow.this.TC.mapTC is not a valid implicit value for Slow.TC[Option[Option[List[Double]]]] because:
test.scala:37: Slow.this.TC.traversableTC is not a valid implicit value for Slow.TC[Option[Option[List[Double]]]] because:
test.scala:37: Slow.this.TC.mapTC is not a valid implicit value for Slow.TC[Option[Option[Option[List[Double]]]]] because:
test.scala:37: Slow.this.TC.traversableTC is not a valid implicit value for Slow.TC[Option[Option[Option[List[Double]]]]] because:
test.scala:37: Slow.this.TC.mapTC is not a valid implicit value for Slow.TC[Option[Option[Option[Option[List[Double]]]]]] because:
test.scala:37: Slow.this.TC.traversableTC is not a valid implicit
*/
retronym commented 10 years ago

Review by @etorreborre

retronym commented 10 years ago

Discussion: https://groups.google.com/d/msg/scala-ide-user/3iKmWlD2yhg/XfiTJcXEUNAJ

retronym commented 10 years ago

Hmm, I didn't compile the tests after this change.

[error] /home/travis/build/NICTA/scoobi/src/test/scala/com/nicta/scoobi/acceptance/AvroFileSpec.scala:75: could not find implicit value for parameter as: com.nicta.scoobi.Scoobi.AvroSchema[(Int, Seq[(Float, String)], Map[String,Int], com.nicta.scoobi.acceptance.ThousandBytes)]
[error]     val tmpAvroFile = createTempAvroFile(testData.toDList)
[error]                                         ^

Still needs a little refinement!

retronym commented 10 years ago

Pushed a fix for the compile error, and added more "fast path" implicits.

Here's the before/after for the number of failed "expensive" implicits:

  /code/gist/3282e8b49b7748cc2689 scalac-hash v2.10.3 -Xlog-implicits  @args.txt -Xprint:typer LME.scala 2>&1 | grep "not a valid" | wc -l
    2200
scalac-hash v2.10.3 -Xlog-implicits  @args2.txt -Xprint:typer LME.scala 2>&1 | grep "not a valid" | wc -l
     0

The risk of these commits is that we've introduced an implicit ambiguity along the way. So perhaps its wise to add some test cases for resolving these implicits (e.g. check that implicitly[List[String]) still works.

retronym commented 10 years ago

Overall speedup in my benchmark after all that was 5.5x.

cvogt commented 10 years ago

Did you only merge 09aafe1, not the other commits of this PR or is this a misleading github message?

etorreborre commented 10 years ago

No the other commits have been merged as well

* 09aafe1 - No more fruitless paths in implicit serach for LME benchmark (8 hours ago) <Jason Zaugg>
* 4f6a634 - One more specific implicit for faster compilation (17 hours ago) <Jason Zaugg>
* a671b29 - Further reorganize typeclass implicits for faster implicit search (17 hours ago) <Jason Zaugg>
* a69576b - Rearrange implicits to speedup implicit search (18 hours ago) <Jason Zaugg>

However there seem to be some issues when running Avro jobs on the cluster. I'm investigating this.