Extended IR - Githubissues

aalexandrov commented 8 years ago

This is a meta-issue that should track the planned changes in the IR.

Throughout the discussion in this issue, we will use the following schema based on the MostConsistentClassifier workflow:

// product types
case class Profile(id: Long, name: String, surname: String)
case class Email(id: Long, ip: Int, content: String)
case class Server(ip: IP, isBlacklisted: Boolean)
case class Classifier(model: Long, weights: SparseVector[Double])

Motivation

Most of the transformation-based optimizations in Emma can only be applied if certain conditions apply. Deciding whether the conditions are satisfied typically involves some form of dataflow analysis on top of lifted code.

Example: Fold-Group Fusion

Consider the following code that produces two aggregates over the emails grouped by their ip value:

for {
  g <- emails.groupBy(_.ip)
} yield {
  val agg1 = g.values.sum()
  val agg2 = g.values.map(_.content.size()).sum()
  (g.key, agg1, agg2)
}

The code is eligible to fold-group fusion (FGF) as all g.values uses appear within the context of chain of (zero or more) map applications ending in a fold.

While (FGF) of the variation above works at the moment, the following variations will not be considered eligible, as at the moment there is no direct and simple way to consider constant propagation in the Scala expressions referenced in comprehended terms:

// structural typing on the argument
for {
  (key, values) <- emails.groupBy(_.ip)
} yield {
  val agg1 = values.sum()
  val agg2 = values.map(_.content.size()).sum()
  (key, agg1, agg2)
}

// projection aliases
for {
  g <- emails.groupBy(_.ip)
} yield {
  val values = g.values
  val agg1 = values.sum()
  val agg2 = values.map(_.content.size()).sum()
  (g.key, agg1, agg2)
}

Example: Join Ordering

Another scenario where a better IR might be beneficial is join ordering. Consider the following code

for {
  e <- emails
  s <- people
  r <- people
  if from(e.content) == s.email
  if to(e.content) == s.email
} yield {
  val x1 = e1 // depends on e and s
  val x2 = e2 // depends on x1 and r
  f(x2)
}

In order to reduce the amount of communication cost when this expression is compiled to a join cascade, we might want to apply x1 right after we join e and s, prune all extra fields, and only after that join with r.

In effect, this means that the head expressions is split into multiple parts and parts of it might be applied on top of a rhs of aggregator during the combinator rewrite phase.

Approach

In order to make the compiler oblivious to such code modifications, I propose to extend the current comprhension IR in the following strategy.

Target a limited subset of Scala expressions based on the usage patterns observed in emma-examples. Lift, normalize, and transform the whole quoted three rather than only the comprehended expressions.
Use a flavour of an SSA-like IR which simplifies answers based on use-def and def-use chains. A good candidate seems to be direct style in let-normal form for a good overview and a short introduction to this IR, please refer to Chapter 6 in the SSA Book.
Restructure the code in a way which will allow us to use it both for macro-based and reflection-based compilation.
Refactoring Plan

The refactoring of the code should be based on the following refactoring plan. Each item will be tracked by a separate issue:

[x] Immutable comphrehensions
[x] Consistent comprehension naming and nomenclature
[x] Simplify macro-level IR
[x] Design a universe-agnostic, holistic IR
[x] Simplify runtime-level IR in emma.ir
[x] Scala AST to Let Normal Form translation

joroKr21 commented 8 years ago

Actually we could use a somewhat hacky solution to bring the separate hierarchies together. Since Scala's Constant(Literal(_)) is polymorphic we could wrap the comprehension nodes in constants and then use a modified form of traversal that works on union types (or possibly do 2 passes - one for Scala nodes and one for Emma nodes).

joroKr21 commented 7 years ago

I think we can close this now

aalexandrov commented 7 years ago

Alright. I've linked the issue on the wiki for future reference.

emmalanguage / emma

Extended IR #147

Motivation

Example: Fold-Group Fusion

Example: Join Ordering

Approach

Refactoring Plan