Scala DSL - Githubissues

breandan / kotlingrad

🧩 Shape-Safe Symbolic Differentiation with Algebraic Data Types

https://breandan.net/public/masters_thesis.pdf#page=49

Apache License 2.0

515 stars 21 forks source link

Scala DSL #11

Open tribbloid opened 4 years ago

tribbloid commented 4 years ago

Purpose

Introducing a secondary tensor operation DSL (Domain Specific Language) written in & optimised for Scala language & various compilers (the most common of which are JVM based scalac 2.12+, other options are Scala.js & Dotty)

Despite not being a dependently typed language, Scala may be favourable for some reasons:

Scala is the preferred language of Apache Spark, Apache Flink, CHISEL (Constructing Hardware In Scala Embedded Language) and NVidia RAPIDS, all of which are important infrastructures for large scale grid learning.
Singleton types (shapeless / 2.13) allows arbitrary number of shape types to be generated on-demand, without relying on code generator or metaprogramming
Path-dependent types allows shape algebra to be defined for both symbols & numbers
Type inference by summoning implicits is more flexible for theorem proving than inheritance & delegation
Operator overloading & infix syntax allows DSL to be closer to math notations, e.g. vec1 dot_* vec2 can be easier to read than dotProduct(vec1, vec2)
Advanced type algebra can be expressed in macro & roadmap features. Specifically, the singleton-ops library and it’s associated discussions in SIP-23 have suggested that Scala 2.13 & Dotty will use a more direct & expressive approach

Usage

A PoC project has been submitted at:

https://github.com/tribbloid/shapesafe/tree/master

At this moment (06/07/2020) it only implemented very few operations (dot_product, concat) for double vectors. However most features in the following list has been proved and can be showcased by running gradle test on DoubleVectorSpec.scala.

Required & proven features:

Type-level shape representation for vector
Compile-time shape safety for operations with constant shapes (1)
Run-time shape safety for operations with variable shapes (2)
(1) can smoothly degrade to (2) for more flexible programming paradigm
Not relying on compile-time code generation or defining large number of compile-time classes, both of which may stress a JVM classloader
Not relying on esoteric compiler plugin, macro or unstable libraries. (the PoC only uses singleton-ops which is actively maintained by lightbend core team)

Required & unproven:

Type-level shape representation for matrix
Type-level shape representation for arbitrary tensor, with no limitation on its arity and dimensionality (ideally through using recursive generic types, the same technique that defined shapeless.HList)
Support for UInt & Float element data types
Support for more operations, particularly if used frequently by ANN

Nice to have:

Not relying on church type encoding (The only violation in my PoC is shapeles.Nat), which causes slow compilation
Compile-time & run-time shape safety for named tensor
Shape algebra defined for symbols instead of number supported by user-defined axioms & compile-time theorem proving
Dotty or Scala.js support

Not in the scope:

AutoDiff / Differentiable Programming: should be agnostic to DSL
Interoperability with Apache Spark, NVidia RAPIDS or any other library, models should manifest into different executions regardless of DSL being used
Shape safe compositor / pipeline API, too much work

How it fits into roadmap

The competition for supremacy of deep learning ecosystem is brutal and unforgiving. With torch & tensorflow dominating both research & development phase, people have little reasons to steer away from Python & a dynamically typed, procedurally validated scaffolding paradigm. But there are exceptions: the large scale, mission critical, complex systems in production, like autopilot and SLAM, most likely prefers spending much effort reinventing & maintaining a more verbose and arduous code base written in C++ or other low level languages. For these systems, demands for built-in correctness and predictability of execution far outweights the ability to write more concise code.

This is, IMHO, the market niche where kotlingrad can fit in: for mega-engineering rather than prototyping. In particular, to enable users to:

write provably valid neural architecture WITHOUT sanity test
if not possible, write neural architecture with lower test coverage that validates the part that cannot be proven, the 80-20 rule in test coverage is very real and account for most edge case failures in an architecture that lacks any type.
under the above premise, write short & easy to understand code

. in that order. My design & optimisation of DSLs should be consistent with this doctrine. The chosen of Scala & JVM as carrier should naturally put kotlingrad in the ecosystem of Apache Spark, and maybe of RISC-V on the long run, both of which are for large scale production.

tribbloid commented 4 years ago

Hi Breandan, haven't heard from you & Thorsten for a while, what are you guys scheming on lately?

breandan commented 4 years ago

Hey @tribbloid, looks cool! I won't have time to review this for a couple weeks, but encourage you to keep going! Who do you see as the audience for the Scala DSL? Is this just for Scala users, or would it be possible to consume the Scala DSL from Kotlin/Java, and how would that look? It would be helpful to have some usage examples with different algorithms for reference, e.g. matrix multiplication, linear regression, MLP, RNNs. I know you're a Scala fan, so I won't try to convert you, but have you looked into arrow-kt? I know it has a lot of Scala-like functionality.

BTW I recently looked into using TVM4J, but it looks like they're sorting out some stuff, so it might be good to circle back when they decide to publish a JVM library. Seems like TF4J is also making some progress, so it will probably be between the two. Haven't seen @tscholak in a while, but I saw he's doing a GSoC with hasktorch. It would be great to catch up at some point after all the dust settles down. How are things on your end? Hopefully things return to normal again soon!

tribbloid commented 4 years ago

For arrow-kit

Scala DSL should be for scala users, kotlin follows a different design philosophy.

But I'm tempted, so I wrote some random showcase in both scala & kotlin:

in scala:


val x3 = DoubleVector(1.0, 2.0, 3.0)
val x50 = DoubleVector(
  1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, //
  1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, //
  1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, //
  1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, //
  1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, //
)

val y3 = DoubleVector.random(3)
val y6 = DoubleVector.random(6)
val y50 = DoubleVector.random(50)
val y53 = DoubleVector.random(53)

println(x3 dot_* y3)

// x3 dot_* y4 doesn't compile

println((y3 concat y50) dot_* y53)

println((x50 concat y3) dot_* y53)

// (x50 concat y3) dot_* y6 doesn't compile


- In kotlin:
```kotlin
    val x3 = DoubleVector(1.0, 2.0, 3.0)
// throws 'Too many arguments for public constructor'
    val y3 = DoubleVector.random(3)
// throws 'Does not conform to expected type'

Oops, doesn't work. I guess scala use too much syntax sugar for compiling.

I didn't have a lot of experience with kotlin, but I can see that arrow is very ambitious, almost equivalent to rewriting the compiler. Not doubt it will make many things easier (e.g. getting rid of that code generator). However, I still didn't see the ability to summon axioms & proofs at type level, which is important once tensor operation start to get hairy (e.g. the concat in the above showcase, the compiler automatically summoned the proof that 50+3 == 53). I'll be surprised if they add this feature later, as kotlin's design philosophy deliberately stated that compilation should be fast.

tribbloid commented 4 years ago

For TVM4J

We don't need to wait for TVM4J release/publishing, it is at the opposite end of our layer of abstraction. We could only think about AutoDiff, shape safety & forward/backward rule generation, while let the boys at DMLC to worry about model compilation and deployment, their tools will definitely keep up with latest TVM release.

Then I just realised that TVM just moved out from DMLC lately, so I may need to study their politics before making a prediction :-<

breandan commented 4 years ago

I still didn't see the ability to summon axioms & proofs at type level,

I believe the Arrow folks are adding support for type proofs, but it's still early in development. I had a chat with @pakoito about type level integers a while ago, not sure if this is a prototype for arrow-meta proofs or just a design discussion. It would be interesting to see if were possible to push this direction a bit further, maybe it is possible to do some form of type-level arithmetic for concatenation and slicing. Maybe you could even implement shape-safe convolution operators. Tempted to try.

Scala DSL should be for scala users, kotlin follows a different design philosophy.

I see, so you want to consume Kotlin∇ from Scala. I guess it makes sense. I know there is an existing shape-safe tensor library for Scala called nexus, it might be interesting to compare what @ctongfei was doing with your planned DSL to see if there is anything that could be borrowed or improved. Right now, our AD implementation is pretty coupled with the shape-checking stuff, but it should be possible to provide a shapeless AD implementation, as you mention it would probably be a lot easier to implement the Kotlin∇ shape system in Scala anyway.

tribbloid commented 4 years ago

On nexus

No they don't, I've used it for a long time and it's in his TODO list forever:

[TODO] Typesafe tensor sizes using literal singleton types (Scala 2.13+)

I probably should remind him that 2.13 is not required

breandan commented 4 years ago

Interesting, I was not aware of this! Do you think singleton types are capable of supporting convolutional arithmetic? I have seen implementations of addition and subtraction using Presburger arithmetic.

If you were able to persuade Scala/Shapeless into performing multiplication and division at the type level, that would be pretty impressive/paperworthy. Maybe you could start out with something simple like unit strides which just requires doubling, addition and subtraction:

Then maybe work your way up to dialated convolution which requires multiplication, division and floor:

tribbloid commented 3 years ago

Yep, easy. See my example at: https://github.com/tribbloid/shapesafe/blob/49a1c5b29d6d1ec619b2decd38c4b716cdeef713/demo/src/main/scala/edu/umontreal/kotlingrad/shapesafe/demo/DoubleVectorDemo.scala#L35-L35

{
  val conved = x50.conv(x3)
  println(conved.data.size)

  conved.arity.internal.requireEqual(48)
  //    conved.arity.internal.requireEqual(49) // doesn't compile
}

{
  val conved = x50.pad(5).conv(x3, 2)
  println(conved.data.size)

  conved.arity.internal.requireEqual(29)
  //    conved.arity.internal.requireEqual(28) // doesn't compile
}

OK it’s not easy at all, I spent too much time trying to get implicit view pattern & singleton summoner pattern to work, until the legendary @DmytroMitin pointed out that type class is still the best option.

The experiment code had some drastic changes to allow short definition of complex arithmetic, the one related to convolution is right here:

https://github.com/tribbloid/shapesafe/blob/49a1c5b29d6d1ec619b2decd38c4b716cdeef713/core/src/main/scala/edu/umontreal/kotlingrad/shapesafe/core/tensor/DoubleVector.scala#L99-L99

tribbloid commented 3 years ago

just found his project:

https://summerofcode.withgoogle.com/projects/#5862505638789120

Nice work!

breandan commented 3 years ago

Very nice! I've been playing around with this a bit and just had a few comments/questions:

Vectors do not (yet?) compose. Vector(Vector(0.0), Vector(0.0)) produces:

[Error] ... could not find implicit value for parameter proofOfType: edu.umontreal.kotlingrad.shapesafe.m.util.Constraint.ElementOfType[edu.umontreal.kotlingrad.shapesafe.core.tensor.DoubleVector[edu.umontreal.kotlingrad.shapesafe.m.arity.Arity.FromOp[singleton.ops.impl.OpMacro[singleton.ops.impl.OpId.ToInt,shapeless.Succ[shapeless.Succ[shapeless.Succ[shapeless._0]]],Int(0),Int(0)]]] :: edu.umontreal.kotlingrad.shapesafe.core.tensor.DoubleVector[edu.umontreal.kotlingrad.shapesafe.m.arity.Arity.FromOp[singleton.ops.impl.OpMacro[singleton.ops.impl.OpId.ToInt,shapeless.Succ[shapeless.Succ[shapeless.Succ[shapeless._0]]],Int(0),Int(0)]]] :: shapeless.HNil,Double]

Singleton types are unreadable. May be it's possible to have literal singletons as you mentioned above?

val x3: DoubleVector[Arity.FromOp[NatAsOp[Succ[Succ[Succ[_0]]]]]] = DoubleVector(1.0, 2.0, 3.0)
val x3: DoubleVector[3] = DoubleVector(1.0, 2.0, 3.0)

How do you define a function with a fixed arity?

def vec6id(vec: DoubleVector[Arity.FromOp[NatAsOp[Succ[Succ[Succ[Succ[Succ[Succ[_0]]]]]]]]]) = vec
val t = vec6id(x3 concat y3)

[Error] ... polymorphic expression cannot be instantiated to expected type;
 found   : [P <: edu.umontreal.kotlingrad.shapesafe.m.arity.Proof]edu.umontreal.kotlingrad.shapesafe.core.tensor.DoubleVector[P#Out]
 required: edu.umontreal.kotlingrad.shapesafe.core.tensor.DoubleVector[edu.umontreal.kotlingrad.shapesafe.m.arity.Arity.FromOp[edu.umontreal.kotlingrad.shapesafe.m.arity.Utils.NatAsOp[shapeless.Succ[shapeless.Succ[shapeless.Succ[shapeless.Succ[shapeless.Succ[shapeless.Succ[shapeless.Nat._0]]]]]]]]]
    (which expands to)  edu.umontreal.kotlingrad.shapesafe.core.tensor.DoubleVector[edu.umontreal.kotlingrad.shapesafe.m.arity.Arity.FromOp[singleton.ops.impl.OpMacro[singleton.ops.impl.OpId.ToInt,shapeless.Succ[shapeless.Succ[shapeless.Succ[shapeless.Succ[shapeless.Succ[shapeless.Succ[shapeless._0]]]]]],Int(0),Int(0)]]]

How do you write a function where one argument is fixed and the other is generic?

def vec3id(x3: DoubleVector[Arity.FromOp[NatAsOp[Succ[Succ[Succ[_0]]]]]], y3: DoubleVector[Arity.FromLiteral[Lt[Int]#T]]) = x3
val t = vec3id(x3, x3) dot_* x3

[Error] ... type mismatch;
 found   : edu.umontreal.kotlingrad.shapesafe.core.tensor.DoubleVector[edu.umontreal.kotlingrad.shapesafe.m.arity.Arity.FromLiteral[Int(3)]]
 required: edu.umontreal.kotlingrad.shapesafe.core.tensor.DoubleVector[edu.umontreal.kotlingrad.shapesafe.m.arity.Arity.FromLiteral[shapeless.Witness{type T <: Int}#T]]

Minor issue, but the Scala plugin for IntelliJ IDEA does not type check compound expressions and must wait for compilation. This is probably the correct behavior for type proofs, but it might be possible to make this easier for the IDE to check.

tribbloid commented 3 years ago

at your service. These are my answers:

Vectors do not (yet?) compose.

The POC had only implemented DoubleVector, Not matrix or tensor. They are not difficult (ideally by stealing code from dotty tuple: https://github.com/lampepfl/dotty/pull/2199), just need a lot of code. Also, I'm using good old breeze as data representation but its a bit obsolete, it should move to a representation that supports at least big data (Apache Arrow, DJL), symbols and dual numbers (your library), or both.

tribbloid commented 3 years ago

How do you define a function with a fixed arity?

easy:

    val _3 = Arity(3)

    def printD3[A <: Shape](v: DoubleVector[A])(
      implicit
      proofOfArity: A MayEqual _3.Out ~~> Proof
    ): Unit = {

      println(v)
    }

    printD3(DoubleVector.random(3))

//    printD3(DoubleVector.random(4)) // doesn't compile

I'm kind of bothered by the church encoded Nat type as it crashed my compiler fairly often. I would recommend converting them ASAP using an implicit edu.umontreal.kotlingrad.shapesafe.m.arity.nullary.OfSize, or avoiding them all together and use strictly Witness or 2.13 Singleton

tribbloid commented 3 years ago

How do you write a function where one argument is fixed and the other is generic

You just need to summon proofOfArity for 1 of them, something like:

    def printD3[A <: Shape, B <: Shape](v1: DoubleVector[A], v2: DoubleVector[B])(
      implicit
      proofOfArity: A MayEqual _3.Out ~~> Proof
    )
...

edu.umontreal.kotlingrad.shapesafe.core.tensor.Shape -> Operand is just a computation graph, Only after implicit proof Summoning (Operand ~~> Proof | Proof.Out) could it be transformed into Arity. if you don't summon the proof they will never be computed. Some of my Vector API (e.g. pad) doesn't summon the proof because they will always succeed. You can even avoid summoning in all APIs and do it lazily (by using DoubleVector.reify at last) but it is kind of shabby

tribbloid commented 3 years ago

Minor issue, but the Scala plugin for IntelliJ IDEA does not type check compound expressions and must wait for compilation

Even worse, sometimes it gives false errors for type check. I'm always kind of jealous of F# programmers on Windows, they IDE never diverge from the compiler. But it is bad politics, Jetbrains thinks scalac:

changes too fast at the whim of Prof Odersky
has a conflict of interest with kotlin

So it probably doesn't worth their time. The problem is well know in scala community, and I heard that dotty PM has ditched their grad-student descent based methodology

tribbloid commented 3 years ago

I also have a few questions for you:

How hard it is to implement graph simplification?

https://enoki.readthedocs.io/en/master/autodiff.html#graph-simplification

For symbolic diff this could be tested by:

$$ \frac{d (2 x^2 + 3 x^2)}{d x} = 5 x $$

Do you have an example or test case for symbolic diff?

The reason I'm asking is to see the difficulty of writing in JVM that implements/extends operands and graph nodes of a low level autodiff IR (e.g. RelayIR in TVM2). I was initially against it as I thought it embodies too much work, but on second thought I believe you made the right decision: In prototyping stage people may encounter all kinds of esoteric functions, and if they can't insert it into the graph without some serious hacking, they may be forced to ditch the effort.

tribbloid commented 3 years ago

I have another reason (though not 100% sure): I believe that if you want to implement it from scratch, then the mixed-mode / hybrid should always be the default graph crawling algorithm (similar to forward-backward algorithm for HMM).

I knew status quo frameworks are already satisfied with reverse mode, but they are sub-optimal on hourglass-shaped architecture (autoencoder, GAN, Full-conv, UNet, HydraNet), and they are getting more pervasive. Also, naive & backward mode should be special cases of mixed mode. So maybe staying ahead of the curve is one way to to avoid the dilemma later

The mixed mode should depend on the above mentioned graph simplification, I don't have an exact idea at the moment but I vaguely feel that it involves finding the smallest subset of the graph that can reach all the inputs/outputs. Probably it is not the most important thing on our level and should be delegated to DMLC guys

breandan commented 3 years ago

How hard it is to implement graph simplification?

It depends on the representation you're using. In general, common subexpression elimination is NP-hard. If the graph is a DAG, the problem is GI-hard, or if the graph is a tree, it's something like $\mathcal{O}(n^{5/2})$ (Matula, 1978), or maybe a little easier. It's a well-studied problem in the compiler literature, and there are lots of good resources and efficient implementations around. Due to frequent nonlinearities in typical deep learning pipelines, there are not often many algebraic simplifications you can do, but it could be a useful optimization for differentiable programming more broadly. Kotlin∇ uses some ad-hoc simplifications, but it's probably not something we should be rolling ourselves. I'm still looking for a good solution, there are some good resources on computer algebra in the readme.

Instruction selection on directed acyclic graphs (Bruns, 2007)
Common sub-expression elimination using subtree isomorphisms (King, 2019)

Do you have an example or test case for symbolic diff?

There is a toy symbolic expression generator in our tests. I need to write some more extensive test cases using property-based testing. Chapter 4 of my master's thesis describes the theory behind PBT and metamorphic testing, which were experimentally validated, but have not yet been integrated into our CI test suite. Our high-level approach is to generate a large number of random trees and run a sensitivity analysis across numerical inputs. We need to set up some regression tests to check for runtime performance and numerical stability, but the idea is that you specify an invariant, and throw an error if you can find a simplification and inputs which violate it. If you're interested, there are some mature testing frameworks which automate this process, maybe check out ScalaTest if you're unfamiliar.

status quo frameworks are already satisfied with reverse mode, but they are sub-optimal on hourglass-shaped architecture

This is another active area of research known as Tensor contraction ordering or optimal Jacobian accumulation, which in general is NP-hard (Naumann, 2006). If you're just doing inference, there is a $\mathcal{O}(n\log(n)$ ) matrix chain multiplication algorithm (Hu & Shing). As you mention, the naïve implementation in most AD packages is not very efficient. For example, JAX uses reverse mode by default. You can also accumulate the Jacobians using a custom order (e.g. jacfwd(jacrev(f)) / jacrev(jacfwd(f))) but these too are suboptimal. As you suggested, it is possible to realize significant speedups by considering the structure of the computation graph. For example, TensorNetwork uses the opt_einsum package (which provides various strategies, e.g. brute force, greedy, DP) to search for the optimal contraction sequence. I was hoping to try throwing Z3 at the problem, but haven't got around to it yet.

tribbloid commented 3 years ago

Beautiful. At the simplification stage, all expressions can to be expressed as AST with some caching. E.g. the two $x^2$ in the above example:

$\frac{d}{dx}2x^2+3x^2=10x$

can be treated as 2 different nodes and be squashed within the complexity of a rule-based / cost-based optimiser. Only in the last stage can it be converted back into a graph for autograd.

tribbloid commented 3 years ago

After carefully studying your answers I believe I have enough information to figure out a real fitness in the market, and that niche is 2-scale autodiff OR megakernel: This is a well-proven trick proposed by MIT CSAIL soft robot team:

https://news.mit.edu/2019/model-design-control-robots-target-1122

for large scale physical simulation (Good to know that you are working for tooling of robotics, can make further discussion much shorter)

In essense, the author argues that:

Source Code Transformation (SCT) (Griewank & Walther, 2008) and Tracing (Wengert, 1964) are common choices when designing AD systems. In our setting, using SCT to differentiate a whole simulator with thousands of time steps, results in high performance yet poor flexibility and long compilation time. On the other hand, naively adopting tracing provides flexibility yet poor performance, since the “megakernel" structure is not preserved during backpropagation.

To get both performance and flexibility, we developed a two-scale automatic differentiation system: we use SCT for differentiating within kernels, and use a light-weight tape that only stores function pointers and arguments for end-to-end simulation differentiation.

(Quote from ICLR 2020 preprint: https://arxiv.org/abs/1910.00935 [1])

The capability is missing from most DL tools (all if shape safety is a concern) for a good reason, tracing (the conventional sense of reverse autograd, referred as "Tape" in torch & TF) is universally favoured for its flexibility (as they preferred, the 'define-by-run' feature), even for complex loops & control flows. This is where C++ crunch raw numbers directly after IR => bytecode compilation, and C++ / CUDA beats JVM fair & square for pure computation power, we don't have a fair game here. Plus, the mixed-mode autograd is in this cluster of mess and we may want to stay away.

SCT ("rewriter" based) on the other hand, uses most techniques you are familiar with: AST simplification, symdiff, and consumes little time on JVM. Its weakness for complex control flow was bypassed in the megakernel design, in which only ASTs of simple, monoidal execution blocks are rewritten & compressed (such that the tape skipped most intermediate values), and parallelised as accumulators.

Obviously this is not the only way for symdiff and autograd to work together, but it is a good start. Unfortunately, I'm yet to find a JVM based deep learning framework that allows me to plugin a complex enough megakernel into its tape (see my question at https://github.com/awslabs/djl/issues/140, the first author of [1] and the second committer of DJL are both active on social media, so feel free to ask them directly). This is the only loose end of my hypothesis.

If every details of my hypothesis can be proven it means I'm talking about an industrial grade autograd framework that:

- unifies physical simulation & machine learning

- leverages a mature ecosystem on JVM

- has shape safety

Do you see any loopholes in my reasoning? If not let's wait for a few days for the answer.

P.S. have you graduated? I know you won't be with Google, so what's your plan to find serious users of your work?

breandan commented 3 years ago

ICLR 2020 preprint: https://arxiv.org/abs/1910.00935 [1]

I've been studying DiffTaichi pretty closely over the last few months, and agree it's a good design choice. The tools they've built for physical simulation are also very impressive. My colleagues and I have been working on some similar ideas connecting differentiable physics and rendering that we hope to be able to share soon. Learning smoothing tricks to get stable gradients for collisions and discrete events has been really challenging and I think a good litmus test for where differentiable programming is headed in the future. I'm personally interested in making those techniques accessible to the broader field of software engineering.

Have you graduated? What's your plan to find serious users of your work?

This is probably more related to #8, but I am staying in school to pursue further research. Last year, I moved from UdeM to McGill, where I am now working to build a connection between automatic differentiation and learning on programs. I think the AD/ML community is doing important translational research, but in terms of this project, I feel the path to adoption requires taking a longer view on automatic differentiation than other libraries can afford. I think we're in a good position to do so, and have identified three high level goals for Kotlin∇: (1) language design, (2) symbolic reasoning and (3) graph computation.

One of our goals for Kotlin∇ is to provide a staged eDSL that resembles an imperative program (containing variables, control flow, functions and data types) for mathematical code. The language should look and feel similar to a scripting language like Python with a flexible type system and mathematically idiomatic syntax. I think it is important to actually use the DSL, so I will try to get to the point where it's fast enough for myself and others to use for training. Another project I recommend checking out is KMath, which has shared a number of inspirations and plans to support a much larger API surface (if that's important to you).

Another goal for Kotlin∇ is providing tools for AD/compiler research: solvers for equational reasoning, term rewriting and symbolic computation. Although normalization and canonicalization are undecidable in general, if we impose certain constraints, it becomes "just" NP-hard, using Knuth-Bendix or similar completion algorithms. It would be nice to provide a tool for determining if two symbolic expressions are semantically equivalent in a certain axiom system, by using logic programming (e.g. miniKanren, Prolog SAT/SMT solvers) or learning techniques (e.g. Siamese GNNs, symbolic pregression) for expression simplification.

Finally, our goal is to compile the entire graph (including control flow, data types, logic, etc.) to sparse matrices, where "executing" the program consists of pure matrix arithmetic. I think one day there will be a VM or interpreter for a high-level language that runs entirely on a matrix processor, only exiting to perform I/O. Users will be able to run existing programs and get parallelization for free. That's going to be a long term project, but there is some progress we can make today, e.g. leveraging GPU/TPU/IPU intrinsics using GraphBLAS. Graphs and matrix representaton is of the things I've been working on this summer.

Do you see any loopholes in my reasoning?

I think a hybrid tracing and SCT-based design makes sense and is a good compromise for the JVM. Another approach from Wang et al. (2018) proposes multistage-programming, which seems closer in spirit to what we are currently doing. Have you looked into Lantern and the LMS framework? I think it's a good architecture and also makes a lot of sense from a compiler perspective. What do you think are the advantages and disadvantages of these two approaches? Happy to discuss how to integrate with DJL or a similar backend. We don't want to reinvent the wheel, and I think that exercise would be helpful in the context of (1).

tribbloid commented 3 years ago

Thanks a lot for the ideas you brought up. If you are in the college, what's the earliest time you could plan for a co-op program? Would you like to apply for GSoC next year or some part-time job at a heavyweight JVM industry user? (e.g. awslab) No offense but I think comparing to hasktorch, a shape-safe JVM DSL alone could be much more relatable to these companies.

Sorry for my slow response as I don't know anything of compiler, it took me a few months to almost figure out the ecosystem of JVM-based staging. I remain uneducated to the state-of-the-art research, but can at least narrow down to 1 or 2 directions. But there is one silver lining: my mind will no longer be contaminated by compiler front-end (like parsers, lexers, AST generation or anything suitable only for human), a bless that would be impossible if I start reading a few years earlier

The LMS is a brilliant idea - I wish I could say the same about execution but it's too early to tell. The old LMS project in the 2014 paper got my attention twice - for 2 different prototypes, and in both cases it looks like a strong contender:

1st: Prior to 2018, a bank I indirectly worked for was trying to use a self-rewriting trans-compiler, something similar to this: https://arxiv.org/abs/2006.03511, to rewrite tons of their SAS code into python. The proposal stay on paper for a long time, during which I even (unsuccessfully) tried to pitch Thorsten to build a team for this prototype (can't remember the detail, @tscholak please correct me if I'm wrong). The LMS was evaluated for its ability to drastically reduce the size of the action space. Theoretically, it won't generate characters, tokens or invalid AST, but only IR that are void of most low-level compilation error. Ironically the proposal was killed after the old director of the bank retired. In hindsight, it seems that the bank most likely can't afford the cluster to train it anyway, so no string attached.
2nd: My full-time job is for a data security product that runs on Apache Spark. One of our prototypes is to try to leverage the extremely verbose Tungsten whole-stage SQL CodeGen (https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html), to shrug off a lot of RDD execution overhead. The LMS team had a early solution (https://flaredata.github.io/) on C++, not the java code the Tungsten engine was generating, but I figure it could be easily converted to do so. So I wrote to them and asked to be a beta tester, they haven't reply so far :-| The project is still ongoing, just not sure if 'flaredata' can keep up with the mainstream for long enough.

The way I read it, the LMS stands out for eliminating the front-end, human-interacting part completely (aka the 'lightweight' in its name), making it equally transparent to human and machine agent alike, if your research outcome can do the same on kotlin WITHIN ITS language specification that will be awsome. But everything else should ideally be decoupled: the multi-staged, cross-platform code generation is important and should be in good working order, but they should be gradually replaced by more optimised components in a very long roadmap, instead of becoming a bilateral problem of go big or go home. This is why I said I'm not a big fan of its execution: It has 2 hard forks, which is bad enough. But the worse thing is: both of them may overstretch, and risk evolving into over-engineered clusters of inscrutable, esoteric elitist magic that no one wants to touch. These hard forks are:

dotty-staging (http://dotty.epfl.ch/docs/reference/metaprogramming/staging.html): this is kind of a scala 3 fork of part of LMS. The other part was rejected early (see https://gist.github.com/odersky/f91362f6d9c58cc1db53f3f443311140), despite that Prof. Nada Amin is an LMS coauthor. It is not lightweight anymore, and use a front-end similar to the old scala 2 macro, with tons of compile-time plugins and backroom IR generation. I guess one day people will eventually figure out how to use a machine agent to act on it, I just won't count on that in reasonable time. Fortunately, the generated IR is fairly human readable and computer language agnostic.
lms-clean (https://github.com/TiarkRompf/lms-clean/blob/master/doc/main/scala/lms/core/backend.md): a very ambitous full rewrite, with sea-of-nodes IR and gradual typing. The sea-of-nodes IR is evidently targeting the SIMD architecture code optimisation, particularly, the CUDA users who used to spend long time handcrafting megakernels, in frameworks like difftaichi or TF. I admit that it would be a much more superior IR design and SIMD compiler, if all promises have been figured out. I also think it is a much harder problem than the problem LMS initially proposed to solve, thus, the first functioning project shall never make manual & automatic kernel code generation mutually incompatible, gradually the manual one can be phased out. Regardless, it is a research project and doesn't need to be functional in the industry in reasonable time.

TL;DR: I saw LMS and difftaichi solving problems on 2 different levels, and their ideas can be made integratable and work together. Unfortunately, I'd rather wait and attempt on their industry adoption later, after all the dust have settled and only 1 fork remained prevail. In addition, there is a looming threat that SIMD compiler becoming a winner-take-all market (of which MXNet is almost out), so one way or the other I encourage you to push your ideas and works early, instead of waiting for things to become mature

tribbloid commented 3 years ago

BTW, Is part of the GSoC 2020 work integrated into this new project?

https://github.com/google-research/dex-lang

Looks like a lot of overlapping

breandan commented 3 years ago

Hi @tribbloid, thank you for your interesting reply! This gives me a lot to think about. As you mentioned, there are some great projects in differentiable programming, including Myia, JAX, DEX, Hasktorch, TVM, MXNet, Software 2.0@FB, TF4j, KotlinDL and Swift in no particular order. While I share your enthusiasm and have tremendous respect for the people undertaking these projects, I have grown skeptical that differentiable programming is the panacea it claims to be. Nevertheless, I think DP has a lot of mileage left and am glad we will have no shortage of good DSLs from which to choose.

I am currently in the middle of another project, and have been unable to devote as much time to Kotlin∇ as I would like. However, I will continue to support it and envision our current line of research may one day bear fruit for Kotlin∇, if successful. One of my passions and the reason I started this project in the first place was types. Like you, I was working on SQL code generation some years ago and fell in love with type-safe DSLs. Working on this project, I rediscovered that creative spark. I'm pleased to report that many of the features which Kotlin∇ introduced have been absorbed into KMath due to @CommanderTvis and @altavir. I would like to thank them for their efforts and strongly encourage users who need a stable API to check out their work.

making it equally transparent to human and machine agent alike

Agreed, we need an IR which is interpretable to both consumers. λ-expressions are extremely powerful metaprogramming tools, but hopelessly unreadable. I am awed and jealous of anyone who can tame the hydra that is Lisp! As an end-user, types feel like the promised bicycle of the mind, but type-safe metaprogramming leaves much to be desired. Recent work seems to be addressing that, but type-level programming is an area of ongoing research and I'm not convinced it's ready for primetime yet. Recently I've been learning about logic programming, which seems like a suitable compromise for many domain-specific tasks. If you have any experience using Datalog/Prolog, I'd be curious to get your feedback about it.

I've learned that logic programming and graph query languages share many similarities. Many techniques in logic programming can be evaluated using graph and lattice-based algorithms. One area I think Theano got right was incorporating a number of graph-based algebraic optimizations. There are few projects which have tried to organize a similar set of domain-specific rules including Rubi, Fungrim, Metamath et al. I'm sure there are more automated approaches, but it helps to have a database of common identities from which to start. I feel these abstractions could be collected and reused for compiler optimization and DSL generation.

Another place where types arise is knowledge graphs, which seem to have a lot of promising applications for automated reasoning. I think it would be a nice idea to have a knowledge-based compiler for domain-specific applications such as matrices and random variable arithmetic. One technique which is apparently well-known in the compiler community is the idea of equality saturation. Users write down a bunch of equalities which the compiler can apply to optimize some desired heuristic (e.g. speed or numerical stability). This approach seems well-aligned with recent work on learning to rewrite.

TL;DR: Kotlin∇ is on the back burner for a while. I need to make progress on my Ph.D., which grew out of ideas in this project, and I hope one day will return to plant more, but it will take some time for that to happen.