FluxML / Flux.jl


Encoding array dimensions in flux type system? #614

Open datnamer opened 5 years ago

datnamer commented 5 years ago

https://news.ycombinator.com/reply?id=19131650&goto=item%3Fid%3D19129641%2319131650

> the default array typing in flux.ml doesn't encode tensor dimensionality in the type system. If it did (which it very easily could in julia) you wouldn't wind up with a situation where your learning task halts in the middle of a training run, which can happen in flux.ml

From that comment. It makes a good point... are there any plans for "stronger" or "moar" typing in Flux? Is it just a matter of using static arrays?

jekbradbury commented 5 years ago

XLA.jl contains a statically-shaped array type (XRTArray) that's needed for XLA compilation, so it wouldn't be too hard to bring something like that to plain Flux. But dynamic dimensions are also useful, so it would be best to have an optionally-static array type and I think nobody's written one yet.

KristofferC commented 5 years ago

With "vanilla Julia" you would still get the error message at the same time and in the same place as before, though? You would likely just trade a DimensionMismatch for a MethodError, since all errors in Julia are deferred to runtime.
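For illustration (a sketch using StaticArrays purely as an example; `f` below is a made-up function, not anything in Flux), encoding the size in the type only swaps which runtime error you hit:

```julia
using StaticArrays  # used here only for illustration

# Plain Arrays carry their sizes in the value, so a mismatch surfaces
# at runtime as a DimensionMismatch:
A = rand(Float32, 2, 3)
B = rand(Float32, 2, 2)
# A * B                    # DimensionMismatch -- but only when executed

# Putting the length in the type just trades that for a MethodError,
# still at runtime, because dispatch happens dynamically:
f(v::SVector{3}) = sum(v)    # made-up function requiring a length-3 input
# f(SVector(1.0f0, 2.0f0))   # MethodError -- also only when executed
```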

MikeInnes commented 5 years ago

So there are multiple kinds of information we might be interested in here: rank (e.g. matrix vs vector), dimension labels (e.g. channel vs space), and size (e.g. a matrix of size 5x10). Then there's the question of which of these things we check statically. ("Optionally dynamic" dimensions should just be type variables, e.g. 5xN where N.)
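As a sketch of what such a type variable could look like today (using StaticArrays' SMatrix only as a stand-in, nothing Flux-specific), the fixed dimension goes in the type and the free one stays a parameter:

```julia
using StaticArrays  # SMatrix used as a stand-in for a statically-sized array

# "5xN where N": the first dimension is pinned in the type,
# the second is a free type variable resolved per call.
ncols(x::SMatrix{5,N}) where {N} = N

ncols(@SMatrix zeros(5, 3))    # == 3
# ncols(@SMatrix zeros(4, 3))  # MethodError: first dimension must be 5
```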

As Kristoffer points out, there's no difference between StaticArrays and Arrays in Julia here; both store shape, and the only difference (for present purposes) is an implementation detail of Julia's compiler. Static checking is really a new piece of tooling at the language level (perhaps via a package like Traceur.jl or TypeCheck.jl).

It's worth pointing out that this is a very general and long-standing problem in numerical computing. In practice, most "static" languages are actually dynamic for these purposes, only checking trivially for Array{T}; Swift is currently in that boat, and even Idris exposes both Vect and List. A lot of what Julia does works only because we can do arithmetic on types without worrying about correctness proofs; if someone writes a program that errors only if Goldbach's conjecture is false, we'll just heat up their CPU and move on.

The most promising idea I've seen is to have an "overlay" type system independent of the language's native one (whether static or dynamic). You can then go the TypeScript route and assume correctness when the checker has to give up (soundness and Hindley-Milner are so very '90s). I'm not going to push it as I'm happy to wait until the Swift folks try it for us, but if someone wants to hack on it or propose a GSoC project we'd be up for helping out. Consider yourself warned about the dragons, though.

jekbradbury commented 5 years ago

> ("Optionally dynamic" dimensions should just be type variables, e.g. 5xN where N.)

That might make sense in a method signature, but it would still lead to generation of specialized code for each value of the dimension size. What I mean is something like OptionallyStaticArray{Float32, 3, (10, 10, missing)} where the third dimension is just as dynamic as the dimensions of Array{Float32, 3}.
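Nothing like that exists yet; as a rough sketch (hypothetical names and layout, not a Flux or XLA.jl API), the known sizes could live in a tuple type parameter with `missing` marking the dynamic slots:

```julia
# Hypothetical sketch, not an existing API: S is a tuple such as
# (10, 10, missing), where `missing` marks a dimension whose size is
# only known at runtime and therefore stays out of the type.
struct OptionallyStaticArray{T,N,S} <: AbstractArray{T,N}
    data::Array{T,N}
    function OptionallyStaticArray{T,N,S}(data::Array{T,N}) where {T,N,S}
        length(S) == N || throw(ArgumentError("shape tuple must have $N entries"))
        for (i, s) in enumerate(S)
            s === missing || s == size(data, i) ||
                throw(DimensionMismatch("dimension $i must have size $s, got $(size(data, i))"))
        end
        return new{T,N,S}(data)
    end
end

# Just enough of the AbstractArray interface to be usable:
Base.size(A::OptionallyStaticArray) = size(A.data)
Base.getindex(A::OptionallyStaticArray, I...) = getindex(A.data, I...)

# The third dimension stays as dynamic as it is for Array{Float32,3}:
x = OptionallyStaticArray{Float32,3,(10, 10, missing)}(rand(Float32, 10, 10, 7))
```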

mcabbott commented 5 years ago

Vector{SVector{3,T}} is close to being a 3×N OptionallyStaticArray; I'm a bit surprised that StaticArrays doesn't explicitly offer this. I can't even find an issue.
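For concreteness (plain StaticArrays usage, nothing Flux-specific), that representation keeps the 3 static while the number of columns is an ordinary runtime length:

```julia
using StaticArrays

A    = rand(Float64, 3, 100)                  # an ordinary 3×N matrix
cols = [SVector{3}(c) for c in eachcol(A)]    # Vector{SVector{3,Float64}}

length(cols)        # 100 -- N is dynamic
typeof(cols[1])     # SVector{3,Float64} -- the 3 is static
```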

Although it's not clear to me which problems "in the middle of a training run" this would actually solve --- why would converting all your training data to such a type ahead of time be any easier than checking the sizes of ordinary matrices?

MikeInnes commented 5 years ago

Static checking is generally orthogonal to specialisation (and if it's not in certain other languages, well, that's their problem). You'd expect a shape checker to give the same results for Array, OptionallyStaticArray, etc., since they all ultimately have shapes, regardless of where those shapes are stored or whether they are visible to Base's inference.

A more extreme example is function types. Functions are only nominally tagged in Julia, which makes sense for specialisation, but to meaningfully check higher-order code you'd want something like an arrow type (possibly generalised to handle dispatch somehow).

So for the purposes of the usability/interface issue of catching errors early, we can largely ignore what actually turns up in type tags. (The HN comment's reference to this being easy might be based on using Core.Compiler.return_type as a sort of type checker, but it's kind of a crappy one since it can't actually prove correctness even when it works. There's also some general overloading of the terms "static" and "dynamic" going on, but what else is new.)
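To make that concrete (just a sketch of the idea, not something Flux does), inference says nothing about sizes precisely because Array's type carries only element type and rank:

```julia
# Inference as a poor man's type checker: this "passes" even if the two
# matrices have incompatible sizes at runtime, because Matrix{Float64}
# encodes element type and rank but not size.
Core.Compiler.return_type(*, Tuple{Matrix{Float64},Matrix{Float64}})
# == Matrix{Float64}
```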

MikeInnes commented 5 years ago

I think JavaScript actually separates out the issues most cleanly. In JS you have runtime tags (prototypes) that come from its object model but are basically used for very little else. Then you have the types that V8 specialises on at runtime (which might include e.g. object fields that aren't in the tag). Then you have TypeScript's overlay type system, which turns prototypes into classes and adds things like static parameters, none of which ever exist at runtime – they are only for program correctness. So there you have three types of types, all of which are in principle independent (though most static languages conflate all three).

[The only sensible conclusion here is that JS is the ML language of the future.]

datnamer commented 5 years ago

@MikeInnes Sounds similar to the relationship between Mypy/type hints and Python's runtime classes.

jekbradbury commented 5 years ago

If I'm trying to compile for a specialized architecture that won't let me use a type-tagging runtime, though, the picture looks a little different. I may care about simply being able to generate code that's static enough for this architecture (using types for specialization), or I may care about being able to generate code that is simultaneously sufficiently static for my architecture and sufficiently general to work on all possible inputs (so that I can deploy a single binary).

If I want the latter, I need to use types for specialized code generation and also statically check that the types I've specialized on will cover all cases I might encounter at runtime. In the case of shapes and architectures that care about them, this might mean checking that certain dimensions (which map more directly to the hardware and need to be static in generated code) will only be inhabited at runtime by one of a known set of sizes.
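One way to picture that (hypothetical names, not an XLA.jl API): compile one statically-shaped specialisation per member of a known set of sizes, and the checker's job is to prove that every runtime input falls into that set:

```julia
# Hypothetical sketch: the batch dimension must come from a fixed set of
# sizes so that one specialised kernel can be compiled per member and
# shipped in a single binary.
const KNOWN_BATCH_SIZES = (1, 8, 32)

# Val lifts the runtime size into the type, so each allowed size gets
# its own specialisation.
kernel(x::Matrix{Float32}, ::Val{B}) where {B} = sum(x)  # stand-in body

function run(x::Matrix{Float32})
    b = size(x, 2)
    b in KNOWN_BATCH_SIZES ||
        throw(DimensionMismatch("batch size $b is not in $KNOWN_BATCH_SIZES"))
    return kernel(x, Val(b))
end
```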

MikeInnes commented 5 years ago

Sure, that makes total sense and I don't think I'm in disagreement. It's a case of specialising on the tag OptionallyStaticArray{(10, nothing)} and then checking against the type "matrices of size (10, N) for all N". I'm just saying that those are two different tools, since both of those "types" have very different semantics. Of course, you might still want to abstract over that so that you can ask for both guarantees at once.

Edit: it might be useful to discuss a concrete example. If I have something like

```julia
x::OptionallyStaticArray{(10,nothing)}
y = hcat(x, x)
z = vcat(y, x)
```

this will infer just fine as z::OptionallyStaticArray{(20,nothing)}. But a shape checker worth its salt should resolve y as "(10, 2N) for all N" and then reject the vcat, since y has 2N columns while x has only N, which is quite different behaviour.