elixir-nx / nx

Multi-dimensional arrays (tensors) and numerical definitions for Elixir
2.63k stars 189 forks source link

Infix `dot` operator #236

Closed elcritch closed 2 years ago

elcritch commented 3 years ago

Having an infix matrix dot operator would make writing Nx code much nicer. It's one of the most common operations and not having an infix operation makes many formula much more difficult to read (or illegible).

One big "win" for numerical Python was having @ added as an infix operator specifically for matrix dot operations [0, 1]. The syntax in Julia (which follows Matlab) is to have most math operators work on a matrix level while also have an element wise version with a prefixed ., like .* or .+, as generalized broadcast operations in Julia [2]. R appears to have *, %*% and %o% infix operators for elementwise, inner, and outer matrix multiplication.

The pipe operator really does help significantly, even with simpler stats oriented code (I've tried a few different formats [4]). Still it's harder to read if you're familiar with "broadcast" syntax from other mathematics oriented languages like Matlab or Julia.

My thoughts after reading about R's operators would be that <*> would be a good candidate for an infix dot operation. It fits in with Elixir's current (generally unused) operators:

\, <-, |, ~>>, <<~, ~>, <~, <~>, <|>, <<<, >>>, |||, &&&, and ^^^

(thanks to https://news.ycombinator.com/item?id=26178627)

This could transform a function like this:

  defn predict({w1, b1, w2, b2}, batch) do
    batch
    |> Nx.dot(w1)
    |> Nx.add(b1)
    |> Nx.logistic()
    |> Nx.dot(w2)
    |> Nx.add(b2)
    |> softmax()
  end

to this (with Nx namespace autoimported and if I translated it correctly ;) ):

  defn predict({w1, b1, w2, b2}, batch) do
    softmax( logistic( batch <*> w1 + b1 ) <*> w2 + b2 )
  end

or perhaps if reusing the existing <|> operator:

  defn predict({w1, b1, w2, b2}, batch) do
    softmax( logistic( batch <|> w1 + b1 ) <|> w2 + b2 )
  end

Ideally this <*> operator would have the same (or similar) precedence as *.

0: https://alysivji.github.io/python-matrix-multiplication-operator.html 1: https://www.python.org/dev/peps/pep-0465/ 2: https://julialang.org/blog/2018/05/extensible-broadcast-fusion/ 3: https://www.statmethods.net/advstats/matrix.html 4: https://github.com/elcritch/matrex_numerix/blob/5835a9b477d8ea41bb9b862272a0997fe37c1236/lib/fft.ex#L81

josevalim commented 3 years ago

Hi @elcritch! I think we can easily start a discussion about adding some custom operators to Elixir with the same precedence as the current mathematical ones and I think <*> is a good candidate.

However, I would like to take a holistic approach to this, and ask ourselves which operators we think would be necessary, and then propose a batch at once. Looking at Matlab, they provide <*>, </> (right inverse) and <\> (left inverse). R provides %o% but none of the other languages do, so I am unsure how important it is in practice.

So let's leave this open for discussion and then act accordingly. Thanks for posting!

hauleth commented 3 years ago

@josevalim quick idea. Allow any non-whitespace sequence within <> to be an infix operator. Alternatively allow any set of ASCII characters that are in [\x21-\x2F\x3A\x3B\x3D\x3F\x40\x5c\x5f\x60\x7e] (that is most of printable, non-alnum, non-whitespace, non-del, characters). So one could define <*> as well as <*-*> or <@_@> operators. All of these would be left-to-right and have the same precedence as |>. That would allow anyone to define their own custom operators for whatever uses they would need.

josevalim commented 3 years ago

I really don't want to allow unbound operators in Elixir. So I am fine with adding it on demand. Especially because having the same precedence as |> won't be enough for the cases we are discussing here. :)

elcritch commented 3 years ago

<*>, </> (right inverse) and <>

That'd be fantastic! It crossed my mind but I wanted to focus on the highest value target (<*>) since </> and <\> are actually complicated operations and essentially solve a system of equations. They require something like QR factorization or Gaussian elimination. The standard division for matrices is elementwise division from my experience which a standard / should be fine or even to reuse \\ as Julia & Matlab use \.

If anything an operator for integer div would be much more useful, especially for indexes. I believe Python uses // and Julia relies on it's typing IIRC. In my FFT code using Matrex (which would be super easy to port to Nx) I ended up with a lot of code like:

      |> Matrex.divide(nn))[1..div(nn,2)]

Moving towards this would be nice:

      |> Matrex.divied(nn)[1..nn//2]

(perhaps that's handled in the AST conversion process)

elcritch commented 3 years ago

Also, what's the story for operators and intermingling scalar and matrix values? When working with Matrex one of the sources of errors (and frustration) was going from the output of an algorithm written using Matrex matrices and then needing to handle the scalar output (P.S. when I get time I'll look into porting matrex_numerics to Nx as the Numerix folks did a lot of handy stats stuff already and I added a very basic FFT and Gaussian fitting to it, e.g. sort of similar to scikit libraries for NumPy).

BEAM is fantastic in so many ways but since it doesn't support standard IEEE 754 numbers, particularly NaN and +/-Inf, it's a bit tricky interfacing with other numerical code or handling overflows, etc. The problem would be that I had a nice (fast-ish) algorithm using matricies but then I'd try and take the (scalar) output and display it on a LiveView Dashboard. It works until you format numbers or scale them by a simple factor (1.0e6 * someValue) which would break. It's possible to workaround but tedious. My solution was to override the Kernel arithmetic operations and handle :nan and :inf atoms that Matrex would return when it's floats were NaN or Inf. But I never had a good solution to the "display and format this number" other than case matches.

In my mind, the ideal case would be for Elixir to support a standard pattern for :nan and :inf atoms. For example, if you were dealing with IEEE numbers you could import Nx.IEEEArithmetic in a module and know that *, +, etc would all handle :nan and :inf by propagating them (probably via pattern matching). It would be a big QoL for doing numerics in Elixir.

It may be something for a separate issue, but it does overlap with how arithmetic and maths operators function. Apologies if this has been discussed elsewhere but I didn't see it when I skimmed through all the Nx issues.

seanmor5 commented 3 years ago

For reference we have this issue for handling NaN/Infinity: https://github.com/elixir-nx/nx/issues/119

IIUC...

The default Tensor inspection implementation matches on IEEE 754 patterns for NaN and +/- Inf for BF16, F32, and F64. Backends/Compilers that return IEEE 754 values should correctly return the binary pattern for those values and will display the appropriate +/- Inf, NaN, etc. AFAIK Nx.to_flat_list will correctly convert those patterns to strings: https://github.com/elixir-nx/nx/blob/main/nx/lib/nx.ex#L928-L988 for handling in Elixir however is needed.

As far as road map for handling this issue, not to steal @josevalim thunder, but you might be interested in: https://github.com/erlang/otp/pull/4537

The default backend will raise right now instead of handling inf, nan correctly. The only real fix without an Erlang patch is to tediously pattern match on Inf/nan in every single binary implementation, so we've just documented that they'll raise for now

josevalim commented 3 years ago

I think it will be hard to find an operator for integer division that looks reasonable. All of the usual suspects are taken. We could use something like </> but I think that would be confusing, especially if we use <*> for dot (which is matrix specific).

scalars are promoted to tensors once you perform tensor operations that’s because the type is also important. Note I have just opened a PR on OTP to support representing infinity and nan within the VM naturally.

AndrewDryga commented 3 years ago

Would it be better to have <dot> operator instead of <*> or anything like that? It's 2 symbols more to type but eliminates operator naming and a need to remember it. It even looks like a special syntax for a function call where arguments are on both ends of a function name: left <fun> right. And I'm sorry if my idea is silly :).

josevalim commented 3 years ago

Haskell does support a similar construct, which is to use backticks: left `op` right where op can be any imported or fully qualified function. But I have a feeling this isn’t enough. I would love to hear from people actively involved in data science/numerical computing their thoughts on this.

elcritch commented 3 years ago

It’s kind of nice to be able to do `func`, but you’d want to be able to define the precedent level right? Though even with a fixed precedence it’d still be more readable having infix functions. Then you could define more operations like `cross` (<cross>?).

Perhaps a post on elixir forums would get some more data science folks in the discussion? A general question about improving numerical coding in Elixir might bring up some discussion. :-)

@versilov @ityomemo any interest on the topic?

seanmor5 commented 3 years ago

I'm okay with adding infix operators, but I would personally prefer composing functions with |> as a pseudo style standard for using Nx. Ultimately though if most people agree with adding and using infix operators, I think it makes sense.

A different perspective would be introducing something like Einstein notation as a DSL. I'm not really sure how possible that would be to support within defn, but I've always heard good things about people being able to write more concise code with something like einops. I've never really used it, but I've heard enough great things about it to offer it as an option. And it fits in nicely with named tensors. A drawback here is that using something like einops doesn't necessarily make the resulting code easier to understand or idiomatic for a beginner.

AndrewDryga commented 3 years ago

I think it will be hard to find an operator for integer division that looks reasonable. All of the usual suspects are taken. We could use something like </> but I think that would be confusing, especially if we use <*> for dot (which is matrix specific).

Can we pass an algebra to defn and override operators depending on such algebra? Then we can use * for matrices operations (when algebra: :linear is set). And the same operator in another place would mean integer multiplication with a different algebra. That potentially would push people to use smaller algebra-specific functions and combine them in high-level functions, which is usually good for code readability. The drawback is of course that now you need to consider algebra used when using any kind of infix operator (which seems okay because you don't actually need matrix multiplication and integer division defined in the same defn) and I'm not sure how such code would be structured in textbooks.

With small functions pipe would be used more too as @seanmor5 wants, you would basically pass tensor from one function to another one with different algebras used.

But I'm not writing a lot of code that does math lately so also curious what opinions people that deal with science/numerical computing have.

josevalim commented 3 years ago

Having * mean something else depending on an option is asking for trouble. :) it means when reading code you need to look at the surrounding context for option only so you understand what certain operators mean. I don’t think we should pursue the direction.

hauleth commented 3 years ago

@josevalim I would say that this is common in math that one operator changes it's behaviour depending on context or argument types. But I agree that it isn't ideal for a programming language.

versilov commented 3 years ago

@elcritch @josevalim I vote for <*> as an infix dot operator. Already suggested it back in 2018: https://groups.google.com/g/elixir-lang-core/c/DqXqLJ5kXMc?pli=1

elcritch commented 3 years ago

I'm okay with adding infix operators, but I would personally prefer composing functions with |> as a pseudo style standard for using Nx. Ultimately though if most people agree with adding and using infix operators, I think it makes sense.

The only infix operators I believe would be valuable enough for the effort (and language overhead) would be dot operations via the <*> (inner product) notation and <**> (outer product). I expanded to both because both the outer and inner dot product is used much more heavily in NN's and DL than the maths I'm more familiar with using. It's also seems pretty intuitive to my mind to remember *, <*>, and <**> as scalar, inner product, and outer product.

The rest like cross, outer, kronecker, etc can all compose well with |> and a few parentheses. But so many algorithms deal with dot operations: FFT, QR factorization, Neural Networks, Gaussian processes, etc. Looking through the Numerix library port to Matrex I did there's a fair bit of both inner_dot and dot. In my background I tend to think of dot as inner product, not outer product but in neural networks the reverse is more common.

Here's a list of math functions where Matrix dot operations were used to convert statistical code and light ML code to using matrices (it's not an enormous list, but it's a decent percentage and mostly when I added more complex ML algorithms):

    norm(p, x |> Matrex.subtract(y) |> Matrex.dot(w)) # stats, distance
    sum_of_products = inner_dot(x, y) |> Matrex.scalar() # stats, correlation
    mu = mx |> add(kfx |> inner_dot(alphaf)) # ML, Gaussian processes 
    Matrex.sum(x |> Matrex.inner_dot(w)) / Matrex.sum(w) # ML, Gaussian processes 
    Matrex.sum((x |> Matrex.subtract(mean_x)) |> Matrex.dot_nt(y |> Matrex.subtract(mean_y))) / divisor # stats related

Note, I ended up using subtract vs - in these cases because it's slightly faster IIRC since there's one less function to call when using regular Elixir NIF's (given no AST transforms). The |> really does make it more bearable but it's really built into my brain to more intuitively grasp mu = mx + kfx <*> alpha_f than mu = mx + (kfx |> dot(alpha_f)), mu = mx + dot(kfx, alpha_f), or mu = mx + (kfx |> Nx.inner_dot(alpha_f)), etc.

@versilov I didn't realize you'd already had this discussion! But it seems that last thread got a bit derailed with the idea of adding even more operators, etc. To avoid derailing this one, I'd suggest we limit the discussion to only the dot operators: <*> and <**>. Then if people really want more infix operators, to address it in separate discussions / issues. :-)

In the end I think it's really a feature that I'd sure enjoy having it but not it's not critical.

elcritch commented 3 years ago

A different perspective would be introducing something like Einstein notation as a DSL. I'm not really sure how possible that would be to support within defn, but I've always heard good things about people being able to write more concise code with something like einops. I've never really used it, but I've heard enough great things about it to offer it as an option. And it fits in nicely with named tensors. A drawback here is that using something like einops doesn't necessarily make the resulting code easier to understand or idiomatic for a beginner.

@seanmor5 Einstein notation would seem like a much more Elixir approach to me than adding a bunch of broadcast or elementwise operators as Julia, Matlab, or Nim did. It would match with Ecto a bit in having a well thought out DSL for a specific task. Einstein notation is hard for new comers (or myself after a few weeks of not using it). Though Elixir has the benefit of great compile time abilities and tooling. :-)

josevalim commented 3 years ago

@elcritch thanks, just one note! We are planning to ** for exponentiation. That probably rules out using <**> for outer because it would have higher precedence (one that matches **). But do we need an operator for outer? Only R seems to add one of all the languages mentioned so far.

elcritch commented 3 years ago

@josevalim you're welcome!

Regarding the outer product, it's a good question I think... I was basing my reasoning on Matrex using dot as what I'd naively call an outer product, but searching Google a bit it seems that I'm accustomed to thinking in terms of vector / matrix pairings which have different implied meanings than a 1xN matrix dot MxN matrix operation (or would that be NxM, I'm out of practice!). On this topic, I experimented with extending vector support in Matrex to include overloading dot in the way I was accustomed to thinking in when using a vector vs matrices. Otherwise I ran into errors translating various numerical algorithms (e.g. FFT) from text books to code as I constantly mixed up the proper operations. It was the hardest part of porting over generic algorithms to a library primarily focused on matrices (ahem tensors as the DL folks prefer ;) ).

Arraymancer in Nim has this interesting table: Linear algebra notation comparison.md. So perhaps it's not actually that common to use outer products but I was mistakenly mixing matrix-matrix with matrix-vector or vector-vector operations?

Searching a bit seems to yield that dot is called the inner product in a few random NN / ML pages ( 1, 2, 3 ). Though, many of those seem to be discussing vectors dotted to matrices.

All of that to say, perhaps a single <*> dot product combined with a first class vector type would handle the majority of the cases properly (as seems to be the usage patterns in that Arraymancer table).

@versilov is using the outer product common in neural networks or deep learning?

versilov commented 3 years ago

Don't know, I am not a data scientist actually:)

elcritch commented 3 years ago

Ok, here's a concise summary: https://bic-berkeley.github.io/psych-214-fall-2016/dot_outer.html (in particular the last section "Dot, vectors and matrices").

Essentially whether you get a inner or outer product with the common dot operation depends on whether you're passing vector or matrices. I'm not sure how TensorFlow handles those cases, or if even has a true vector type at all. However if you have a vector type then you only need a single dot operator, in this case, <*> with operator overloading for various combinations of matrix and vectors. This resolves the difficulties of implementing, say an FFT library, as the end user can pass a proper vector and not need to know whether to pass a row or column vector.

So just a single operator <*> would handle the cases I was thinking about.

seanmor5 commented 3 years ago

@elcritch We generalize everything to tensors, so there's no distinction between a vector, matrix, etc. You can see here how we handle dot with different rank combinations: https://github.com/elixir-nx/nx/blob/main/nx/lib/nx.ex#L5709

We can't really make the distinction between a row and column vector without maybe including additional metadata in the Tensor struct. But as you mention we can handle different rank combinations with one operator

elcritch commented 3 years ago

@elcritch We generalize everything to tensors, so there's no distinction between a vector, matrix, etc. You can see here how we handle dot with different rank combinations: https://github.com/elixir-nx/nx/blob/main/nx/lib/nx.ex#L5709

@seanmor5 great thanks! Seems to be naming differences. lol, it's a bit strange to me that so many new numerical libraries call everything a tensor. The last graduate coursework where I dealt heavily with the term tensor was in the context of Continuum Mechanics and FEA where a tensor is a matrex that represents a physical quantity. ;-)

We can't really make the distinction between a row and column vector without maybe including additional metadata in the Tensor struct. But as you mention we can handle different rank combinations with one operator

I just tried it and it appears dot functions as I'd expect and should map directly to <*> what appears common usage. I've been told that the usage of row and column vectors comes largely from the Matlab days since it didn't have a distinct vector type . In that case I think they're just rank-2 tensors so you can represent them in Nx already it seems, but it's only useful if you're using the row's or column's of a rank-2 tensors.

@josevalim what would be the next step to getting a <*> operator? I doubt there will be a lot of people really excited for the change, but a few of us would seem to like it. :-)

elbow-jason commented 3 years ago

I think introducing infix operators is bad for legibility, barrier of entry, mental overhead, and will almost without a doubt discourage newcomers.

When I look at code that has <*> (or any other infix operator that is a symbol) to understand the code I must know some things first:

Once I know it's called "dot" I get to answer another slew of questions:

The questions above may seem like I am try to stand up a straw man, but these questions are the exact kinds of questions that newcomers, non-mathematicians, and programmers will ask. Some of these answers are trivial, but some are not. And some of these questions you must keep at the forefront of your mind every single time you use them (order of precedence).

The mental overhead of learning custom-to-Elixir-or-Nx infix operators that have no name and (in default usage) no parenthesis to delimit the scope of their application will make some people abandon Nx without a second glance and it will discourage non-math-syntax preferring developers from participating or even using Nx.

Jose made a choice when he started down a path with Nx to use very precise and full function names for operations: divide, multiply, subtract, power as opposed to abbreviations like div, mul, sub, pow. I don't know why he chose to use the full names, but after working in a math library that is not a mix of domain specific symbols and abbreviations I don't think adding those things is a good idea.

josevalim commented 3 years ago

Thanks everyone for the discussion so far. As mentioned above, I don't want to rush this decision. I want folks who are new to Elixir and have a math background to tell us how necessary those custom operators are to the language - so let's wait for feedback. :)

maxastyler commented 3 years ago

As a physicist, I would find infix notation for the inner product to be really useful (something like <*>). I find it helps a lot with readability in calculations. We use the outer product quite often in quantum mechanics (eg. to combine states), but I feel like I wouldn't really want an infix for outer. I reckon that a newcomer might be confused as to:

I always find myself just using einsum when doing outer products anyway (or inner products on higher order tensors) - so I would definitely like to have an einsum-like DSL in Nx.

Also - I would vote for <|> as the inner product operator, since that looks just like bra-ket notation.

wstucco commented 3 years ago

Haskell does support a similar construct, which is to use backticks: left `op` right where op can be any imported or fully qualified function. But I have a feeling this isn’t enough. I would love to hear from people actively involved in data science/numerical computing their thoughts on this.

glad to hear that a question I posted a couple years ago on the Elixir forum is gaining some traction.

https://elixirforum.com/t/infix-notation/19143

My original use case was exactly the one we are talking about here: helping my data science colleagues that wanted to use Elixir but were more familiar with infix operators.

AndrewDryga commented 3 years ago

@elbow-jason I think a lot of those questions would be easy to answer if we call it <dot>, even arrows show where are the arguments. It's still an additional syntactic construct you'll need to learn though.

josevalim commented 2 years ago

Closing the discussion for now, as it depends on changes on Elixir upstream.