A brief performance summary. I'm fairly sure this can still be optimized quite a bit, but the numbers are already respectable. Most of the speedup comes from avoiding intermediate results in the Padé methods, and the next iteration should be considerably faster still.
NumPy
In [1]: import numpy as np
In [2]: a = np.random.rand(200, 200)
In [3]: from scipy.linalg import expm
In [4]: %timeit expm(a)
10.9 ms ± 238 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Num.cr
require "num"
require "benchmark"

a = Tensor.random(0.0...1.0, [200, 200])

Benchmark.ips do |b|
  b.report("Matrix exponential") { a.expm }
end
# Matrix exponential 142.73 ( 7.01ms) (± 1.60%) 8.55MB/op fastest
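For context, here is a minimal sketch of the Padé scaling-and-squaring technique these implementations are based on, using a [3/3] Padé approximant (production code like SciPy's picks the approximant order adaptively and is far more careful; `expm_pade3` is a hypothetical name for illustration). Note how `A2` is computed once and reused in both the numerator and denominator, which is the kind of intermediate-result reuse the summary above is referring to:

```python
import numpy as np
from scipy.linalg import expm  # reference implementation

def expm_pade3(A):
    """Matrix exponential via a [3/3] Pade approximant with
    scaling-and-squaring. A simplified sketch, not production code."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Scale A down by 2**s until its 1-norm is at most 0.5, so the
    # low-order Pade approximant is accurate.
    norm = np.linalg.norm(A, 1)
    s = max(0, int(np.ceil(np.log2(norm) + 1))) if norm > 0.5 else 0
    A = A / 2**s
    I = np.eye(n)
    # A2 is computed once and shared between U and V: avoiding
    # redundant intermediates like this is where much of the win is.
    A2 = A @ A
    # [3/3] Pade coefficients: b = [120, 60, 12, 1]
    U = A @ (A2 + 60.0 * I)
    V = 12.0 * A2 + 120.0 * I
    # r(A) = (V - U)^{-1} (V + U) approximates exp(A)
    R = np.linalg.solve(V - U, V + U)
    # Undo the scaling: exp(A) = exp(A / 2**s) ** (2**s)
    for _ in range(s):
        R = R @ R
    return R

a = np.random.rand(50, 50)
print(np.allclose(expm_pade3(a), expm(a), rtol=1e-4))  # → True
```

The scaling step is what keeps the cheap low-order approximant usable for matrices with large norms; the repeated squarings at the end recover the original scale.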