esa / neuralg

Neural network approximators of linear algebra operations on GPU with PyTorch
GNU General Public License v3.0

Create a baseline and minimal working prototype #3

Closed gomezzz closed 2 years ago

gomezzz commented 2 years ago

Once we decide on the problem, we should try to find a minimal subproblem to start with (e.g. matrix inversion, start with 3x3 real matrices). Then, we can build a prototype notebook for that and from that decide how to make it into a nice module.

If we have ideas about it already, we might already consider different matrix sizes etc.
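
A minimal sketch of what that starting point could look like in PyTorch, i.e. sampling a batch of 3x3 real matrices and taking `torch.linalg.inv` as ground truth (the batch size, the +3*I shift used to keep the matrices well conditioned, and the helper name `generate_batch` are illustrative assumptions, not something from the repo):

```python
import torch

def generate_batch(n_matrices=1000, dim=3, seed=0):
    """Sample random real matrices and their exact inverses as training pairs.

    Entries are i.i.d. standard normal; a multiple of the identity is added
    so the matrices stay comfortably invertible for a first prototype.
    """
    torch.manual_seed(seed)
    A = torch.randn(n_matrices, dim, dim) + 3.0 * torch.eye(dim)
    A_inv = torch.linalg.inv(A)  # ground truth from torch's own solver
    return A, A_inv

A, A_inv = generate_batch()
print(A.shape, A_inv.shape)  # torch.Size([1000, 3, 3]) for both
```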

toveag commented 2 years ago

Current train of thought, in general independent of the choice of problem. As of now, I have three vague approach ideas:

  1. Try a "neural-dynamic" system approach based on the existing work in the related literature. We formulate the problem as a system of ODEs representing a (time-continuous) recurrent neural network, and the solution is the steady state/equilibrium of the network. If I have understood it correctly, this is strictly speaking not a supervised learning problem, since the connection weights of the network are uniquely defined by the input matrix. The advantage is that there are already some results on this. Disadvantages/challenges include finding a natural interface with PyTorch, and the approach is not really state of the art.
  2. The "naive" approach of generating a training data set of examples (i.e. matrices A_i) and ground truths, e.g. inv(A_i). The initial challenges are an appropriate representation of the data (i.e. should it be in matrix, vector or some block/sequential form), and finding/motivating a network architecture that works in a semi-rigorous way rather than through trial and error. Also, as mentioned, constructing a smart training set.
  3. A more "algorithm approximation" approach, so to speak. The idea is to treat the iterates of a standard numerical method (power iteration for eigenvectors or Newton iterations for inversion, say) as sequential data. The task is then, given X_0, to predict X_1, ..., X_k, and we take the last iterate as our approximation. Such iteration formulas are quite simple functions of A, essentially polynomials, and intuitively perhaps easier to learn than e.g. the mapping A -> A^-1 directly (see the sketch after this list). Not sure if it makes sense in practice, and it could be inefficient. Although it might be interesting to try with newer sequence models, some RNN variant like an LSTM or even a transformer.
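
For approach 3, a minimal sketch of how such an iterate sequence could be generated as training targets, using the Newton-Schulz iteration X_{k+1} = X_k (2I - A X_k) for matrix inversion (the initialization, step count, and function name here are assumptions, just to make the idea concrete):

```python
import torch

def newton_schulz_iterates(A, n_steps=10):
    """Generate the sequence X_0, ..., X_k of Newton-Schulz iterates for inv(A).

    Each step is X_{k+1} = X_k (2I - A X_k), i.e. just matrix products, and
    the whole sequence can serve as the target for a sequence model.
    """
    dim = A.shape[-1]
    eye = torch.eye(dim, dtype=A.dtype)
    # Classical initialization X_0 = A^T / (||A||_1 ||A||_inf), which makes
    # A X_0 symmetric positive definite with eigenvalues in (0, 1], so the
    # iteration converges for any nonsingular A.
    X = A.transpose(-2, -1) / (
        torch.linalg.matrix_norm(A, ord=1) * torch.linalg.matrix_norm(A, ord=float("inf"))
    )
    iterates = [X]
    for _ in range(n_steps):
        X = X @ (2 * eye - A @ X)
        iterates.append(X)
    return torch.stack(iterates)  # shape: (n_steps + 1, dim, dim)

A = torch.randn(3, 3) + 3 * torch.eye(3)
seq = newton_schulz_iterates(A)
print(torch.linalg.norm(seq[-1] @ A - torch.eye(3)))  # small for well-conditioned A
```
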
gomezzz commented 2 years ago

Let's start with the naive approach first and see where it goes.

Next steps:

gomezzz commented 2 years ago

Start with a single matrix and overfit on it to see if you can invert it.

Then move to a few different ones, to see if you can fit those as well, and so on ...
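
A minimal sketch of that first overfitting experiment, assuming a plain MLP that maps the 9 flattened entries of a 3x3 matrix to the 9 entries of its inverse (the architecture, learning rate, and number of steps are placeholders, not what was actually used):

```python
import torch
from torch import nn

torch.manual_seed(0)

# A single well-conditioned 3x3 matrix and its exact inverse as the only "data point"
A = torch.randn(3, 3) + 3 * torch.eye(3)
target = torch.linalg.inv(A).flatten()

# Small MLP mapping the 9 flattened entries of A to the 9 entries of inv(A)
model = nn.Sequential(
    nn.Linear(9, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 9),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(5000):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(A.flatten()), target)
    loss.backward()
    optimizer.step()

# Sanity check: A @ predicted inverse should be close to the identity
print(A @ model(A.flatten()).reshape(3, 3))
```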

gomezzz commented 2 years ago

Maybe aiming for a decomposition could be more tractable?

toveag commented 2 years ago

Progress so far: the pipeline seems okay. So far I've managed to (over-)fit a 6-layer ReLU MLP to 1000 matrices with some sort of accuracy; the off-diagonal elements of X*X_inv_pred are of order 10^-3 (for fewer matrices they are smaller, obviously). With 8 layers, I get reasonable yet not satisfactory results when training on 10000 matrices: off-diagonals of at most 10^-2, with the diagonal close to 1. All of this is for matrices that are similar: they have eigenvalues close to each other and are simultaneously diagonalizable, so they have a fairly homogeneous structure.
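
For reference, a small sketch of the kind of check those numbers describe, i.e. separating the off-diagonal and diagonal errors of X*X_inv_pred (the function name and batching conventions here are assumptions):

```python
import torch

def inversion_residual(A, A_inv_pred):
    """Split the residual of A @ A_inv_pred into off-diagonal and diagonal parts.

    For an exact inverse the product is the identity, so off-diagonal entries
    should be ~0 and diagonal entries ~1.
    """
    prod = A @ A_inv_pred                      # shape: (batch, dim, dim)
    eye = torch.eye(A.shape[-1], dtype=A.dtype)
    max_off_diag = (prod * (1 - eye)).abs().flatten(-2).max(dim=-1).values
    diag_error = (torch.diagonal(prod, dim1=-2, dim2=-1) - 1).abs().max(dim=-1).values
    return max_off_diag, diag_error

A = torch.randn(10, 3, 3) + 3 * torch.eye(3)
max_off, diag_err = inversion_residual(A, torch.linalg.inv(A))
print(max_off.max(), diag_err.max())  # both tiny for the exact inverse
```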

The model actually seems to have a bit of generalizability to new matrices, at least when they are generated via the same function. The effect is small and not very stable, but it indicates that the model is indeed learning something. Perhaps this implies that a first application could be a specialized setting where the matrices share a known, similar structure! Steps forward:

toveag commented 2 years ago