fredpell1 / YADLL

Yet Another Deep Learning Library
MIT License

decouple numpy from tensor #28

Open fredpell1 opened 11 months ago

fredpell1 commented 11 months ago

If we want to add more backends, we need to decouple numpy from the tensor class. I'm thinking of having an IR that then maps to an implementation in each backend. The tensor would then just call into the IR, and depending on the config it would use the appropriate backend.
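Roughly something like this (just a sketch, all the names here are made up):

import numpy as np

_ACTIVE_BACKEND = "numpy"  # hypothetical: would come from a config

_IMPLS = {
    "numpy": {
        "add": lambda a, b: a + b,
        "expand": lambda data, dim: np.broadcast_to(data, dim),
    },
    # a "torch" backend would register its own impls here
}

def dispatch(op, *args, **kwargs):
    # Route an IR op name to the active backend's implementation.
    return _IMPLS[_ACTIVE_BACKEND][op](*args, **kwargs)

The tensor class would only ever call dispatch(...) and never touch numpy directly.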

Transurgeon commented 11 months ago

I would love to help work on that; refactoring code is always important, and design decisions can have a big impact down the line. Could you expand on your idea of having an intermediate representation (IR)? We could have an abstract class called TensorRepresentation, and then have subclasses for each backend that define abstract methods such as get_type and other common operations. (I'm stealing this idea from cvxpy, btw... but in my experience it's hard to get very nice modularity; most of the time the methods need to be completely reimplemented.)
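Something like this maybe (sketch only, the method names are just illustrative):

from abc import ABC, abstractmethod

class TensorRepresentation(ABC):
    # Interface every backend must satisfy (sketch).

    @abstractmethod
    def get_type(self):
        """Return the dtype of the underlying data."""

    @abstractmethod
    def add(self, other: "TensorRepresentation") -> "TensorRepresentation":
        """Elementwise addition with another representation."""

class NumpyRepresentation(TensorRepresentation):
    def __init__(self, data):
        self.data = data  # a numpy ndarray

    def get_type(self):
        return self.data.dtype

    def add(self, other):
        return NumpyRepresentation(self.data + other.data)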

fredpell1 commented 11 months ago

yeah, that's pretty much what I had in mind! The IR would be the declaration of the operations, and the backends would implement them. For example, if you look at expand right now, it uses np.broadcast_to:

def expand(self, dim: tuple[int]) -> Tensor:
    # Directly coupled to numpy: the forward pass calls np.broadcast_to.
    output = Tensor(
        np.broadcast_to(self.data, dim), True, (self,), "expand", self.name
    )

    def _backward():
        # Gradient of a broadcast: sum over the broadcasted axes,
        # then reshape back to the input's shape.
        self.grad += output.grad.sum(
            axis=shape_to_axis(self.shape, output.shape), keepdims=True
        ).reshape(self.shape)

    output._backward = _backward
    return output

That's too coupled with numpy. What I want is something like:

def expand(self, dim):
    output = Tensor(operation.expand(self.data, ...), ...)
    ...
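Filled in a bit more, it could look like this (sketch: operation stands for whatever IR/dispatch layer we end up with, shape_to_axis is the existing helper, and note that the sum/reshape in the backward pass would have to go through the IR too):

def expand(self, dim: tuple[int]) -> Tensor:
    # operation.* is the hypothetical IR layer; no direct numpy calls here.
    output = Tensor(
        operation.expand(self.data, dim), True, (self,), "expand", self.name
    )

    def _backward():
        # The backward pass has to route through the IR too, otherwise
        # the numpy coupling just moves into the gradients.
        summed = operation.sum(
            output.grad, axis=shape_to_axis(self.shape, output.shape), keepdims=True
        )
        self.grad += operation.reshape(summed, self.shape)

    output._backward = _backward
    return output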

And then each backend would need to implement its own expand. In terms of what those basic operations should be, I think we should come up with a relatively small set of them; the obvious ones would be add, mul, negation, and so on. For operations that only move data but don't do any compute, e.g. flatten, transpose, etc., we should try to implement as many of them as possible with one or two basic movement ops like permute, as in the sketch below. The smaller the IR, the easier it is to support different backends. Once I'm done with #19 and you're done with #29, we can start working on this.
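For example, transpose is just a special case of permute, and flatten is a single reshape, so neither would need its own backend implementation (sketch, assuming Tensor.permute and Tensor.reshape are already routed through the IR):

import math

def transpose(self, dim0: int, dim1: int) -> Tensor:
    # Swap two axes by building the full permutation:
    # only permute needs a real backend implementation.
    order = list(range(len(self.shape)))
    order[dim0], order[dim1] = order[dim1], order[dim0]
    return self.permute(tuple(order))

def flatten(self) -> Tensor:
    # Pure data movement: a single reshape covers it.
    return self.reshape((math.prod(self.shape),))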

fredpell1 commented 11 months ago

right now from numpy we use: getitem, setitem, add, sum, reshape, mul, matmul, pow, transpose, permute, pad, broadcast_to, view_as_windows (from skimage, but it uses numpy in its implementation), np.lib.stride_tricks.as_strided, expand_dims, take_along_axis, squeeze, unravel_index, put_along_axis, exp, log

fredpell1 commented 11 months ago

The IR should generalize these operations, and each backend should provide both the data representation and the implementations of these operations.
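Something like this as a starting point (the grouping is just illustrative, not a final design):

class BackendOps:
    # Sketch: one concrete subclass per backend fills these in.
    # elementwise compute
    def add(self, a, b): ...
    def mul(self, a, b): ...
    def pow(self, a, b): ...
    def exp(self, a): ...
    def log(self, a): ...
    # reductions
    def sum(self, a, axis=None, keepdims=False): ...
    # movement only (no compute)
    def reshape(self, a, shape): ...
    def permute(self, a, order): ...
    def pad(self, a, padding): ...
    def broadcast_to(self, a, shape): ...
    def as_strided(self, a, shape, strides): ...
    # indexing
    def getitem(self, a, idx): ...
    def setitem(self, a, idx, value): ...
    def take_along_axis(self, a, idx, axis): ...
    def put_along_axis(self, a, idx, values, axis): ...
    # linear algebra
    def matmul(self, a, b): ...

transpose, squeeze, expand_dims, and view_as_windows could then be derived from permute, reshape, and as_strided instead of being backend primitives.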