f-dangel / backpack

BackPACK - a backpropagation package built on top of PyTorch which efficiently computes quantities other than the gradient.
https://backpack.pt/
MIT License

Extend BatchGrad to Conv1d and Conv3d #68

Closed ChenAo-Phys closed 3 years ago

ChenAo-Phys commented 4 years ago

Based on the idea of BackPACK, I found a way to extend the application range of BatchGrad to most kinds of PyTorch layers without too much effort: https://github.com/ChenAo-Phys/pytorch-Jacobian. It's a simple idea that I really hope you can implement in BackPACK. It would be nice to see this package getting better.

f-dangel commented 4 years ago

Hi,

thanks for your message. Before sharing my thoughts on your idea, I would like to ask which operations you would like to see added to BackPACK (there is work in progress on transpose convolution: #48).

That said, let me sum up your approach to make sure I understand it correctly: let's consider the linear mapping (without the bias term) yₖ = ∑ᵢⱼ Mₖᵢⱼ Wᵢ xⱼ, where M is a binary tensor encoding which weight entries Wᵢ and input entries xⱼ contribute to each output entry yₖ.

This idea is quite general and does not require knowledge about how the operation works, other than it being subject to the above parameterization.
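To make that concrete, here is a rough einsum sketch of how individual gradients fall out of such a parameterization; the shapes and names below are purely illustrative and not BackPACK internals.

```python
import torch

# Rough sketch of the parameterization y_k = sum_{ij} M_kij W_i x_j
# (all shapes and names are illustrative).
N, K, I, J = 4, 3, 2, 5                       # batch, outputs, weight entries, inputs
M = torch.randint(0, 2, (K, I, J)).float()    # binary connectivity tensor
W = torch.randn(I, requires_grad=True)
x = torch.randn(N, J)

y = torch.einsum("kij,i,nj->nk", M, W, x)     # forward pass under the parameterization

# Individual (per-sample) gradients of L = sum(y**2) w.r.t. W, without a for-loop:
dL_dy = 2 * y                                            # shape [N, K]
grad_batch = torch.einsum("nk,kij,nj->ni", dL_dy, M, x)  # shape [N, I]

# Sanity check: summing over the batch recovers the usual gradient.
(y ** 2).sum().backward()
assert torch.allclose(grad_batch.sum(0), W.grad, atol=1e-6)
```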

I do have a concern about implementation performance: even though M is binary, the approach will not scale well unless you can exploit its sparsity (I have no experience with torch.sparse, and its API still seems to be partly experimental).

A dense version of M tracks all connections, so its memory scales with the product of the weight size (D), the input size (C_in), and the output size (C_out), which could be prohibitively large.
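As a back-of-the-envelope illustration (a hypothetical, modestly sized Conv2d; the numbers are arbitrary), a dense M is already far beyond any GPU memory:

```python
# Size of a dense M for a modest Conv2d
# (64 -> 64 channels, 3x3 kernel, 32x32 input; numbers are illustrative only).
C_in, C_out, k, H, W = 64, 64, 3, 32, 32
n_out    = C_out * H * W          # number of output entries
n_weight = C_out * C_in * k * k   # number of weight entries
n_in     = C_in * H * W           # number of input entries
entries = n_out * n_weight * n_in
print(f"{entries:.2e} entries ~ {entries * 4 / 1e12:.0f} TB as float32")
```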

Let me know what you think.

ChenAo-Phys commented 4 years ago

Thanks for your reply, your summary is absolutely correct. Let me first explain what I'm working on: my current research uses neural networks to solve physics systems, and the solution usually needs to be very accurate (above 99.9%). That's why we need the Jacobian in the training process; the plain total gradient hardly gives satisfactory results. Sometimes we study 1D and 3D systems, so Conv1d and Conv3d are also important, which is why I proposed my trick to extend the method in BackPACK.

Another feature of my research is that the networks are kept quite small to prevent the loss of information between layers, so I didn't think much about memory complexity. But your concern is quite reasonable: I agree that M should be stored in a sparse way for this method to work in most networks. Since torch.sparse is still experimental, it may be better to use some matrix manipulations to achieve similar results. I will probably implement this for my library in the near future (but not right now, because my tutor is asking for my data haha) and will let you know once I have made the change. Thanks very much for the suggestion.
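For illustration, one direction such a sparse version could take is flattening M into a 2D torch.sparse COO matrix; the shapes and names below are assumptions, not the actual library code.

```python
import torch

# Rough sketch: store the binary connectivity tensor M sparsely by
# flattening it to a 2D COO matrix of shape [K, I*J] (illustrative shapes).
K, I, J, N = 3, 2, 5, 4
M_dense = torch.randint(0, 2, (K, I, J)).float()
M_sparse = M_dense.reshape(K, I * J).to_sparse()   # keeps only the non-zero entries

W = torch.randn(I)
x = torch.randn(N, J)

# Columns of Z are vec(W outer x_n), shape [I*J, N].
Z = torch.einsum("i,nj->nij", W, x).reshape(N, I * J).t()
y = torch.sparse.mm(M_sparse, Z).t()               # [N, K]

# Same result as the dense contraction.
assert torch.allclose(y, torch.einsum("kij,i,nj->nk", M_dense, W, x), atol=1e-6)
```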

f-dangel commented 4 years ago

Changing the title accordingly, and adding torch.nn.Conv1d and torch.nn.Conv3d to our next release TODOs.

ChenAo-Phys commented 4 years ago

I used a slightly larger network today and the previous dense version of my method failed, so I changed my plan and implemented the sparse version. But it still consumes too much memory to record all the non-zero indices of the tensor M. Anyway, thanks again for your previous suggestion, and thank you for this package. I will follow your updates.

f-dangel commented 4 years ago

Update: We merged the individual gradient extensions for Conv1d and Conv3d into the development branch yesterday. You can already try them with backpack(BatchGrad()) by installing from that branch, in case you don't want to wait for the next release.
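For anyone who wants to try it, here is a minimal sketch following the usual BackPACK workflow; the toy model and shapes are illustrative.

```python
import torch
from backpack import backpack, extend
from backpack.extensions import BatchGrad

# Illustrative toy model using the new Conv1d support.
model = extend(torch.nn.Sequential(
    torch.nn.Conv1d(3, 8, kernel_size=5),
    torch.nn.Flatten(),
    torch.nn.Linear(8 * 12, 2),
))
lossfunc = extend(torch.nn.CrossEntropyLoss())

X = torch.randn(10, 3, 16)               # batch of 10 sequences, 3 channels, length 16
y = torch.randint(0, 2, (10,))

loss = lossfunc(model(X), y)
with backpack(BatchGrad()):
    loss.backward()

for name, param in model.named_parameters():
    print(name, param.grad_batch.shape)  # leading dimension is the batch size
```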

ChenAo-Phys commented 4 years ago

Thanks for this update! It's very helpful.

f-dangel commented 3 years ago

Now in version 1.2.0. Closing