Dhrubauk opened this issue 4 years ago
Are you running this from an interactive session (e.g. ipython)? If so, relative imports (`from .blah import blah`) don't work. This is an issue with Python imports in general (https://realpython.com/absolute-vs-relative-python-imports/).
If you just need to run this file from an interactive session, you can remove the dot (so `from complex_utils import complex_mul`). But in general we mostly run the training scripts, which then call `butterfly_multiply`.
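For instance (the module and function names are the ones mentioned above; where you start the session relative to the file is an assumption):

```python
# Inside the package, the code uses a relative import:
#   from .complex_utils import complex_mul
# From an interactive session started in the directory that contains
# complex_utils.py, drop the leading dot:
from complex_utils import complex_mul
```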
Hi tridao, thank you for the reply. A quick question: may I know in which part of the code the butterfly matrix is generated? Or do you have any flowchart or slides related to this? Your help in this matter is highly appreciated.
Regards, Dhruba
The butterfly matrix isn't explicitly generated (as that would use O(n^2) space and take O(n^2) time to multiply by a vector). Instead, we provide a `Butterfly` class that stores O(n log n) parameters and implements an O(n log n) multiplication algorithm (https://github.com/HazyResearch/learning-circuits/blob/master/butterfly/butterfly.py#L76).
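To make the interface concrete, here is a minimal usage sketch; the import path and shapes are assumptions, and the constructor arguments are the ones used later in this thread, so check the linked `butterfly.py` for the authoritative signature.

```python
# Minimal sketch, assuming the repo's Butterfly class is importable like this
# and accepts the constructor arguments used later in this thread.
import torch
from butterfly import Butterfly  # assumed import path within the repo

n = 1024
layer = Butterfly(n, n, tied_weight=False)  # stores O(n log n) parameters
x = torch.randn(32, n)                      # a batch of 32 length-n vectors
y = layer(x)                                # O(n log n) multiply per vector; no n x n matrix is formed
print(y.shape)                              # expected: torch.Size([32, 1024])
```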
In a butterfly connection, each node is connected to two other nodes, so why is it not 2n log n? Also, can you please tell me where in the code you multiply the weight vector with the input features, and where you update the vector of weights? Thanks, Dhruba
You're right, it's 2n log n parameters. Remember to pass in `tied_weight=False`, like `Butterfly(1024, 1024, tied_weight=False)`.
The version with tied weights was used in an earlier paper, where we only had 4n parameters.
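If it helps, a quick way to check that count (a hedged sketch; the exact number can differ slightly depending on version and options):

```python
# Hedged sanity check of the ~2 n log n parameter count for the untied Butterfly
# constructed as above; the exact count may differ slightly by version/options.
import math
import torch
from butterfly import Butterfly  # assumed import path, as in the sketch above

n = 1024
layer = Butterfly(n, n, tied_weight=False)
print(sum(p.numel() for p in layer.parameters()))  # expect on the order of 2 * n * log2(n)
print(2 * n * int(math.log2(n)))                   # 20480
```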
The O(n log n) multiplication algorithm is implemented 3 times, in 3 different languages.
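Those implementations live in the repo; as a readable illustration only (not the repo's optimized code, and the twiddle layout and stage ordering here are assumptions), the multiply can be sketched in plain PyTorch as log2(n) stages of learned 2x2 mixing:

```python
# Illustrative O(n log n) butterfly multiply: log2(n) stages, each mixing pairs
# of entries with a learned 2x2 block. NOT the repo's implementation; the
# twiddle layout and stage ordering are chosen here for readability.
import math
import torch

def butterfly_multiply_sketch(twiddle, x):
    """x: (batch, n) with n a power of 2.
    twiddle: (log2(n), n // 2, 2, 2), one 2x2 block per pair per stage."""
    batch, n = x.shape
    out = x
    for stage in range(int(math.log2(n))):
        stride = 1 << stage                      # distance between paired entries
        out = out.view(batch, n // (2 * stride), 2, stride)
        t = twiddle[stage].view(n // (2 * stride), stride, 2, 2)
        a, b = out[:, :, 0, :], out[:, :, 1, :]  # the two entries of each pair
        new_a = t[..., 0, 0] * a + t[..., 0, 1] * b
        new_b = t[..., 1, 0] * a + t[..., 1, 1] * b
        out = torch.stack([new_a, new_b], dim=2).reshape(batch, n)
    return out

n = 8
twiddle = torch.randn(int(math.log2(n)), n // 2, 2, 2)  # 2 * n * log2(n) numbers in total
x = torch.randn(4, n)
print(butterfly_multiply_sketch(twiddle, x).shape)       # torch.Size([4, 8])
```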
The weights are stored in the `twiddle` attribute of the `Butterfly` class, which is just a Pytorch tensor. To update it, you pass it to a Pytorch optimizer just like any other trainable weight.
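For example, a hedged sketch of one training step (the loss here is arbitrary, chosen only to have something differentiable):

```python
# Hedged sketch of a single training step: the Butterfly layer's twiddle is a
# regular trainable tensor, so any PyTorch optimizer updates it like other weights.
import torch
from butterfly import Butterfly  # assumed import path, as in the sketches above

layer = Butterfly(1024, 1024, tied_weight=False)
optimizer = torch.optim.Adam(layer.parameters(), lr=1e-3)  # includes the twiddle weights

x = torch.randn(32, 1024)
target = torch.randn(32, 1024)
loss = torch.nn.functional.mse_loss(layer(x), target)  # any differentiable loss works
optimizer.zero_grad()
loss.backward()    # autograd fills in the gradient wrt the twiddle
optimizer.step()   # the optimizer updates the twiddle in place
```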
If you're asking about how to compute the gradient wrt the weights: if you use the `Butterfly` class, you do not need to know how backward is implemented.
In any case, the algorithm is simple: treat the butterfly matrix as a product of log n matrices, which you can think of as log n layers, then run back-propagation/autodiff on it. This amounts to a few for loops. The core implementation is about 20 lines of C++ (https://github.com/HazyResearch/learning-circuits/blob/cnn/butterfly/factor_multiply/factor_multiply.cpp#L733).

Okay great. I am kind of confused: how did you calculate the loss, or which loss function did you actually use?
It can be used with any differentiable loss function. In back-propagation/reverse-mode autodiff, for each layer, we are given the incoming error/grad from the layers above and use it to compute the gradient wrt the parameters of the current layer. That's exactly the interface of a custom Pytorch layer (https://pytorch.org/tutorials/advanced/numpy_extensions_tutorial.html): in the backward, given the gradient wrt the output of the layer, compute the gradient wrt the parameters of the layer.
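As a toy illustration of that interface (a simple elementwise-scaling layer, not the repo's butterfly backward):

```python
# Toy illustration of the custom-layer contract described above: backward
# receives the gradient wrt the layer's output and returns gradients wrt each
# forward input (here the weight vector and the input features).
import torch

class ScaleByWeight(torch.autograd.Function):
    @staticmethod
    def forward(ctx, weight, x):            # weight: (n,), x: (batch, n)
        ctx.save_for_backward(weight, x)
        return weight * x

    @staticmethod
    def backward(ctx, grad_output):         # grad_output: gradient wrt the output
        weight, x = ctx.saved_tensors
        grad_weight = (grad_output * x).sum(dim=0)  # gradient wrt the layer's parameters
        grad_x = grad_output * weight               # gradient passed down to earlier layers
        return grad_weight, grad_x

weight = torch.randn(8, requires_grad=True)
x = torch.randn(4, 8, requires_grad=True)
ScaleByWeight.apply(weight, x).sum().backward()
print(weight.grad.shape, x.grad.shape)      # torch.Size([8]) torch.Size([4, 8])
```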
Hi tridao, I use Windows and Jupyter notebook. I am having trouble with ray. Can I run your code with any modifications, or do I have to change my OS?
We only use ray to run the experiments (parallel runs and hyperparameter optimization).
The butterfly class (in the `butterfly` directory) doesn't require ray.
Hello, I am getting this error when I run the `butterfly_multiply` module: ModuleNotFoundError Traceback (most recent call last)