As we discussed, the choice of architecture is still open. I think the main options are the implementation we are currently using from lucidrains (https://github.com/lucidrains/x-transformers and https://github.com/lucidrains/performer-pytorch) and an implementation based on MONAI core, with key components using xFormers (https://github.com/facebookresearch/xformers).
On the one hand, we are quite familiar with the lucidrains implementation, and in the end it would be his transformer with a wrapper class. On the other hand, using the existing blocks from MONAI + xFormers could be a more flexible solution.
I will investigate further what using xFormers would look like (one problem might be that xFormers requires PyTorch >= 1.12).
Maybe something like this for a self-attention block: https://gist.github.com/Warvito/5c3363ddbf3941150c2511b27b75d701
I still need to check what it would look like when using masked attention for the autoregressive model, and how it performs in terms of memory and speed.
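Just to make the discussion concrete, below is a minimal sketch (not the content of the gist above) of what such a block could look like with xFormers' memory-efficient attention, including the causal mask the autoregressive model would need. The class name, arguments, and shapes are illustrative placeholders, not a final interface.

```python
# Hypothetical self-attention block built on xFormers' memory-efficient attention.
# Intended to run on GPU; xFormers requires PyTorch >= 1.12, as noted above.
import torch
import torch.nn as nn
import xformers.ops as xops


class SelfAttentionBlock(nn.Module):
    """Multi-head self-attention using xformers.ops.memory_efficient_attention."""

    def __init__(self, dim: int, num_heads: int, causal: bool = False) -> None:
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.causal = causal
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        # Project to queries, keys, and values, then split the heads.
        # xFormers expects tensors shaped (batch, seq_len, num_heads, head_dim).
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.reshape(b, n, self.num_heads, self.head_dim) for t in (q, k, v))
        # A lower-triangular bias gives the masked (causal) attention needed
        # for the autoregressive transformer.
        attn_bias = xops.LowerTriangularMask() if self.causal else None
        out = xops.memory_efficient_attention(q, k, v, attn_bias=attn_bias)
        return self.proj(out.reshape(b, n, -1))


# Example usage (GPU, half precision recommended for the xFormers kernels):
# block = SelfAttentionBlock(dim=512, num_heads=8, causal=True).cuda().half()
# y = block(torch.randn(2, 1024, 512, device="cuda", dtype=torch.float16))
```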
@danieltudosiu which features from x-transformers were the most useful for your models?
I would argue, as per our last meeting, that for the time being we can go forward with a common interface based on the VQ-VAE + Transformer codebase from KCL, since those components have been used extensively in KCL's publications.
Later, we can create another PR, possibly with breaking changes, that adds building blocks and/or other models for more flexibility.
This approach would provide the required building blocks to offer users the full pipelines, while allowing us to increase the flexibility of the package at a later date.
The features that I used from x-transformers are the following:
I would highly advise against reimplementing anything in the short term. It will only increase the amount of code we need to maintain.
@Warvito I was about to work on this issue today. How do you want me to go forward with it? Or should I put it on hold for now?
Sounds good. Let's start with the implementation we are familiar with (lucidrains), and then we can continue discussing a move to something more flexible.
Then should I go forward with the interface that we have already been using in the KCL codebase?
Yes, I think it is a good starting point
Add the transformer network and the components needed to make it compatible with the VQ-VAE network. Create the components necessary to generate samples and to compute the likelihood of input data from the model. Add the relevant unit tests and documentation.
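For reference, a minimal sketch of how such a wrapper could look, assuming we start from the lucidrains x-transformers package as agreed above. The hyperparameters and the helper functions `sample` and `log_likelihood` are illustrative placeholders, not the final MONAI interface.

```python
# Sketch of a decoder-only transformer over VQ-VAE token indices, using the
# lucidrains x-transformers package. Hyperparameters below are placeholders.
import torch
import torch.nn.functional as F
from x_transformers import TransformerWrapper, Decoder, AutoregressiveWrapper

num_tokens = 512      # VQ-VAE codebook size (plus any special tokens, e.g. BOS)
max_seq_len = 1024    # length of the flattened latent grid

model = TransformerWrapper(
    num_tokens=num_tokens,
    max_seq_len=max_seq_len,
    attn_layers=Decoder(dim=512, depth=8, heads=8),
)
ar_model = AutoregressiveWrapper(model)


def sample(prompt: torch.Tensor, seq_len: int) -> torch.Tensor:
    """Autoregressively sample token indices to be decoded by the VQ-VAE."""
    return ar_model.generate(prompt, seq_len)


def log_likelihood(tokens: torch.Tensor) -> torch.Tensor:
    """Per-sequence log-likelihood of the input token indices.

    Assumes a begin-of-sequence token has been prepended to each sequence.
    """
    logits = model(tokens[:, :-1])                 # predict the next token
    log_probs = F.log_softmax(logits, dim=-1)
    target = tokens[:, 1:].unsqueeze(-1)
    return log_probs.gather(-1, target).squeeze(-1).sum(dim=-1)
```

Here the transformer operates on the flattened sequence of VQ-VAE codebook indices; sampled indices would then be reshaped and decoded by the VQ-VAE to produce images.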