Closed Kinyugo closed 1 year ago
What parts do you want to use with 2D tensors? The model is a 1D UNet, so it uses 1D convolutions and is not directly applicable to 2D unless you change the entire architecture. You could stack the channels of spectrograms and use those in the UNet1d, though.
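A rough sketch of the channel-stacking idea, assuming a spectrogram batch laid out as [batch, channels, frequency, time]. The helper names and the shape layout are illustrative assumptions and are not part of the package:

```python
import torch
from torch import Tensor


def stack_frequency_into_channels(spec: Tensor) -> Tensor:
    """Fold the frequency axis into channels: [B, C, F, T] -> [B, C * F, T]."""
    b, c, f, t = spec.shape
    return spec.reshape(b, c * f, t)


def unstack_frequency_from_channels(x: Tensor, channels: int) -> Tensor:
    """Inverse of the above: [B, C * F, T] -> [B, C, F, T]."""
    b, cf, t = x.shape
    return x.reshape(b, channels, cf // channels, t)


# Hypothetical example: one-channel spectrograms with 128 frequency bins.
spec = torch.randn(2, 1, 128, 256)
stacked = stack_frequency_into_channels(spec)   # usable as a 1D signal with 128 channels
restored = unstack_frequency_from_channels(stacked, channels=1)
```

With this layout a 1D network would treat every frequency bin as a separate input channel, so its input channel count would need to be set to C * F.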
I am interested in using only the diffusion part, i.e. schedulers, samplers, inpainting, etc. I think those parts could be made generic without altering how the rest of the code works.
Yes, that's a good idea. I have to update the diffusion structure a bit in the following days (i.e. make it more adaptable to different diffusion types and samplers), and I will consider changing that as well.
That will be awesome 🔥 🔥
To follow up on this, v-diffusion + sampler (the ones I found to work the best) are now generic to any dimension. I temporarily removed the other k-diffusion ones as I wasn't getting amazing results with them. Do you need those as well?
As a bonus, the U-Net from a-unet is also generic to any dimension, just in case :)
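For readers wondering what "generic to any dimension" can look like for the diffusion part, here is a minimal sketch. It assumes the common v-objective parameterization with cosine/sine weights; the function name is hypothetical and this is not taken from the package's code. The key step is reshaping the per-sample noise levels so they broadcast over any number of trailing dimensions:

```python
import math
from typing import Tuple

import torch
from torch import Tensor


def v_diffusion_noising(x: Tensor, sigmas: Tensor) -> Tuple[Tensor, Tensor]:
    """Noise `x` (any rank >= 1) with per-sample noise levels `sigmas` of shape [B]."""
    # Reshape [B] -> [B, 1, ..., 1] so the weights broadcast over all trailing dims.
    sigmas = sigmas.view(x.shape[0], *([1] * (x.ndim - 1)))
    alphas = torch.cos(sigmas * math.pi / 2)  # assumed signal weight
    betas = torch.sin(sigmas * math.pi / 2)   # assumed noise weight
    noise = torch.randn_like(x)
    x_noisy = alphas * x + betas * noise
    v_target = alphas * noise - betas * x     # target under the assumed v-objective
    return x_noisy, v_target


# The same function handles waveforms [B, C, T] and spectrograms [B, C, F, T].
wave = torch.randn(4, 2, 1024)
spec = torch.randn(4, 2, 64, 128)
wave_noisy, wave_v = v_diffusion_noising(wave, torch.rand(4))
spec_noisy, spec_v = v_diffusion_noising(spec, torch.rand(4))
```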
Thanks, that's amazing. I will try v-diffusion in a follow-up; for now I went with the diffusers implementation of DDIM (see project here). However, DDIM needs more iterations than the more recent techniques. For the U-Net, the project implements a custom U-Net tailored for spectrograms. You could check whether it's something you would consider adding to this project. Good job 👏🏿
Will close this as spectrograms are supported. Feel free to reopen if you think there's something missing.
I am trying to use the package to work with spectrograms, but I have encountered a problem. Some of the operations in the package are only designed to work with 3-d tensors, which limits their usability.
Request
I would like to request a change to make these operations more generic, so that they can be used with spectrograms (or any other data that may not necessarily be 3-d tensors). This would enable more users to use the package for a wider range of applications, and improve the overall usability of the package.
Examples
To illustrate the issue and the desired change, I have provided some examples below.
Sequential mask generation
The sequential_mask operation generates a boolean mask for a tensor. The original version of the operation works on the third dimension (dim=2). To make this operation more generic, we could change the third dimension (dim=2) to the last dimension (dim=-1). This would allow the operation to work with any tensor, regardless of its shape.
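A minimal sketch of a last-dimension version, written under assumptions: the name `sequential_mask_generic`, the `(like, start)` signature, and the convention that positions before `start` are True are all illustrative, not the package's actual code. Using an ellipsis index makes the mask independent of the tensor's rank:

```python
import torch
from torch import Tensor


def sequential_mask_generic(like: Tensor, start: int) -> Tensor:
    """Boolean mask shaped like `like`: True before `start` on the last dim, False after."""
    mask = torch.ones_like(like, dtype=torch.bool)
    # `...` selects all leading dimensions, so this works for any rank >= 1.
    mask[..., start:] = False
    return mask


# Hypothetical inputs: a waveform batch [B, C, T] and a spectrogram batch [B, C, F, T].
wave_mask = sequential_mask_generic(torch.randn(1, 2, 5), start=3)
spec_mask = sequential_mask_generic(torch.randn(1, 2, 4, 5), start=3)
```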
I am happy to contribute to address these issues.