I have sometimes run into this one, but I would like to know what you think about it:
```python
encoder=MelE1d(  # the encoder used, in this case a mel-spectrogram encoder
    in_channels=in_channels,
    channels=512,
    multipliers=[1, 1],
    factors=[2],
    num_blocks=[12],
    out_channels=32,
    mel_channels=80,
    mel_sample_rate=48000,
    mel_normalize_log=True,
    bottleneck=TanhBottleneck(),
),
```
I believe it extracts a large number of features, which puts a strain on the GPU.
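For reference, the kind of log-mel frontend such an encoder consumes can be sketched in plain NumPy. Only the 80 mel channels, the 48 kHz sample rate, and the log normalization come from the config above; the FFT size, hop length, and HTK mel scale are my assumptions:

```python
import numpy as np

def hz_to_mel(f):
    # HTK mel scale (an assumption; the Slaney scale is another common choice)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=48000, n_fft=1024, n_mels=80):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for j in range(left, center):
            fb[i, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):
            fb[i, j] = (right - j) / max(right - center, 1)
    return fb

def log_mel(waveform, sr=48000, n_fft=1024, hop=256, n_mels=80):
    # Magnitude STFT -> mel projection -> log, cf. mel_normalize_log=True above.
    frames = 1 + (len(waveform) - n_fft) // hop
    window = np.hanning(n_fft)
    spec = np.stack([
        np.abs(np.fft.rfft(window * waveform[i * hop : i * hop + n_fft]))
        for i in range(frames)
    ], axis=1)  # (n_fft // 2 + 1, frames)
    return np.log(mel_filterbank(sr, n_fft, n_mels) @ spec + 1e-5)

features = log_mel(np.random.randn(48000))  # 1 second of dummy audio
print(features.shape)  # (80, frames)
```

This is only meant to show where the feature count comes from: 80 mel bins per frame, with the frame rate set by the hop length.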
Hi!
I am very curious about the future-work section of the paper. It makes a few suggestions; let me ask about two of them.
1. Use perceptual losses.
You have just merged a PR that allows for loss customization. Which perceptual loss did you have in mind when you wrote the suggestion?
2. Use mel spectrograms instead of magnitude spectrograms as input.
`dmae1d-ATC64-v2` uses the magnitude spectrogram.
What would be a good mel feature extractor?
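On suggestion 1: one loss family that often comes up for audio models is the multi-resolution STFT loss (spectral convergence plus log-magnitude distance, averaged over several FFT resolutions). Is that the kind of thing you meant? A minimal NumPy sketch, where the resolutions and the exact formulation are my assumptions, not something taken from the paper:

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    # Magnitude STFT via a sliding Hann window.
    frames = 1 + (len(x) - n_fft) // hop
    window = np.hanning(n_fft)
    return np.stack([
        np.abs(np.fft.rfft(window * x[i * hop : i * hop + n_fft]))
        for i in range(frames)
    ])

def stft_loss(pred, target, n_fft, hop, eps=1e-7):
    # Spectral convergence + log-magnitude distance for one resolution.
    p, t = stft_mag(pred, n_fft, hop), stft_mag(target, n_fft, hop)
    sc = np.linalg.norm(t - p) / (np.linalg.norm(t) + eps)
    log_mag = np.mean(np.abs(np.log(t + eps) - np.log(p + eps)))
    return sc + log_mag

def multi_resolution_stft_loss(pred, target):
    # Average the loss over several STFT resolutions (assumed values).
    resolutions = [(512, 128), (1024, 256), (2048, 512)]  # (n_fft, hop)
    return sum(stft_loss(pred, target, n, h) for n, h in resolutions) / len(resolutions)

x = np.random.randn(4096)
assert multi_resolution_stft_loss(x, x) < 1e-6  # identical signals -> ~zero loss
```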
Curious what you have to say about 1 and 2.
Cheers, Tristan