Open adagio715 opened 11 months ago
@adagio715 did you ever find out the answer for 1 and 2 ?
Not really... I guess it was a trade-off between effects, efficiency, model size, etc. Did you have any insight on this? @AlexandreDRFT
@adagio715 no, actually I'm still struggling to launch an experiment properly because of those params. I have a dataset correctly set up with a bunch of data at a 22050 Hz sample rate and the public EnCodec for 24 kHz, but I can't figure out the correct params for the codebooks and n_q to make it work. The docs and the paper are not clear on this. I'd be interested if you have a working setup for those!
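For what it's worth, one thing that can be checked without training anything is the frame rate implied by the sample rate and the encoder strides. A minimal sketch, assuming the SEANet downsampling ratios `[8, 5, 4, 2]` described in the EnCodec paper (the actual ratios in a given config may differ, and the helper name `frame_rate` is made up here, it is not part of audiocraft):

```python
import math


def frame_rate(sample_rate, ratios):
    """Latent frames per second: sample rate divided by the total encoder stride."""
    hop_length = math.prod(ratios)  # total downsampling factor of the encoder
    return sample_rate / hop_length


ratios = [8, 5, 4, 2]  # assumption: taken from the EnCodec paper, not from this thread

print(frame_rate(24000, ratios))  # 75.0
print(frame_rate(22050, ratios))  # 68.90625
```

Under these assumed ratios, 22050 Hz audio does not give a whole number of frames per second, so resampling the dataset to 24 kHz may be the simpler route if the public 24 kHz model is the target.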
Hello everyone, I want to train a stereo 48 kHz EnCodec model on my own datasets. I have a few questions about the hyperparameter settings for RVQ and the causal/non-causal and streamable/non-streamable setups:

1. […] `n_q=16` for the 48 kHz model? I want to set `n_q=32` for the 48 kHz model training. Will this cause any problems?
2. […] `n_q` set at 4, which is quite small. Why not make it larger? I suspect that a larger `n_q` could produce better audio quality for audiogen and musicgen?
3. […] `seanet.norm` and `seanet.pad_mode` in the `config/model/encodec/default.yaml`, but I'm not sure if I'm heading in the right direction. Can anyone give some hints?
4. What does `rvq.r_dropout` stand for? What does it influence if this parameter is set to true or false, respectively?
5. In `config/model/encodec/default.yaml`, `encodec.causal` and `encodec.renormalize` are both set to `False`. Is it recommended to set them to `True` if we want to achieve higher audio quality?

Thank you very much for your help!