rakuzen25 closed this issue 1 year ago
Forgot to add in the original comment - 48kHz model works fine (i.e. if I do model = EncodecModel.encodec_model_48khz()).
I think I've found the problem. wav = wav.unsqueeze(0) should happen after the convert_audio step, because convert_audio assumes the first dimension of wav (wav.shape[0]) to be the channel count, when in fact after unsqueezing it is the "sample count" (i.e. the batch dimension).
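To make the shape reasoning concrete, here is a minimal shape-only sketch in plain Python. It assumes the older convert_audio downmixed by averaging over dimension 0. The helper names legacy_downmix_shape and unsqueeze0 are hypothetical and not part of the library. Resampling is ignored, and the sample count is simply borrowed from the reported error for illustration.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def legacy_downmix_shape(shape: Shape, target_channels: int) -> Shape:
    """Hypothetical shape-only model of a downmix that reads channels from dim 0."""
    # Assumption: dim 0 is treated as the channel axis (mono or stereo).
    assert shape[0] in (1, 2), "Audio must be mono or stereo."
    if target_channels == 1:
        # Averaging over dim 0 (keeping the axis) collapses it to size 1.
        return (1,) + shape[1:]
    raise ValueError("only the mono target is modelled in this sketch")


def unsqueeze0(shape: Shape) -> Shape:
    """Shape effect of tensor.unsqueeze(0): prepend an axis of size 1."""
    return (1,) + shape


stereo = (2, 144006)  # [channels, samples]; length is illustrative

# Order reported as failing: unsqueeze first, then convert.
# Dim 0 is now the batch axis (size 1), so the two channels survive.
broken = legacy_downmix_shape(unsqueeze0(stereo), target_channels=1)

# Proposed order: convert first, then unsqueeze.
fixed = unsqueeze0(legacy_downmix_shape(stereo, target_channels=1))

print(broken)  # (1, 2, 144006)
print(fixed)   # (1, 1, 144006)
```

Under these assumptions the first ordering yields a [1, 2, T] input, which is consistent with the "expected input[1, 2, 144006] to have 1 channels" error, while the second yields the [1, 1, T] input a mono model expects.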
Should I make a PR for this?
Exactly, sorry about that. I fixed the convert_audio function to be more robust! I also updated the README to swap the order of the two lines for people who would be using the older version. Thanks for reporting!
No worries, thanks for the fix!
🐛 Bug Report
Following the "Extracting discrete representations" section in README, I tried to extract the encoded embedding myself. However, running the exact code snippet gave me an error:
RuntimeError: Given groups=1, weight of size [32, 1, 7], expected input[1, 2, 144006] to have 1 channels, but got 2 channels instead
To Reproduce
Run the README snippet, where test.wav is any WAV file. I tried with one from the sample page.
Expected behavior
I should be able to get the representation in [B, n_q, T] as described in the code itself.
Actual Behavior
Full traceback:
Your Environment