Closed NickAnastasoff closed 1 year ago
This doesn't seem to be an issue with my repository. This repository exclusively extracts semantics.
Also, I was not able to reproduce the issue; your code worked fine on my side.
pip install -U encodec
ffmpeg -i 0520.wav audio.wav
Thank you so much for your reply! Sadly, it still didn't work for me. How did you generate the npz? This is what I wrote, so it's probably the issue:
```
from encodec import EncodecModel
from encodec.utils import convert_audio

import numpy
import torchaudio
import torch

# Instantiate a pretrained EnCodec model
model = EncodecModel.encodec_model_24khz()
# The number of codebooks used will be determined by the bandwidth selected.
# E.g. for a bandwidth of 6 kbps, n_q = 8 codebooks are used.
# Supported bandwidths are 1.5 kbps (n_q = 2), 3 kbps (n_q = 4), 6 kbps (n_q = 8),
# 12 kbps (n_q = 16) and 24 kbps (n_q = 32).
# For the 48 kHz model, only 3, 6, 12, and 24 kbps are supported. The number
# of codebooks for each is half that of the 24 kHz model as the frame rate is twice as much.
model.set_target_bandwidth(6.0)

# Load and pre-process the audio waveform
wav, sr = torchaudio.load("0520.wav")
wav = convert_audio(wav, sr, model.sample_rate, model.channels)
wav = wav.unsqueeze(0)

# Extract discrete codes from EnCodec
with torch.no_grad():
    encoded_frames = model.encode(wav)
codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1)  # [B, n_q, T]

fine_prompt = codes  # <- is this the issue?
coarse = fine_prompt[:2, :]

# semantic_tokens is produced earlier by the semantic extraction step (not shown)
numpy.savez(semantic_prompt=semantic_tokens, fine_prompt=fine_prompt, coarse_prompt=coarse, file="pleasework.npz")
```
You should probably wrap your code in code blocks (``` around your text) in the future.
I ran that code, and it created the file just fine. Can you send me the wav you're using? I think your input wav is a bit broken, and encodec can't load it.
Again, this issue is not really related to my repository here. But it's probably your wav file.
Sorry for the wait!
I put my code into a Jupyter notebook, and I still got the same problem! I'll link that, and my audio.wav is in it.
Thanks so much for your time!
You can't upload an audio file like that to Google Colab, since its storage is not persistent.
Check if you can clone the file in here
I found the problem! You were right! I shortened my wav to under 10 seconds, and it's working. Thank you so much! Btw, it might be helpful for others if you put the Google Colab I linked above in the readme: https://colab.research.google.com/drive/1IA3c_R859nANerMARazCSrjc2UD3ws8A?usp=sharing
Oh, actually, I noticed this today.
codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1) # [B, n_q, T]
should be
codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1).squeeze() # [n_q, T]
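To see why the missing `.squeeze()` matters, here is a minimal sketch. It uses a NumPy array as a stand-in for the EnCodec tensor (an assumption made so the example runs without a model; for this shape, `torch.Tensor.squeeze()` drops the size-1 batch axis the same way). The sizes 8 and 150 are illustrative only.

```python
import numpy as np

# Stand-in for EnCodec output with batch size 1: shape [B, n_q, T] = [1, 8, 150].
codes = np.zeros((1, 8, 150), dtype=np.int64)

# Without squeezing, slicing the first axis selects batches, not codebooks,
# so the "coarse" prompt keeps all 8 codebooks and an extra leading axis.
coarse_wrong = codes[:2, :]
print(coarse_wrong.shape)  # (1, 8, 150)

# After dropping the batch axis, the same slice selects the first two codebooks.
fine = codes.squeeze()
coarse = fine[:2, :]
print(fine.shape, coarse.shape)  # (8, 150) (2, 150)
```

With the batch axis still present, `fine_prompt[:2, :]` silently returns the wrong shape instead of raising an error, which is why the problem only shows up later inside bark.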
That makes sense! The code I write is usually the problem 🤣
Thanks so much!
That was actually something I was missing in the old version, plus the encodec example doesn't have it. So that's on me.
For anyone trying to find an answer:
codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1) # [B, n_q, T]
should be
codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1).squeeze() # [n_q, T]
Also, shorten the audio file to under 10 seconds.
You don't need to shorten the audio, but it's recommended to shorten it to 15 or 20 seconds. Going beyond 15 seconds will result in less audio for it to clone from.
Make sure you take the audio from the end, not the start.
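If you want to trim from the end without opening an audio editor, something like the following might work. It is only a sketch using Python's standard `wave` module, so it assumes a plain PCM WAV file; the function name `keep_last_seconds` and the 15-second default are made up for illustration.

```python
import wave


def keep_last_seconds(src_path, dst_path, seconds=15.0):
    """Copy the final `seconds` of a PCM WAV file to dst_path.

    Returns the number of frames written. If the file is shorter than
    `seconds`, the whole file is copied.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        total = src.getnframes()
        keep = min(total, int(seconds * src.getframerate()))
        # Jump to the start of the tail, then read only the tail.
        src.setpos(total - keep)
        frames = src.readframes(keep)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # writeframes() rewrites the frame count in the header on close.
        dst.writeframes(frames)
    return keep
```

For example, `keep_last_seconds("0520.wav", "audio.wav", 15.0)` would write the last 15 seconds of `0520.wav` to `audio.wav`.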
I have tried to create an npz, although I think I have done something wrong. I have gotten bark running up until generate_coarse:
Exception has occurred: AssertionError
exception: no description
  File "/Users/nickanastasoff/Desktop/bark test/bark/bark/generation.py", line 573, in generate_coarse
    round(x_coarse_history.shape[-1] / len(x_semantic_history), 1)
  File "/Users/nickanastasoff/Desktop/bark test/bark/bark/api.py", line 54, in semantic_to_waveform
    coarse_tokens = generate_coarse(
  File "/Users/nickanastasoff/Desktop/bark test/bark/bark/api.py", line 113, in generate_audio
    out = semantic_to_waveform(
customHuburt.txt
This is what I used to make the npz. I'm pretty sure the issue is with
fine_prompts = codes
but I'm not sure what else to do.
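For anyone hitting the same `AssertionError`, a quick sanity check of the arrays before calling bark may help narrow it down. The sketch below is not bark's code: the helper name `prompt_problems` is hypothetical, and the constants (49.9 Hz semantic rate, 75 Hz coarse rate, 2 coarse codebooks) are assumptions based on what `bark/generation.py` appears to use, so verify them against your local copy.

```python
import numpy as np

# Assumed values, believed to match constants in bark/generation.py; verify locally.
SEMANTIC_RATE_HZ = 49.9
COARSE_RATE_HZ = 75
N_COARSE_CODEBOOKS = 2


def prompt_problems(semantic, coarse):
    """Return a list of human-readable problems with a (semantic, coarse) prompt pair."""
    problems = []
    semantic = np.asarray(semantic)
    coarse = np.asarray(coarse)

    if semantic.ndim != 1 or semantic.size == 0:
        problems.append(
            f"semantic_prompt should be a non-empty 1-D array, got shape {semantic.shape}"
        )
    if coarse.ndim != 2 or coarse.shape[0] != N_COARSE_CODEBOOKS:
        problems.append(
            f"coarse_prompt should have shape ({N_COARSE_CODEBOOKS}, T), got {coarse.shape}"
        )
    # Compare lengths the same way the failing line in the traceback does.
    if semantic.ndim == 1 and semantic.size > 0 and coarse.ndim >= 1:
        actual = round(coarse.shape[-1] / len(semantic), 1)
        expected = round(COARSE_RATE_HZ / SEMANTIC_RATE_HZ, 1)
        if actual != expected:
            problems.append(
                f"coarse/semantic length ratio is {actual}, expected {expected}"
            )
    return problems
```

To check a saved file, load it with `data = numpy.load("pleasework.npz")` and pass `data["semantic_prompt"]` and `data["coarse_prompt"]` to the helper; an empty list means none of these particular checks failed.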