After some months away from this repo, I decided to take one final shot at a working implementation.
As many issues have pointed out, the ggml submodule pointed to an old, no-longer-referenced commit hash of ggml. Furthermore, the forward pass was not working, and several bugs still needed to be fixed.
This pull request introduces several key changes to the repository:
[x] Simplify the dependency graph: bark -> encodec -> ggml. The repo now has a single git submodule, which points to Encodec. Previously, bark depended on two different versions of ggml pinned at distinct commit hashes (one for Bark, one for Encodec). The duplicate dependency has been removed, which should fix the issues reported by users who could not clone the repo.
[x] Fix a subtle mistake in the forward pass of the coarse model (n_steps -> step_idx) that truncated the audio output.
[x] Fix a mistake in the forward pass of the fine model: two arguments were swapped in a function call (nn was passed as n_threads and vice versa). This fixes the poor audio output quality.
[x] Convert all the Bark weights into a single file instead of three separate files (one per GPT model).
[x] Adapt the quantization script to read from a single file instead of three.
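
For reviewers wondering how the fine-model argument swap went unnoticed: when two adjacent parameters share the same type, the compiler accepts them in either order. The sketch below is only an illustration, not the actual Bark code. The function names, the wrapper structs, and the function bodies are hypothetical; only the parameter names nn and n_threads come from this PR.

```cpp
#include <cassert>

// Hypothetical stand-in for a forward-pass helper. Both parameters are plain
// ints, so a call with the arguments swapped still compiles without warnings.
// The return value simply encodes which slot each argument landed in.
int eval_unsafe(int nn, int n_threads) {
    assert(n_threads > 0);
    return nn * 100 + n_threads;
}

// One possible guard (an assumption, not what this PR does): wrap each
// argument in its own type so that a swapped call fails to compile.
struct Nn { int value; };
struct NumThreads { int value; };

int eval_safe(Nn nn, NumThreads n_threads) {
    assert(n_threads.value > 0);
    return nn.value * 100 + n_threads.value;
}
```

With these definitions, eval_unsafe(2, 4) and eval_unsafe(4, 2) both compile and quietly return different results, whereas eval_safe(NumThreads{4}, Nn{2}) is rejected at compile time.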