Anjok07 / ultimatevocalremovergui

GUI for a Vocal Remover that uses Deep Neural Networks.
MIT License
16.41k stars 1.24k forks

BS Roformer skips audio when Overlap is set to a number over 8 #1371

Open K97i opened 1 month ago

K97i commented 1 month ago

The Issue

When the Overlap option is set to a value above 8, BS-Roformer periodically skips audio (I only tested with model_bs_roformer_ep_317_sdr_12.9755.ckpt). I don't know whether this is a problem with UVR (BS-Roformer shares its model settings with MDX-NET), with the model itself, or with both.

The Test

Using the April 14, 2024 patch, I separated all of the tracks with the segment size set to 1024.

[Screenshot 2024-05-24 191725]

The white track is a baseline separation using MDX23C with the overlap set to 8. The red, green, and blue tracks all used BS-Roformer (model_bs_roformer_ep_317_sdr_12.9755.ckpt), with the overlap set to 8, 9, and 10 respectively.

It seems that BS-Roformer only processes 8 seconds of audio per chunk and discards the rest of the audio fed into it: in each cycle, the length with audio plus the length without audio equals the Overlap setting in seconds.
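To make that hypothesis easier to check against a waveform, here is a rough Python sketch of the gap pattern it predicts. This is not UVR code. The function names are made up for illustration, and it assumes each cycle starts with the audible part and that the first cycle starts at 0 seconds, which I have not verified.

```python
def expected_cycle(overlap_s, processed_s=8):
    """Return (audible_s, silent_s) for one cycle under the hypothesis.

    processed_s=8 is the observed limit from this issue, not a value
    read from UVR or the model.
    """
    audible = min(overlap_s, processed_s)
    silent = max(overlap_s - processed_s, 0)
    return audible, silent


def silent_spans(total_s, overlap_s, processed_s=8):
    """List the (start, end) times in seconds where silence would be expected.

    Assumption: every cycle is overlap_s long, audio comes first, and the
    first cycle begins at t = 0.
    """
    audible, silent = expected_cycle(overlap_s, processed_s)
    spans = []
    if silent == 0:
        return spans
    start = audible
    while start < total_s:
        spans.append((start, min(start + silent, total_s)))
        start += overlap_s
    return spans


if __name__ == "__main__":
    for overlap in (8, 9, 10):
        print(overlap, expected_cycle(overlap), silent_spans(30, overlap))
```

If the real gaps in the green and blue tracks line up with these spans, that would support the 8 second limit. If they don't, the hypothesis (or the assumed starting point) is wrong.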

Solutions?

To prevent this, cap the overlap at 8 when the chosen model is a BS-Roformer model, so that users can't set it any higher. I don't know whether this also affects other Roformer models, as I haven't tested this thoroughly.
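A minimal sketch of what such a cap could look like, in Python. None of these names come from the UVR codebase. Detecting the model type from the file name is only a stand-in for however UVR actually identifies the architecture, and the limit of 8 is just the value observed in the test above.

```python
# Observed in this issue, not taken from UVR or the model config.
BS_ROFORMER_MAX_OVERLAP = 8


def is_bs_roformer(model_name: str) -> bool:
    # Hypothetical check: guesses the architecture from the file name.
    return "bs_roformer" in model_name.lower()


def effective_overlap(model_name: str, requested: int) -> int:
    """Return the overlap to use, capped for BS-Roformer models."""
    if is_bs_roformer(model_name):
        return min(requested, BS_ROFORMER_MAX_OVERLAP)
    return requested


if __name__ == "__main__":
    name = "model_bs_roformer_ep_317_sdr_12.9755.ckpt"
    print(effective_overlap(name, 10))
```

With this, a requested overlap of 10 on the BS-Roformer checkpoint would be reduced to 8, while other models keep whatever the user picked.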