groadabike closed this issue 1 year ago
One option to solve this issue would be to create a new metric that takes both the music and the silence segments into account.
It is proposed to do the following:

1. Detect the silence segments using the `input_align` function, but without resampling to 24 kHz.
2. Concatenate the silence segments into a silence signal and the music segments into a music signal.
3. Compute a score for the music signal (A) and a score for the silence signal (B).
4. Combine (A) and (B) into the final metric.
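The split in steps 1 and 2 above could be sketched as follows. This is a minimal illustration using a simple frame-energy threshold, not the actual `input_align` logic; the frame length and threshold values are illustrative assumptions.

```python
import numpy as np


def split_silence_music(signal, sample_rate, frame_ms=20, threshold_db=-40.0):
    """Split a signal into a concatenated silence part and a music part.

    Frames whose RMS level falls below `threshold_db` are treated as
    silence; all other frames are treated as music. The parameters are
    illustrative, not the HAAQI defaults.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    silence_parts, music_parts = [], []
    for i in range(len(signal) // frame_len):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        rms = np.sqrt(np.mean(frame**2))
        level_db = 20 * np.log10(rms + 1e-12)
        (silence_parts if level_db < threshold_db else music_parts).append(frame)
    silence = np.concatenate(silence_parts) if silence_parts else np.empty(0)
    music = np.concatenate(music_parts) if music_parts else np.empty(0)
    return silence, music
```

HAAQI could then be run on the music signal as usual, while a cheaper energy-based score could penalise residual content in the silence signal.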
In Task 1, the challenge is to separate the music into the VDBO stems (Vocals, Drums, Bass, Other). In the `input_align` function (the alignment step of the Ear Model part of the code base shared by HAAQI, HASQI and HASPI), HAAQI uses the reference signal to prune the leading and trailing silence of both the reference and the processed signals. This can hide artefacts or residual errors resulting from the separation, leading to a higher score.
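To see why this pruning can hide separation errors, here is a simplified illustration of reference-based trimming. This is not the actual pyclarity `input_align` implementation; the amplitude threshold is an assumption for the sketch.

```python
import numpy as np


def trim_to_reference(reference, processed, threshold=1e-4):
    """Trim leading/trailing silence of both signals using the reference only.

    Simplified illustration of the pruning described above: any artefact in
    the processed signal that falls within the reference's silent lead-in or
    tail is discarded and never scored.
    """
    active = np.flatnonzero(np.abs(reference) > threshold)
    if active.size == 0:
        return reference, processed
    start, end = active[0], active[-1] + 1
    return reference[start:end], processed[start:end]
```

If the separator leaks bleed into a region where the reference stem is silent, that region is cut away before scoring, so the metric never sees the error.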