claritychallenge / clarity

Clarity Challenge toolkit - software for building Clarity Challenge systems
https://claritychallenge.github.io/clarity
MIT License

Tweak Score for Task1 #272

Closed groadabike closed 1 year ago

groadabike commented 1 year ago

In Task 1, the challenge is to separate the music into the VDBO stems (Vocal, Drums, Bass, Other). In the input_align function (the alignment step of the Ear Model, which is part of the base code shared by HAAQI, HASQI and HASPI), HAAQI uses the reference signal to prune the leading and trailing silence from both the reference and the processed signals. This pruning can hide artefacts or residual errors left by the separation, resulting in a higher score.
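To illustrate the problem, here is a minimal sketch in Python with NumPy. It is not the toolkit's input_align code: `prune_by_reference` and its fixed amplitude threshold are hypothetical simplifications, used only to show how trimming both signals to the active span of the reference discards separation residue that falls in the reference's silent regions.

```python
import numpy as np


def prune_by_reference(reference, processed, threshold=1e-3):
    """Trim both signals to the span where |reference| exceeds threshold.

    Hypothetical simplification of reference-driven silence pruning.
    """
    active = np.flatnonzero(np.abs(reference) > threshold)
    if active.size == 0:
        return reference[:0], processed[:0]
    start, stop = active[0], active[-1] + 1
    return reference[start:stop], processed[start:stop]


rng = np.random.default_rng(0)

# Reference stem: silence, then a tone, then silence.
reference = np.concatenate(
    [np.zeros(1000), np.sin(np.linspace(0, 200, 2000)), np.zeros(1000)]
)

# Processed stem: identical, except for residual bleed during the leading silence.
processed = reference.copy()
processed[:1000] += 0.1 * rng.standard_normal(1000)

ref_pruned, proc_pruned = prune_by_reference(reference, processed)

# Squared error before and after pruning: the bleed is no longer visible.
error_before = float(np.sum((processed - reference) ** 2))
error_after = float(np.sum((proc_pruned - ref_pruned) ** 2))
print(error_before > 0.0, error_after == 0.0)
```

In this toy case the only error sits in the leading silence, so any metric computed on the pruned signals sees a perfect separation.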

groadabike commented 1 year ago

One option to solve this issue is to create a new metric that takes both the music and the silence segments into account.

It is proposed to do the following:

  1. Align the reference and processed signals using the same logic as the input_align function, but without resampling to 24 kHz.
  2. From the reference signal, detect all silence and music segments.
  3. Split both the reference and processed signals using the segmentation from (2).
  4. Concatenate the silence segments into a silence signal and the music segments into a music signal.
  5. Compute HAAQI using the music signal (A).
  6. Compute the RMS of the processed silence signal (B).
  7. Overall Score = [(A × NA) - (B × NB)] / (NA + NB)
     7.1. NA = number of samples in the music signal (A)
     7.2. NB = number of samples in the silence signal (B)
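Steps 2 to 7 could be sketched roughly as follows. This is only an illustration under several assumptions: the signals are already aligned and of equal length (step 1), silence is detected with a simple frame-wise RMS threshold on the reference (the proposal does not say how segments are detected), and `quality_fn` is a hypothetical stand-in for the HAAQI computation, not the toolkit's API. The names `silence_mask`, `overall_score`, `frame_len` and `threshold` are likewise made up for this example.

```python
import numpy as np


def silence_mask(reference, frame_len=1024, threshold=1e-4):
    """Per-sample mask: True where the enclosing reference frame has RMS below threshold."""
    mask = np.zeros(len(reference), dtype=bool)
    for start in range(0, len(reference), frame_len):
        frame = reference[start:start + frame_len]
        mask[start:start + frame_len] = np.sqrt(np.mean(frame ** 2)) < threshold
    return mask


def overall_score(reference, processed, quality_fn, frame_len=1024, threshold=1e-4):
    """Sample-weighted combination of music quality (A) and silence RMS (B)."""
    if len(reference) != len(processed):
        raise ValueError("signals must be aligned and of equal length")

    silent = silence_mask(reference, frame_len, threshold)

    # Boolean indexing keeps sample order, so this both splits and concatenates.
    ref_music, proc_music = reference[~silent], processed[~silent]
    proc_silence = processed[silent]

    n_a, n_b = len(ref_music), len(proc_silence)
    if n_a + n_b == 0:
        raise ValueError("empty signals")

    a = float(quality_fn(ref_music, proc_music)) if n_a else 0.0
    b = float(np.sqrt(np.mean(proc_silence ** 2))) if n_b else 0.0
    return (a * n_a - b * n_b) / (n_a + n_b)


def dummy_quality(ref, proc):
    """Placeholder that always reports perfect quality; replace with a real HAAQI call."""
    return 1.0


# Toy input: 2048 silent samples followed by 2048 samples of "music".
reference = np.concatenate([np.zeros(2048), np.full(2048, 0.5)])
processed = reference.copy()
processed[:2048] += 0.1  # constant residue in the silent part

score = overall_score(reference, processed, dummy_quality)
print(score)
```

With A = 1.0, B = 0.1 and NA = NB = 2048, the formula gives (2048 - 204.8) / 4096 = 0.45, so residue in the silent region now lowers the score instead of being pruned away.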