clamsproject / aapb-evaluations

Collection of evaluation codebases

forced alignment evaluation is done wrong #31

Closed by keighrim 10 months ago

keighrim commented 10 months ago

The evaluation of gentle-wrapper in https://github.com/clamsproject/aapb-evaluations/pull/29 should be re-done, because:

  1. The evaluation metric picked in the implementation does not really measure forced-alignment quality; it measures how much total time is covered by the hypothesis timeframes. In other words, the implementation used "detection" metrics, while what we want is segment-by-segment coverage (see the sketch after this list).
  2. The evaluation was run on Gentle output produced from the "gold" transcript text, while the reference data used in the process was annotated against "non-gold" text (see https://github.com/clamsproject/aapb-annotations/issues/5#issuecomment-1693697601).
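
To illustrate point 1, here is a minimal sketch (hypothetical data and helper names, not code from the current `evaluate.py`): both alignments below cover the same total amount of time, so a coverage-only "detection" score cannot distinguish them, while a per-token boundary error shows the alignment is off.

```python
def total_covered(alignment):
    """Total seconds covered by the timeframes (the 'detection' view)."""
    return sum(end - start for _, start, end in alignment)

def mean_boundary_error(gold, hypothesis):
    """Mean absolute start/end deviation per token (the segment-by-segment view)."""
    errors = []
    for (tok_g, s_g, e_g), (tok_h, s_h, e_h) in zip(gold, hypothesis):
        assert tok_g == tok_h
        errors.append((abs(s_g - s_h) + abs(e_g - e_h)) / 2)
    return sum(errors) / len(errors)

gold = [("hello", 0.0, 0.5), ("world", 0.6, 1.1)]
hyp  = [("hello", 0.3, 0.8), ("world", 0.9, 1.4)]   # every word shifted by 0.3s

print(total_covered(gold), total_covered(hyp))   # 1.0 1.0 -> identical total coverage
print(mean_boundary_error(gold, hyp))            # 0.3     -> real per-segment alignment error
```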

I'd like to re-implement evaluate.py with a new set of hypothesis MMIF files run on the non-gold text, using other metrics available in the pyannote library.
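
As a rough sketch of what a segment-aware check with pyannote could look like (assuming hypothesis and gold alignments have already been pulled out of the MMIF files into `(token, start, end)` triples; that conversion step and the choice of `IdentificationErrorRate` are assumptions, not a decided design):

```python
from pyannote.core import Segment, Annotation
from pyannote.metrics.detection import DetectionErrorRate
from pyannote.metrics.identification import IdentificationErrorRate

def to_annotation(word_alignment):
    """Build a pyannote Annotation with one uniquely-labelled segment per token."""
    ann = Annotation()
    for i, (token, start, end) in enumerate(word_alignment):
        ann[Segment(start, end)] = f"{i}:{token}"   # unique label per token occurrence
    return ann

gold = to_annotation([("hello", 0.0, 0.5), ("world", 0.6, 1.1)])
hyp = to_annotation([("hello", 0.3, 0.8), ("world", 0.9, 1.4)])

print(DetectionErrorRate()(gold, hyp))        # ignores labels, only compares covered time
print(IdentificationErrorRate()(gold, hyp))   # penalizes each mis-aligned token individually
```

Labelling each segment with its token index makes a label-aware metric score every word separately, which is closer to the segment-by-segment coverage we actually want than the detection-style scoring used in #29.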