desh2608 opened this issue 3 years ago
Are JER/clustering metrics still of interest? I'd be up for adding them if I know the PRs would get accepted.
Hi Neville! Yeah, that would be awesome. JER is top-most on the list, but I can imagine people would be interested in other metrics as well.
(@popcornell and I want to switch from dscore to spyder in CHiME-7 DASR, but it is blocked by JER not being implemented yet.)
Ok, I can add this to the TODO list. I'm in the process of rewriting `dscore` to eliminate the `md-eval` dependency and output more detailed reporting. The initial version is based on `pyannote.metrics`, but between the penalty of Python being an interpreted language and the repeated calls to `uemify`, it's not particularly quick. So it's in my interest to get faster implementations of the various metrics, and I'd rather contribute to an existing project if possible.
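For concreteness, the scoring pattern in question is roughly the sketch below (toy data; `to_annotation` and `recordings` are placeholders of mine, not dscore code). Each per-recording call is where the internal cropping/`uemify` cost is paid.

```python
# Minimal sketch of a pyannote.metrics-based DER scoring loop (toy data).
from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

def to_annotation(turns):
    """Build a pyannote Annotation from (speaker, start, end) turns."""
    ann = Annotation()
    for speaker, start, end in turns:
        ann[Segment(start, end)] = speaker
    return ann

# One toy recording: reference and hypothesis turns.
recordings = [
    (
        [("A", 0.0, 2.0), ("B", 2.0, 4.0)],        # reference
        [("spk1", 0.0, 2.2), ("spk2", 2.2, 4.0)],  # hypothesis
    ),
]

metric = DiarizationErrorRate(collar=0.0, skip_overlap=False)
for ref_turns, hyp_turns in recordings:
    # Each call crops/uemifies reference and hypothesis internally,
    # which is where most of the runtime goes on large test sets.
    metric(to_annotation(ref_turns), to_annotation(hyp_turns))

print(f"DER = {abs(metric):.4f}")  # aggregated over all recordings
```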
Cool! Your contributions would be very welcome. In my benchmarking, I found pyannote.metrics to be an order of magnitude slower than md-eval.pl --- pyannote is a great tool overall, just not suitable for DER evaluation :)
I'm sure spyder would benefit immensely from your expertise. Please use this thread for any questions/discussions once you get around to implementing the metrics.
That sounds about right. When I benchmarked on the DIHARD III eval (full) condition, just the DER computation (omitting IO and building the `Annotation`/`Timeline` instances in memory) averaged over 13 seconds, compared to 3.5 seconds for running `md-eval`. Most of this comes from the call to `IdentificationErrorRate.uemify`, which constructs the equivalent of your `get_eval_regions`. Specifically, this block, which accounts for 10 seconds of that run time.
I've been updating `dscore` off and on for the past week for an LDC internal project and want to finish that work first, but will look into implementing JER in `spyder` afterwards. I think it should be relatively straightforward.
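Roughly, assuming per-speaker speech intervals and the optimal reference-to-system speaker mapping (the Hungarian assignment used for DER) are already available, the core of JER as defined for DIHARD is just an average per-speaker Jaccard error. A sketch, with all names below mine rather than dscore or spyder code (collar and overlap conventions omitted):

```python
# Rough sketch of JER: average over reference speakers of
# 1 - |intersection| / |union| against the mapped system speaker.
from typing import Dict, List, Tuple

Interval = Tuple[float, float]

def total_duration(intervals: List[Interval]) -> float:
    """Total duration covered by a set of intervals (overlaps merged)."""
    total, covered_to = 0.0, float("-inf")
    for start, end in sorted(intervals):
        start = max(start, covered_to)
        if end > start:
            total += end - start
            covered_to = end
    return total

def jer(ref: Dict[str, List[Interval]],
        hyp: Dict[str, List[Interval]],
        mapping: Dict[str, str]) -> float:
    """Jaccard error rate given per-speaker intervals and a ref->sys mapping."""
    per_speaker = []
    for ref_spk, ref_ivs in ref.items():
        sys_spk = mapping.get(ref_spk)
        if sys_spk is None:
            per_speaker.append(1.0)  # no system speaker mapped: 100% error
            continue
        sys_ivs = hyp[sys_spk]
        union = total_duration(ref_ivs + sys_ivs)
        inter = total_duration(ref_ivs) + total_duration(sys_ivs) - union
        per_speaker.append(1.0 - inter / union if union > 0 else 0.0)
    return sum(per_speaker) / len(per_speaker) if per_speaker else 0.0
```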
The following metrics (from pyannote and dscore) may be implemented: