Forced-Alignment-and-Vowel-Extraction / fave-asr

Interface for automated transcription and time alignment of conversational interview data
https://forced-alignment-and-vowel-extraction.github.io/fave-asr/
GNU General Public License v3.0

[Testing] Evaluation metrics for comparing model performance to known standard #12

Open chrisbrickhouse opened 5 months ago

chrisbrickhouse commented 5 months ago

The tests so far are tightly coupled to the implementation rather than the interface. Changes to the internals often cause tests to fail because of small shifts in timestamps or chunking (e.g. #5). This slows development, because tests have to be fixed even though nothing is broken. The tests were written this way because we don't want to push code that produces worse transcriptions than the prior version, since that would be an obvious regression. The proper way to test for this, though, would be something similar to code coverage tests: score each version's output against a known standard and check that the new state is better than the previous one.
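A minimal sketch of what such an interface-level check could look like, assuming word error rate (WER) is the metric: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function names `word_error_rate` and `is_regression` and the `tolerance` parameter are hypothetical and not part of fave-asr. Because only the transcript text is scored, shifts in timestamps or chunking would not break the test.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper: lowercases and splits on whitespace only; a real
    test suite would likely also normalize punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = minimum edits to turn the first i-1 reference words into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(hyp)] / len(ref)


def is_regression(reference: str, old_output: str, new_output: str,
                  tolerance: float = 0.0) -> bool:
    """True if the new version's WER is worse than the old one's by more than tolerance."""
    old_wer = word_error_rate(reference, old_output)
    new_wer = word_error_rate(reference, new_output)
    return new_wer > old_wer + tolerance


if __name__ == "__main__":
    gold = "the quick brown fox"
    print(word_error_rate(gold, "the quick brown box"))  # one substitution in four words
    print(is_regression(gold, "the quick brown fox", "the quick brown box"))
```

In a test suite, the stored output of the previous release (or its stored WER) would serve as the baseline, and a small `tolerance` could absorb noise from nondeterministic decoding.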

chrisbrickhouse commented 2 months ago

https://github.com/nryant/dscore

https://web.archive.org/web/20170119114252/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf

chrisbrickhouse commented 1 month ago

More details on diarization scoring, following further research (see the links in the prior comment).
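For diarization, the NIST RT-09 evaluation plan linked above scores systems with the diarization error rate (DER): summed over time, `max(N_ref, N_sys) - N_correct` divided by the summed `N_ref`, where the counts are the number of reference speakers, system speakers, and correctly mapped speakers active at each moment. Below is a simplified, frame-based sketch of that formula under several assumptions: segments are `(start_seconds, end_seconds, speaker)` tuples, time is discretized into 10 ms frames, there is no forgiveness collar or scoring-region (UEM) handling, and the one-to-one speaker mapping is found by brute force, which is only practical for a handful of speakers. The function names are hypothetical; a tool such as dscore would be the authoritative scorer.

```python
from collections import defaultdict
from itertools import permutations


def _active_speakers(segments, step):
    """Map frame index -> set of speakers active in that frame."""
    active = defaultdict(set)
    for start, end, speaker in segments:
        for i in range(round(start / step), round(end / step)):
            active[i].add(speaker)
    return active


def diarization_error_rate(reference, hypothesis, step=0.01):
    """Simplified frame-based DER (no collar, no UEM, brute-force speaker mapping)."""
    ref = _active_speakers(reference, step)
    hyp = _active_speakers(hypothesis, step)
    frames = set(ref) | set(hyp)
    ref_speakers = sorted({s for _, _, s in reference})
    hyp_speakers = sorted({s for _, _, s in hypothesis})

    # overlap[(h, r)] = number of frames where hyp speaker h and ref speaker r co-occur
    overlap = defaultdict(int)
    for i in frames:
        for h in hyp.get(i, ()):
            for r in ref.get(i, ()):
                overlap[(h, r)] += 1

    # Try every one-to-one mapping and keep the one with the most overlapping frames.
    if len(hyp_speakers) <= len(ref_speakers):
        candidates = (dict(zip(hyp_speakers, p))
                      for p in permutations(ref_speakers, len(hyp_speakers)))
    else:
        candidates = (dict(zip(p, ref_speakers))
                      for p in permutations(hyp_speakers, len(ref_speakers)))
    mapping = max(candidates,
                  key=lambda m: sum(overlap[(h, r)] for h, r in m.items()),
                  default={})

    error = 0
    total = 0
    for i in frames:
        r_active = ref.get(i, set())
        h_active = hyp.get(i, set())
        correct = sum(1 for h in h_active if mapping.get(h) in r_active)
        # Covers missed speech, false alarms, and speaker confusion in one term.
        error += max(len(r_active), len(h_active)) - correct
        total += len(r_active)
    if total == 0:
        raise ValueError("reference contains no speech")
    return error / total


if __name__ == "__main__":
    gold = [(0.0, 1.0, "A"), (1.0, 2.0, "B")]
    system = [(0.0, 2.0, "spk1")]  # one speaker label for the whole recording
    print(diarization_error_rate(gold, system))  # half the reference time is confused
```

Like the WER check, this compares only the system's speaker turns against a reference, so it could serve as a regression metric without depending on internal chunking.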