ELITR / elitr-testset

ELITR collection of test sets, for ASR, MT and SLT

Manual sentence segmentation and casing for Czech ASR test files #9

Closed: obo closed this issue 2 years ago

obo commented 3 years ago

The Czech ASR transcripts that Jonas used were not properly cased and segmented. This concerns all the subdirectories of: https://github.com/ELITR/elitr-testset/tree/master/documents/czech-asr

Please:

  1. Rename OSt files to e.g. "raw-revised-transcript" (@pyRis)
  2. Process the files with our segmenter (@pyRis), saving them as OSt.
  3. Have the OSt files manually revised by 'exotic annotators' (@srdecny). They should save their outputs directly here and can even edit in place. The corrections will primarily affect the segmentation and casing, because the words themselves have probably already been revised in the past.
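
Step 1 above could be scripted roughly as follows. This is only a sketch under assumptions: it assumes that "OSt files" are files whose names end in `.OSt` and that the new name simply swaps that suffix for `.raw-revised-transcript`. The function names `plan_renames` and `apply_renames` are hypothetical, and the segmenter from step 2 is not shown.

```python
from pathlib import Path


def plan_renames(root, old_suffix=".OSt", new_suffix=".raw-revised-transcript"):
    """Return (old_path, new_path) pairs for every file under root ending in old_suffix.

    Assumption: the suffixes are illustrative; adjust them to the naming
    actually used in the repository.
    """
    root = Path(root)
    pairs = []
    for path in sorted(root.rglob("*" + old_suffix)):
        if path.is_file():
            stem = path.name[: -len(old_suffix)]
            pairs.append((path, path.with_name(stem + new_suffix)))
    return pairs


def apply_renames(pairs):
    """Rename each file, refusing to overwrite an existing target."""
    for old, new in pairs:
        if new.exists():
            raise FileExistsError(f"target already exists: {new}")
        old.rename(new)


# Example usage from a local checkout (the path is an assumption):
#   pairs = plan_renames("documents/czech-asr")
#   for old, new in pairs:
#       print(old, "->", new)   # inspect the plan first
#   apply_renames(pairs)
```

Listing the planned renames before applying them keeps the original transcripts easy to check before the segmenter output is saved under the OSt name.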

pyRis commented 3 years ago

@srdecny This is the issue that we are talking about right now on the call.

pyRis commented 2 years ago

This was done a long time ago; we just never closed the issue.