obo closed this 2 years ago
Rishu, these are the documents with Czech read speech that you should use. I am confused about why they are not in the repo.
Finally, the push went through and the files are there: https://github.com/ELITR/elitr-testset/tree/master/documents/wmt18-newstest-sample-read/
@Rishu, this is good for SLT evaluation from Czech speech to English text (although it is somewhat artificial speech).
@Mohammad, I know you have handled the suffixes in SLTev somehow. Please test the current behavior of SLTev and check that the use cases -- all the indices mentioned above -- work well.
I just committed a new set of documents to documents/wmt18-newstest-sample-read/
Mohammad, please make sure these documents are included in the relevant indices. Off the top of my head, I know they should be in:
*.cs.OS.ogg -> *.cs.OSt
*.en.OS.ogg -> *.en.OSt
*.en.OSt -> *.cs.OSt
*.cs.OSt -> *.en.OSt
Please also create these new indices (probably automatic ones):
*.en.OS.ogg -> *.cs.OSt
*.cs.OS.ogg -> *.en.OSt
These new indices should also include other documents that allow this type of evaluation, e.g. auto-slt-en2cs should include antrecorp etc.
I use the notation
___ -> ___
to indicate which files are the source and which are the reference. Perhaps we should formally add this information to the indices: which documents use which file suffixes for which purpose.
Please test these updated indices with SLTev!
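To make the `___ -> ___` notation concrete, here is a minimal Python sketch of how a rule such as `*.cs.OS.ogg -> *.en.OSt` could be turned into source/reference file pairs. It is only an illustration, not how SLTev or the elitr-testset indices actually work: the function names `parse_rule` and `pair_files` and the file names `doc01...` and `doc02...` are hypothetical, and it assumes that source and reference files share the same name up to the suffix.

```python
from typing import Iterable, List, Tuple


def parse_rule(rule: str) -> Tuple[str, str]:
    """Split '*.cs.OS.ogg -> *.en.OSt' into ('.cs.OS.ogg', '.en.OSt')."""
    source, reference = (part.strip() for part in rule.split("->"))
    if not (source.startswith("*") and reference.startswith("*")):
        raise ValueError(f"expected '*<suffix> -> *<suffix>', got {rule!r}")
    # Drop the leading '*' so only the suffix remains.
    return source[1:], reference[1:]


def pair_files(filenames: Iterable[str], rule: str) -> List[Tuple[str, str]]:
    """Return (source, reference) pairs for documents that have both files.

    Assumption: a source and its reference differ only in the suffix.
    """
    src_suffix, ref_suffix = parse_rule(rule)
    names = set(filenames)
    pairs = []
    for name in sorted(names):
        if name.endswith(src_suffix):
            stem = name[: -len(src_suffix)]
            candidate = stem + ref_suffix
            if candidate in names:
                pairs.append((name, candidate))
    return pairs


# Hypothetical directory listing, for illustration only.
files = ["doc01.cs.OS.ogg", "doc01.cs.OSt", "doc01.en.OSt", "doc02.en.OSt"]
print(pair_files(files, "*.cs.OS.ogg -> *.en.OSt"))
# prints [('doc01.cs.OS.ogg', 'doc01.en.OSt')]
```

A document with only one of the two files (here `doc02`) is simply skipped, which is one possible way to build the "automatic" indices from whatever documents allow a given type of evaluation.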