Add wmt18-newstest-sample-read to indices

obo commented 3 years ago

I just committed a new set of documents to documents/wmt18-newstest-sample-read/

Mohammad, please make sure these documents are included in the relevant indices. Off the top of my head, I know they should be in:

auto-asr-czech-any-domain (*.cs.OS.ogg -> *.cs.OSt)
auto-asr-english-any-domain (*.en.OS.ogg -> *.en.OSt)
auto-mt-en2cs (*.en.OSt -> *.cs.OSt)
auto-mt-cs2en (*.cs.OSt -> *.en.OSt)

Also create these new indices (probably automatic ones):

auto-slt-en2cs (*.en.OS.ogg -> *.cs.OSt)
auto-slt-cs2en (*.cs.OS.ogg -> *.en.OSt)

These new indices should also include other documents that allow this type of evaluation, e.g. auto-slt-en2cs should include antrecorp etc.
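To decide which documents "allow this type of evaluation", one option is to check that every source file has a matching reference file with the same stem. The following is only a minimal sketch under that assumption; `pair_files` and the file names in the usage note are hypothetical and not part of elitr-testset or SLTev.

```python
def pair_files(paths, source_suffix, reference_suffix):
    """Return sorted (source, reference) pairs that share the same stem.

    Hypothetical helper: a file qualifies for an index only if both its
    source (e.g. '.en.OS.ogg') and its reference (e.g. '.cs.OSt') exist.
    """
    available = set(paths)
    pairs = []
    for path in sorted(available):
        if path.endswith(source_suffix):
            # Swap the source suffix for the reference suffix.
            reference = path[: -len(source_suffix)] + reference_suffix
            if reference in available:
                pairs.append((path, reference))
    return pairs
```

For example, given the made-up names `talk01.en.OS.ogg`, `talk01.cs.OSt` and `talk02.en.OS.ogg`, calling `pair_files(names, ".en.OS.ogg", ".cs.OSt")` would keep only `talk01`, because `talk02` has no Czech reference.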

I use the notation ___ -> ___ to indicate which files are the source and which are the reference. Perhaps we should formally add this information to the indices: which documents use which file suffixes for which purpose.
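One way the "source -> reference" notation could be made formal is to parse it into a small record. This is a sketch only: `IndexSpec` and `parse_spec` are hypothetical names, and the actual index file format in elitr-testset may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexSpec:
    """Hypothetical record: which suffix is the source, which the reference."""
    name: str
    source_suffix: str
    reference_suffix: str


def parse_spec(line):
    """Parse a line such as 'auto-mt-en2cs (*.en.OSt -> *.cs.OSt)'."""
    name, paren, rest = line.partition("(")
    source, arrow, reference = rest.strip().rstrip(")").partition("->")
    if not paren or not arrow:
        raise ValueError(f"not in 'name (*.src -> *.ref)' form: {line!r}")
    return IndexSpec(
        name=name.strip(),
        # Drop the leading '*' wildcard so only the suffix remains.
        source_suffix=source.strip().lstrip("*"),
        reference_suffix=reference.strip().lstrip("*"),
    )
```

With this, `parse_spec("auto-slt-en2cs (*.en.OS.ogg -> *.cs.OSt)")` yields a record whose source suffix is `.en.OS.ogg` and whose reference suffix is `.cs.OSt`.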

Please test these updated indices with SLTev!

obo commented 3 years ago

Rishu, these are the documents with Czech read speech that you should use. I am confused why they are not in the repo.

obo commented 3 years ago

Finally, the push went through and the files are there: https://github.com/ELITR/elitr-testset/tree/master/documents/wmt18-newstest-sample-read/

@Rishu, this is good for SLT evaluation from Czech speech to English text (although it is somewhat artificial speech).

@Mohammad, I know you have handled the suffixes in SLTev somehow. Please test the current behavior of SLTev and check whether the use cases -- all the indices mentioned above -- work well.

Add wmt18-newstest-sample-read to indices #11