ELITR / elitr-testset

ELITR collection of test sets, for ASR, MT and SLT

wrong indices and link files #15

Closed mohammad2928 closed 3 years ago

mohammad2928 commented 3 years ago

Hi, some of the indices are wrong and contain paths to files that are not in the repo, e.g.
... index: auto-mt-es2en line: elitr-testset/documents/confidential/intercorp/es2en/aligned.en

index: ondrej-iwslt2020-testset line: #include iwslt2020-antrecorp ...

Also, link files refer to paths on the UFAL cluster, so we need to find an alternative way of handling link files (big files). Please also use ".url" or ".link" as a suffix for link files. e.g. ... file: elitr-testset/documents/confidential/amalach-sample-interview line: /net/data/ELITR/data-sources/elitr-testset-confidential-files/amalach-sample-interview ...

I am assigning this issue to Ondrej and Daniel because they are the owners of the files mentioned.
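To find index entries like the ones above, a small check can list every entry whose file is absent. This is only a sketch under assumptions: it treats each non-comment index line as a path relative to some root directory, and the function name `find_missing_paths` is hypothetical and not part of the repo or of SLTev.

```python
import os
from typing import Iterable, List


def find_missing_paths(index_lines: Iterable[str], root: str) -> List[str]:
    """Return index entries whose file does not exist under `root`.

    Assumption: each entry is a path relative to `root`.
    Blank lines and lines starting with '#' are skipped.
    """
    missing = []
    for raw in index_lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue  # not a file entry
        if not os.path.exists(os.path.join(root, entry)):
            missing.append(entry)
    return missing
```

Running it over an index with `root` set to the directory that contains the `elitr-testset` checkout would report lines such as the `aligned.en` path above, if that file is indeed absent.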

mzilinec commented 3 years ago

Hello, I'm using this index as discussed yesterday, but today it didn't get copied to my directory and I'm stuck. I still have the files from yesterday, but I'm not sure how I should obtain the OStt files, @mohammad2928. Are the "OStt" files created or obtained when I run SLTev -g auto-mt-en2cs --outdir eval-run-4-feb-2021? I'm just missing these right now; I have the other files. Thanks.

mohammad2928 commented 3 years ago

Hi, sorry for the delay. Please use the new SLTev version (> 1.0.4). If the OStt file does not exist, the OSt file is used instead of the OStt file in the evaluation phase (it would be created in memory).

pyRis commented 3 years ago

Handling big files outside the UFAL cluster has now been resolved by @obo. As for commented-out lines, i.e. lines with "#" at the beginning, it's better to parse files expecting a commented-out line here and there. I usually use cat file_name | grep -Ev "^#" when reading the content of an index file.
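For anyone reading index files from Python rather than the shell, the same filter could look roughly like this. It is a minimal sketch, not SLTev's own parser: the function name `read_index_entries` is made up for illustration, and dropping blank lines is an extra assumption beyond what the grep command does. Like the grep filter, it also drops "#include" lines, since they start with "#".

```python
from typing import Iterable, List


def read_index_entries(lines: Iterable[str]) -> List[str]:
    """Keep index lines that do not start with '#', like grep -Ev '^#'.

    Assumption beyond the grep command: blank lines are dropped too.
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#"):
            continue  # commented-out line (this includes '#include ...')
        if not line.strip():
            continue  # blank line
        entries.append(line)
    return entries
```

A file can be passed directly, e.g. `with open(index_path) as f: entries = read_index_entries(f)`, where `index_path` is whatever index file you are reading.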