Open fsimonjetz opened 7 years ago
@fsimonjetz Thank you for your interest in this project.
I suggest using the following script to generate your own parallel data: https://github.com/ajinkyakulkarni14/How-I-Extracted-TED-talks-for-parallel-Corpus-
If you are still unable to extract it, let me know.
@ajinkyakulkarni14, I used https://github.com/ajinkyakulkarni14/How-I-Extracted-TED-talks-for-parallel-Corpus- to extract data for en-ja, but I get an error:
Traceback (most recent call last):
File "extractTEDtalk.py", line 25, in
Can you help me to solve the error, please? Thank you!
I am reopening the project and will update the corpus soon.
The readme says "All data have been processed automatically so that it is not possible to reconstruct the original source texts." I'm considering using the German-Korean data for my PhD project; however, for what I have in mind it would be helpful to have the documents separated. Is this information available? Even stand-off indices would be nice. I hope you can keep up this project, it looks like a promising resource!
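To make the request concrete: stand-off indices here would mean a separate listing of which line ranges of the sentence-aligned corpus belong to each document. Below is a minimal sketch of how such indices could be applied, assuming a hypothetical index format of (document id, start line, end line) with zero-based, end-exclusive line numbers. Neither this format nor the `split_by_index` helper is part of the project; both are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-off index entry: (document id, first line, one past the last line).
# This format is an assumption for illustration; the corpus does not ship such a file.
IndexEntry = Tuple[str, int, int]


def split_by_index(
    source: List[str],
    target: List[str],
    index: List[IndexEntry],
) -> Dict[str, List[Tuple[str, str]]]:
    """Group aligned sentence pairs into documents using stand-off line ranges."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same number of lines")

    documents: Dict[str, List[Tuple[str, str]]] = {}
    for doc_id, start, end in index:
        if not (0 <= start <= end <= len(source)):
            raise ValueError(f"range out of bounds for document {doc_id!r}")
        # Pair each source sentence with the target sentence on the same line.
        documents[doc_id] = list(zip(source[start:end], target[start:end]))
    return documents


if __name__ == "__main__":
    # Toy placeholder data, not taken from the corpus.
    de = ["de-1", "de-2", "de-3"]
    ko = ["ko-1", "ko-2", "ko-3"]
    index = [("talk_a", 0, 2), ("talk_b", 2, 3)]
    for doc_id, pairs in split_by_index(de, ko, index).items():
        print(doc_id, pairs)
```

Keeping the indices in a separate file would leave the released corpus files unchanged while still letting users recover document boundaries.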