THUNLP-MT / THUMT

An open-source neural machine translation toolkit developed by Tsinghua Natural Language Processing Group
BSD 3-Clause "New" or "Revised" License

SST with monolingual target data only? #41

Closed: judembo closed this issue 6 years ago

judembo commented 6 years ago

I'm trying to use semi-supervised training (SST) on the theano branch, but with monolingual data from the target language only (no monolingual data from the source language). Cheng et al. (2016) do the same, but THUMT seems to require monolingual data from both languages. Am I missing something, or is there an easy way to do this without changing a lot of code? Thank you!

Glaceon31 commented 6 years ago

Unfortunately, the implementation only supports using both source and target monolingual data. As a workaround, you can supply dummy source monolingual data and remove everything related to trans_x (line 132) in binmt.py. That will ignore the source monolingual data.
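
For the dummy-data step, a minimal sketch of generating a placeholder source monolingual file is shown here. It is a standalone Python 3 helper and not part of THUMT. The function name `write_dummy_monolingual`, the placeholder token, and the one-sentence-per-line plain-text format are assumptions made for illustration; check the format your training configuration expects before using it.

```python
def write_dummy_monolingual(path, num_lines, placeholder="dummy"):
    """Write num_lines copies of a placeholder sentence, one per line.

    Hypothetical helper: the file only needs to exist so that a source
    monolingual corpus can be supplied; its content is meant to be ignored
    once the trans_x-related code has been removed.
    """
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(num_lines):
            f.write(placeholder + "\n")


def count_lines(path):
    """Return the number of lines in a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)
```

For example, `write_dummy_monolingual("mono.src.dummy", count_lines("mono.trg"))` would create a dummy file with as many lines as a target monolingual file named `mono.trg`. Both file names are hypothetical, and matching the line counts is a cautious choice rather than a known requirement.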

judembo commented 6 years ago

Thank you, looks like I got it running.