Closed by louismartin 4 years ago
We used the 2000 (valid) samples for training the SMT model. We weren't sure how much training data would be enough, as there was little previous work at the time and SARI was new. Each experiment took 2 to 3 days to complete, so we didn't experiment much with different sizes of training data. Looking back, it would probably have been fine to use only half of the data for training.
Ok thanks!
Hello @cocoxu, just out of curiosity, is there a reason that turkcorpus is split into 2000 (valid) / 359 (test) samples and not 50%/50%, for example? Thank you!