Closed evanmiltenburg closed 8 years ago
So `format_sequence` should probably be avoided altogether for prediction mode. `get_generation_data_by_split` looks like magic to me (I don't understand what's going on exactly). But I guess this is the required rewrite you were talking about in https://github.com/elliottd/GroundedTranslation/issues/15#issuecomment-196314081.
EDIT: I understand it a little better now!
Ok, I just added a hack to set `self.data_gen.max_seq_length` to 30 (random number). That works!
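For anyone reading along, the hack amounts to something like the sketch below. This is not the actual GroundedTranslation code: `DataGeneratorStub` and `ensure_max_seq_length` are hypothetical names made up for illustration, and it assumes `max_seq_length` is left unset (`None`) when there are no reference sentences to measure. Only the attribute name and the value 30 come from the comment above.

```python
# Arbitrary fallback, matching the value used in the hack above.
DEFAULT_MAX_SEQ_LENGTH = 30


class DataGeneratorStub:
    """Hypothetical stand-in for the project's data generator."""

    def __init__(self):
        # Assumption: the length stays unset when no references were read.
        self.max_seq_length = None


def ensure_max_seq_length(data_gen, fallback=DEFAULT_MAX_SEQ_LENGTH):
    """Set max_seq_length to a fallback only if no reference length exists."""
    if getattr(data_gen, "max_seq_length", None) is None:
        data_gen.max_seq_length = fallback
    return data_gen.max_seq_length


data_gen = DataGeneratorStub()
ensure_max_seq_length(data_gen)  # prediction mode: falls back to 30
```

Guarding on the unset value means a length computed from real references would not be overwritten, so training runs should be unaffected.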
This is also related to completely rethinking the data_generator. Your quick fix will work in the short term, but we still need a long-term rethink of how data_generator should work.
After using c27b2c1ebfe1f9a1bf8985b66ebb03852cf59c38 to fix #15, I get this error. Generation breaks down because there's no reference length.