clab / fast_align

Simple, fast unsupervised word aligner
Apache License 2.0

Added flag to import conditional probability table #3

Open dowobeha opened 11 years ago

dowobeha commented 11 years ago

Note: if you run an alignment and export the conditional probability table, then later import that table and use it to align the original parallel corpus, the final alignments may differ from those of the first run. I assume this is due to floating-point rounding errors introduced during the export/import cycle. My evidence is that if you then export the table again, the newly exported table exactly matches the originally exported one. These discrepancies could presumably be eliminated if the exported tables stored counts rather than probabilities. However, the ttable class currently clears its counts when it normalizes, so for now I'm going to leave things as they are.
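To illustrate the kind of rounding effect described above, here is a minimal Python sketch. It is not fast_align code: the helpers `export_value` and `import_value` are hypothetical, and the 6-significant-digit text format is an assumption made purely for illustration. It shows a value that changes after one export/import cycle while a second export reproduces the same text, and how such a change could flip the choice between two nearly tied candidates.

```python
def export_value(p: float) -> str:
    # Hypothetical export step: write the value as text with
    # 6 significant digits (an assumed format, not fast_align's).
    return f"{p:.6g}"


def import_value(s: str) -> float:
    # Hypothetical import step: parse the text back into a float.
    return float(s)


# 1) One export/import cycle changes the in-memory value...
original = 1.0 / 3.0
exported = export_value(original)
imported = import_value(exported)

# ...but exporting again yields exactly the same text.
re_exported = export_value(imported)

print(imported == original)     # the value was rounded
print(re_exported == exported)  # the exported text is stable

# 2) Two candidates that differ only beyond the exported precision
#    become tied after the cycle, so the selected one can change.
before = {"b": 0.12345671, "a": 0.12345678}
after = {k: import_value(export_value(v)) for k, v in before.items()}

best_before = max(before, key=before.get)
best_after = max(after, key=after.get)  # ties resolve to the first key

print(best_before, best_after)
```

In this sketch the two exported tables are identical even though the values used for the second alignment differ from those used for the first, which would be consistent with the behavior described above.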