clab / fast_align

Simple, fast unsupervised word aligner
Apache License 2.0

Crash with larger corpus #9

Open andidol opened 9 years ago

andidol commented 9 years ago

I have an English->German corpus of about 7 GB, built from Pattr, Europarl and News Comments, with a maximum sentence length of 80 characters. fast_align crashes during iteration 1.
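A length limit like the 80-character maximum above is usually enforced by filtering the parallel corpus before alignment. Here is a minimal sketch, assuming the corpus is already in fast_align's one-pair-per-line `source ||| target` format and that the limit applies to each side separately, counted in characters. The issue does not say how the limit was applied. `filter_by_length` and `filter_file` are hypothetical helpers, not part of fast_align:

```python
from typing import Iterable, Iterator

MAX_CHARS = 80  # limit quoted in the issue; assumed here to apply to each side separately

def filter_by_length(lines: Iterable[str], max_chars: int = MAX_CHARS) -> Iterator[str]:
    """Yield 'source ||| target' lines whose two sides are non-empty and at most max_chars long."""
    for line in lines:
        parts = line.rstrip("\n").split(" ||| ")
        if len(parts) != 2:
            continue  # malformed line: not exactly one ' ||| ' separator
        src, tgt = (p.strip() for p in parts)
        if src and tgt and len(src) <= max_chars and len(tgt) <= max_chars:
            yield f"{src} ||| {tgt}\n"

def filter_file(in_path: str, out_path: str, max_chars: int = MAX_CHARS) -> None:
    """Stream in_path to out_path, keeping only pairs that pass filter_by_length."""
    with open(in_path, encoding="utf-8") as fin, open(out_path, "w", encoding="utf-8") as fout:
        fout.writelines(filter_by_length(fin, max_chars))
```

The filter streams line by line, so its own memory use stays constant regardless of corpus size. For example, `filter_file("corpus.en-de", "corpus.filtered.en-de")` would work (file names are illustrative). Note that shorter sentences reduce work per pair but do not by themselves bound fast_align's memory, which depends on the vocabulary seen.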

ITERATION 1

.................................................. [50000] .................................................. [100000] .................................................. [150000] .................................................. [200000] .................................................. [250000]

... .................................................. [5650000] Killed: 9

It is a memory problem: the process is killed (signal 9, SIGKILL) because of its memory consumption. I worked around it by running fast_align on a machine with much more memory.
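The crash during iteration 1 is consistent with the model's translation table growing as new word pairs are encountered. A rough way to gauge this before moving to a larger machine is to count how many distinct (source word, target word) pairs co-occur as the corpus is read. This is a minimal sketch, assuming fast_align keeps roughly one table entry per such pair. `pair_growth` and its `step` parameter are hypothetical names, and the count is only a proxy, not a measurement of fast_align's actual memory use:

```python
from typing import Iterable, List, Set, Tuple

def pair_growth(lines: Iterable[str], step: int = 50000) -> List[Tuple[int, int]]:
    """Return (sentence pairs read, distinct co-occurring word pairs) every `step` pairs.

    Rough proxy for translation-table size: assumes one entry per distinct
    (source word, target word) pair that co-occur in some sentence pair, and
    ignores the NULL word, so it slightly undercounts.
    """
    seen: Set[Tuple[str, str]] = set()
    curve: List[Tuple[int, int]] = []
    n = 0
    for line in lines:
        parts = line.rstrip("\n").split(" ||| ")
        if len(parts) != 2:
            continue  # skip malformed lines
        n += 1
        src_words = set(parts[0].split())
        tgt_words = set(parts[1].split())
        # Every source word co-occurs with every target word in the same sentence pair.
        seen.update((s, t) for s in src_words for t in tgt_words)
        if n % step == 0:
            curve.append((n, len(seen)))
    if n % step:
        curve.append((n, len(seen)))  # report the final partial block
    return curve
```

The set in this sketch also grows with the corpus, so run it on a prefix rather than on the full 7 GB file, for example with `itertools.islice(f, 1_000_000)`. Then check whether the count keeps climbing steadily as more sentence pairs are read.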