apertium / lttoolbox

Finite state compiler, processor and helper tools used by apertium
http://wiki.apertium.org/wiki/Lttoolbox
GNU General Public License v2.0

Weighted segfault #27

Closed TinoDidriksen closed 6 years ago

TinoDidriksen commented 6 years ago

There is a massive segfault / memory leak somewhere in the weight code. After upgrading to it, translations started randomly overloading, with some part of lttoolbox eating the APy machine's whole 32 + 64 GB RAM in seconds and then dying. Haven't taken the time to isolate it yet - for now, I've rolled back the install.

[29857716.421446] lt-proc[30665]: segfault at 824100 ip 00007f728d000bbc sp 00007fffe8c3f6b0 error 4 in liblttoolbox3-3.4.so.1.0.0[7f728cfa8000+6c000]

(ping @Techievena)

Techievena commented 6 years ago

Hi Tino Didriksen

I think this change might be the cause: https://github.com/apertium/lttoolbox/commit/e2135c7fccadb19becff2aeb637920d1737c7fd5#diff-415c5fbb5f00468526da3b5270538172L60 . stod might be causing a segfault for wide strings.


TinoDidriksen commented 6 years ago

No, stod() is well-defined for wstring. Also, the segfault happens at runtime during translation for old already built pairs. New weights are not part of the run.
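For reference, a minimal sketch of the point about stod(): the C++11 standard library provides a std::stod overload that takes a std::wstring, so parsing a weight from a wide string is well-defined. The helper name parse_weight is hypothetical and is not taken from the lttoolbox source.

```cpp
#include <string>

// Hypothetical helper (not from lttoolbox): parse a weight held in a wide string.
// std::stod has an overload for std::wstring; it throws std::invalid_argument
// if no conversion can be performed and std::out_of_range if the value does not fit.
double parse_weight(const std::wstring& text) {
    return std::stod(text);
}
```

Called as parse_weight(L"0.5"), this returns the parsed double; malformed input raises an exception rather than invoking undefined behaviour.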

TinoDidriksen commented 6 years ago

Given an unweighted pre-compiled bin, a segfault happens in https://github.com/apertium/lttoolbox/blob/master/lttoolbox/trans_exe.cc#L120 because https://github.com/apertium/lttoolbox/blob/master/lttoolbox/fst_processor.cc#L830 unconditionally tells it to read weights even if there are no weights in the input file.

Well @Techievena, looks like I do need your help, 'cause I don't know how lttoolbox is supposed to detect that the input file is an old unweighted bin. I assume you store a flag in the new files that won't be present in the old, but can't find that in the code.

Techievena commented 6 years ago

No @TinoDidriksen, I am sorry, there is no such flag. I didn't know we had to accept pre-compiled binary files as input. I thought the dictionary files would always be compiled first, so the default value, i.e. 0.0000, would be written to the binary files even if they are unweighted.

TinoDidriksen commented 6 years ago

Ok then, we need to add such a flag. We absolutely cannot require that every deployment recompiles their data files - lttoolbox must be able to load both old and new files.