apertium / lttoolbox

Finite state compiler, processor and helper tools used by apertium
http://wiki.apertium.org/wiki/Lttoolbox
GNU General Public License v2.0
Apertium is not translating a Word document #122

Closed. Opened by Elivica; closed 2 years ago.

Elivica commented 3 years ago

Hi. I'm trying to translate a Word document (Office95 version) from Spanish into Valencian, and it only works on the first paragraphs; the rest of the document is left untranslated. It works perfectly when translated from Spanish into English, though. T6 Algebra castellano.docx

TinoDidriksen commented 3 years ago

Doesn't appear to be a Transfuse issue (tf-clean outputs a valid document), so could be https://github.com/TinoDidriksen/cg3/pull/75

TinoDidriksen commented 3 years ago

Actually an lttoolbox issue. It dies in the last step, lt-proc -p 'spa-cat.autopgen.bin', with the error: basic_string::substr: __pos (which is 2) > this->size() (which is 0)

xavivars commented 3 years ago

Hey @TinoDidriksen, this is an example of two documents, one of which will do exactly the same thing as described here (the one called error_apertium.docx), while the other one will work (ok_apertium.docx).

Here is Transfuse's output for both documents. First, error_apertium.docx:

[transfuse:\/tmp\/transfuse-ApoXwS8-DBk]

[tf-block:1-8cHcOQ]

[[t:text:Tbd7QQ]]El gato duerme.[[/]].[]

And here it is for ok_apertium.docx:

[transfuse:\/tmp\/transfuse-ncpbDQMtYnI]

[tf-block:1-E_jkSw]

El gato duerme..[]

Is there anything else that would make this easier to diagnose or fix?

TinoDidriksen commented 3 years ago

So likely something to do with empty or all-whitespace segments. Still an lttoolbox issue, hence why I unassigned myself. Pinging @khannatanmai

unhammer commented 3 years ago

Was this the one that turned out to be in lt-proc -x?

xavivars commented 3 years ago

Yes. I still think it's probably worth trying to fix it, or (as soon as I move the apertium-cat related pairs off it) deprecating that functionality, because having a not-always-working module in the pipeline seems wrong.

unhammer commented 2 years ago

Perhaps #144 fixes this

mr-martian commented 2 years ago

I am unable to reproduce this based on @xavivars's example, so it might have been fixed in #149, but I also failed to reproduce it on aa1cf3a506b672d82db52d0f57698d1b5ccd389b, so maybe something else fixed it (or -cat has changed).

xavivars commented 2 years ago

-cat moved away from using intergen some months ago, when we moved that to preferences using cg.

mr-martian commented 2 years ago

The original error was due to an invalid index being passed to substr(). The relevant call was removed in one of the rewrites, so I am marking this as completed.
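For readers unfamiliar with this failure mode, here is a minimal sketch of how an out-of-range position makes std::string::substr() throw, and of one way such a call could be guarded. It is not the lttoolbox code: the helper names safe_tail and unguarded_substr_throws are hypothetical, and the position 2 on an empty string is simply taken from the error message quoted earlier in the thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from lttoolbox): returns the part of `s` that
// starts at `pos`, or an empty string when `pos` lies past the end.
std::string safe_tail(const std::string& s, std::size_t pos) {
    if (pos > s.size()) {
        return std::string();  // substr(pos) would throw here, so bail out
    }
    return s.substr(pos);      // pos == s.size() is valid and yields ""
}

// Hypothetical helper: reports whether the unguarded call throws
// std::out_of_range, which is the exception behind the reported message.
bool unguarded_substr_throws(const std::string& s, std::size_t pos) {
    try {
        (void)s.substr(pos);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

With an empty segment, unguarded_substr_throws("", 2) is true, matching "__pos (which is 2) > this->size() (which is 0)", while safe_tail("", 2) returns an empty string. Whether silently returning an empty string is the right behaviour depends on the caller; the actual fix in lttoolbox removed the call altogether.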

Elivica commented 2 years ago

Thank you for all the work you all have developed during such a long period!

On Sat, 23 Jul 2022 at 16:29, Daniel Swanson @.***> wrote:

Closed #122 https://github.com/apertium/lttoolbox/issues/122 as completed.