apertium / apertium-python

now you can even use apertium from python
GNU General Public License v3.0

terminate called after throwing an instance of 'Exception' what(): Error: Malformed input stream. #92

Open eagad opened 3 years ago

eagad commented 3 years ago

I am running the Apertium analyzer from a Python script and get the exception below, which terminates the script immediately. I am not able to catch it inside Python; it seems to be thrown in the C++ code and never handled. How can I handle it?

    terminate called after throwing an instance of 'Exception'
      what():  Error: Malformed input stream.
    Aborted (core dumped)

To replicate the issue:

    import apertium
    apertium.analyze('en', 'Hi/Hello')

mr-martian commented 3 years ago

It's because you have an unescaped / in your input string.

eagad commented 3 years ago

How would you escape it?

apertium.analyze('en', r'Hi\/Hello')

throws the same exception

mr-martian commented 3 years ago

'Hi\\/Hello'

the escape has to get to the underlying pipe
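A quick check of the two spellings used in this thread may help here. This is only a sketch of standard Python string-literal rules (the variable names are illustrative): the raw literal `r'Hi\/Hello'` and the ordinary literal `'Hi\\/Hello'` denote the same nine-character string, with exactly one backslash in front of the slash.

```python
# Both literals produce the same string: H i \ / H e l l o
raw_literal = r'Hi\/Hello'
plain_literal = 'Hi\\/Hello'

print(raw_literal == plain_literal)  # the two spellings are identical
print(len(raw_literal))              # one backslash, so 9 characters
print(list(raw_literal))             # shows the single backslash explicitly
```

So switching between these two spellings does not change what is sent to the underlying pipe.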

eagad commented 3 years ago

This still didn't work

apertium.analyze('en', 'Hi\\/Hello')

    terminate called after throwing an instance of 'Exception'
      what():  Error: Malformed input stream.
    Aborted (core dumped)

Also, is there specific list for characters that need to be escaped?

mr-martian commented 3 years ago

https://wiki.apertium.org/wiki/Apertium_stream_format

ftyers commented 3 years ago

> This still didn't work
>
> apertium.analyze('en', 'Hi\\/Hello')
>
> terminate called after throwing an instance of 'Exception' what(): Error: Malformed input stream. Aborted (core dumped)
>
> Also, is there specific list for characters that need to be escaped?

Try adding another backslash ? :)

eagad commented 3 years ago

It seems that backslashes are only interpreted as literal backslashes here... Any ideas other than removing all the forward slashes from the text I am trying to process?

mr-martian commented 3 years ago

Probably what this indicates is that there should be a way to have analyze() invoke the deformatters, if there isn't one already.

mr-martian commented 3 years ago

Also, I think this should actually be on https://github.com/apertium/apertium-python but I for some reason am not able to transfer it there

alexeyev commented 1 year ago

Dear colleagues, thank you for your work.

How do I fix this? Is there a workaround, maybe?

Minimal example:

    import re
    from typing import List

    import apertium
    from streamparser import LexicalUnit

    ESC_PATTERN = re.compile("([/^$<>*{}\\\\@#+~])", re.UNICODE)
    analyzer = apertium.Analyzer("kir")
    text = "Кыргызстанда ВИЧ/СПИД менен күрөшүүгө акча жетишпейт."
    # Prefix every special character with two backslashes
    text = re.sub(ESC_PATTERN, r"\\\\\1", text.strip())
    print(text)
    analysis: List[LexicalUnit] = analyzer.analyze(text)
    print([lexical_unit.wordform for lexical_unit in analysis])

Output

Кыргызстанда ВИЧ\\/СПИД менен күрөшүүгө акча жетишпейт.
Error: malformed input stream: Found unexpected character / unescaped in stream
: iostream error
['Кыргызстанда', 'ВИЧ', '\\\\/\\\\<sent>']

Thanks in advance.
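One observation on the example above, offered as a sketch rather than a confirmed fix: the replacement template `r"\\\\\1"` inserts two backslashes before each special character, so the stream would be read as an escaped backslash followed by a bare `/`, which is consistent with the error message. A template that inserts a single backslash is `r"\\\1"`. Whether single-escaped input passes through `analyze()` cleanly is not established in this thread (the earlier `'Hi\\/Hello'` attempt, which is single-escaped, was reported to fail), so the snippet below only illustrates the difference between the two templates.

```python
import re

# Same character class as in the minimal example, written as a raw string
ESC_PATTERN = re.compile(r"([/^$<>*{}\\@#+~])")

text = "ВИЧ/СПИД"

# Template from the minimal example: two backslashes, then the matched character
double_escaped = ESC_PATTERN.sub(r"\\\\\1", text)

# Alternative template: one backslash, then the matched character
single_escaped = ESC_PATTERN.sub(r"\\\1", text)

print(double_escaped)
print(single_escaped)
```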

alexeyev commented 1 year ago

My own workaround is the following

    from typing import List

    import apertium
    from streamparser import LexicalUnit

    # Special characters and the placeholder word that stands in for each one
    SPECIAL_CHARACTERS = list("/^$<>*{}\\@#+~")
    REPLACEMENTS = ["slashchar", "capchar", "dollarchar", "lesschar", "morechar", "astchar",
                    "curlyleftchar", "curlyrightchar", "backslashchar", "atchar", "hashchar",
                    "pluschar", "tildechar"]

    assert len(SPECIAL_CHARACTERS) == len(REPLACEMENTS)

    spchar2code = dict(zip(SPECIAL_CHARACTERS, REPLACEMENTS))
    code2spchar = dict(zip(REPLACEMENTS, SPECIAL_CHARACTERS))

    analyzer = apertium.Analyzer("kir")
    text = "Кыргызстанда ВИЧ/СПИД менен күрөшүүгө акча жетишпейт."

    # Swap each special character for its space-padded placeholder word
    for spc, code in spchar2code.items():
        text = text.replace(spc, f" {code} ")

    print(text)
    analysis: List[LexicalUnit] = analyzer.analyze(text)
    # Map placeholder words back to the original characters
    tokens = [code2spchar.get(lu.wordform, lu.wordform) for lu in analysis]
    print(tokens)

but clearly that's not how the cool kids should do it.

unhammer commented 1 year ago

I would maybe just send it through apertium-destxt, though I don't know whether apertium-python has some built-in way or you have to call subprocess and communicate() yourself.
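A minimal sketch of the subprocess route, assuming the `apertium-destxt` binary is on `PATH` and reads text from standard input. The helper name `deformat` and its `command` parameter are hypothetical, not part of apertium-python, and this thread does not confirm how `analyze()` treats the deformatted output.

```python
import subprocess
from typing import Sequence


def deformat(text: str, command: Sequence[str] = ("apertium-destxt",)) -> str:
    """Pipe text through a deformatter command and return what it writes to stdout.

    `command` defaults to apertium-destxt (assumed to be installed); it is a
    parameter so that another deformatter can be substituted.
    """
    completed = subprocess.run(
        list(command),
        input=text,
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if the command exits non-zero
    )
    return completed.stdout


# Intended use (requires apertium-destxt to be installed):
#   escaped = deformat("Кыргызстанда ВИЧ/СПИД менен күрөшүүгө акча жетишпейт.")
#   analysis = analyzer.analyze(escaped)
```

Because `check=True` is set, a failing deformatter surfaces as a Python exception that can be caught, rather than as malformed input reaching the analyzer.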

alexeyev commented 1 year ago

Thank you, will give it a try!