mark_unknown doesn't work

dimitarsh1 commented 4 years ago

Hi, when translating, setting mark_unknown to False has no effect on the translation at all: a "*" is always placed in front of unknown words, and "@", "#" and "/" in front of errors.

Furthermore, in the translate function in __init__.py it seems that the mark_unknown argument does nothing; it is never used anywhere.

Any idea how to fix this?

Thanks in advance, Dimitar

sushain97 commented 4 years ago

Yeah, this is a bug.

dimitarsh1 commented 4 years ago

Yes, I see. Thank you.

> On Wed, 6 May 2020, 07:01 Sushain Cherivirala, notifications@github.com wrote:
>
> Yeah, this is a bug.


dimitarsh1 commented 4 years ago

OK, so I took some time to figure out what's going on in the code and what calls what. In the lttoolbox wrapper there is no support for the mark_unknown option.

However, I made a quick fix as follows. In utils.py, in handle_command_with_wrapper, I replaced:

```python
# ...
text = end.decode()
input_file.write(text)
input_file.close()

if 'lt-proc' == command[0]:
    fst = initialized_wrappers[command]
    lt_proc_command, dictionary_path, arg = command[:-1], command[-1], command[1]
    fst.lt_proc(lt_proc_command, input_file.name, output_file.name)
```

with:

```python
# ...
text = end.decode()
input_file.write(text)
input_file.close()

# Changed: skip the lttoolbox wrapper when lt-proc is called with -n
if 'lt-proc' == command[0] and "-n" != command[1]:
    fst = initialized_wrappers[command]
    lt_proc_command, dictionary_path, arg = command[:-1], command[-1], command[1]
    fst.lt_proc(lt_proc_command, input_file.name, output_file.name)
```
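The patched condition boils down to a small routing rule. Here it is as a minimal stand-alone sketch; the name `uses_lt_proc_wrapper` is hypothetical and not part of apertium-python. lt-proc steps keep going through the Python lttoolbox wrapper unless the flag is `-n`, in which case the patch relies on the rest of the pipeline code to run the step some other way (presumably as the external `lt-proc` binary; that fallback is an assumption, not something shown here).

```python
from typing import Sequence


def uses_lt_proc_wrapper(command: Sequence[str]) -> bool:
    """Hypothetical helper mirroring the patched condition.

    True: an lt-proc step that the Python lttoolbox wrapper would handle.
    False: lt-proc called with '-n' (or anything else), which must be
    run by some other code path.
    """
    if len(command) < 2 or command[0] != 'lt-proc':
        return False
    # Same test as the patch: the flag is expected in position 1.
    return command[1] != '-n'


# Illustrative commands; the .bin file name is made up.
print(uses_lt_proc_wrapper(['lt-proc', '-g', 'eng-spa.autogen.bin']))  # True
print(uses_lt_proc_wrapper(['lt-proc', '-n', 'eng-spa.autogen.bin']))  # False
```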

Then I also changed parse_mode_file:

from

```python
def parse_mode_file(mode_path: str) -> List[List[str]]:
    """ .... """
    # ...
    cmd = cmd.replace('$2', '').replace('$1', '-g')
    # ...
```

to

```python
def parse_mode_file(mode_path: str, mark_unknown: bool = True) -> List[List[str]]:
    # ...
    if not mark_unknown:
        cmd = cmd.replace('$2', '').replace('$1', '-n')
    else:
        cmd = cmd.replace('$2', '').replace('$1', '-g')
    # ...
```
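The `$1` substitution is the core of the change. For lt-proc, `-g` is morphological generation, which keeps the unknown-word marks, and `-n` is generation without those marks. A minimal stand-alone sketch of the same substitution, assuming a mode-file line shaped like the made-up one below; the function name is hypothetical:

```python
def substitute_mode_placeholders(cmd: str, mark_unknown: bool = True) -> str:
    """Hypothetical stand-alone version of the patched substitution.

    '$1' becomes lt-proc's generation flag: '-g' keeps unknown-word marks,
    '-n' drops them. '$2' is simply removed, as in the original code.
    """
    flag = '-g' if mark_unknown else '-n'
    return cmd.replace('$2', '').replace('$1', flag)


# Made-up mode-file line for illustration.
line = "lt-proc $1 eng-spa.autogen.bin"
print(substitute_mode_placeholders(line))                      # lt-proc -g eng-spa.autogen.bin
print(substitute_mode_placeholders(line, mark_unknown=False))  # lt-proc -n eng-spa.autogen.bin
```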

Then, in translation/__init__.py, I changed _get_commands:

```python
def _get_commands(self, l1: str, l2: str, mark_unknown: bool = True) -> List[List[str]]:
    """
    Args:
        l1 (str)
        l2 (str)
        mark_unknown (bool)

    Returns:
        List[List[str]]
    """
    if (l1, l2) not in self.translation_cmds:
        mode_path = apertium.pairs['%s-%s' % (l1, l2)]
        self.translation_cmds[(l1, l2)] = parse_mode_file(mode_path, mark_unknown)
    return self.translation_cmds[(l1, l2)]
```

And in Translator.translate, I changed

```python
cmds = list(self._get_commands(l1, l2))
```

to

```python
cmds = list(self._get_commands(l1, l2, mark_unknown))
```
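One thing worth double-checking with this change: `translation_cmds` is keyed only by `(l1, l2)`, so whichever `mark_unknown` value is used on the first call for a pair gets cached, and later calls with the other value reuse those commands. A minimal sketch of one way around that is to include the flag in the cache key. `CommandCache` and `fake_parse_mode_file` are hypothetical stand-ins, not apertium-python code, and the mode path is illustrative.

```python
from typing import Callable, Dict, List, Tuple


class CommandCache:
    """Hypothetical stand-in for Translator's translation_cmds cache."""

    def __init__(self, parse: Callable[[str, bool], List[List[str]]]) -> None:
        self._parse = parse
        # Key includes mark_unknown so the '-g' and '-n' pipelines are cached separately.
        self._cmds: Dict[Tuple[str, str, bool], List[List[str]]] = {}

    def get_commands(self, l1: str, l2: str, mark_unknown: bool = True) -> List[List[str]]:
        key = (l1, l2, mark_unknown)
        if key not in self._cmds:
            mode_path = '%s-%s.mode' % (l1, l2)  # illustrative only
            self._cmds[key] = self._parse(mode_path, mark_unknown)
        return self._cmds[key]


def fake_parse_mode_file(mode_path: str, mark_unknown: bool = True) -> List[List[str]]:
    # Pretend every mode file contains a single generation step.
    flag = '-g' if mark_unknown else '-n'
    return [['lt-proc', flag, mode_path.replace('.mode', '.autogen.bin')]]


cache = CommandCache(fake_parse_mode_file)
print(cache.get_commands('eng', 'spa'))                      # [['lt-proc', '-g', 'eng-spa.autogen.bin']]
print(cache.get_commands('eng', 'spa', mark_unknown=False))  # [['lt-proc', '-n', 'eng-spa.autogen.bin']]
```

With the key as `(l1, l2)` only, the second call above would have returned the cached `-g` pipeline.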

Also, the default value for mark_unknown everywhere is set to "True".

I don't know if that's a good fix, since I haven't had the time to delve into lttoolbox and the FST code, but it seems to work for me. My system is Ubuntu 18.04 with Python 3.6.5.

Kind regards, Dimitar

sushain97 commented 4 years ago

Could you send a proper diff/patch or PR? Your comment is really hard to read.

ygorg commented 1 year ago

Reformatting the comment from @dimitarsh1:

In the lttoolbox wrapper there is no support for the mark_unknown option.

In handle_command_with_wrapper (https://github.com/apertium/apertium-python/blob/81b10e509f65fcf1c77a0c2080f398897d3629c2/apertium/utils.py#L110), I changed the condition to:

```python
if 'lt-proc' == command[0] and "-n" != command[1]:
```

Then I also changed parse_mode_file:

https://github.com/apertium/apertium-python/blob/81b10e509f65fcf1c77a0c2080f398897d3629c2/apertium/utils.py#L195

to

```python
# Add parameter mark_unknown
def parse_mode_file(mode_path: str, mark_unknown: bool = True) -> List[List[str]]:
    # ...
    if not mark_unknown:
        cmd = cmd.replace('$2', '').replace('$1', '-n')
    else:
        cmd = cmd.replace('$2', '').replace('$1', '-g')
```

Then, in translation/__init__.py, I changed _get_commands:

https://github.com/apertium/apertium-python/blob/81b10e509f65fcf1c77a0c2080f398897d3629c2/apertium/translation/__init__.py#L38

```python
# Add parameter mark_unknown
def _get_commands(self, l1: str, l2: str, mark_unknown: bool = True) -> List[List[str]]:
    # ...
    if (l1, l2) not in self.translation_cmds:
        # ...
        self.translation_cmds[(l1, l2)] = parse_mode_file(mode_path, mark_unknown)
```

And in Translator.translate:

https://github.com/apertium/apertium-python/blob/81b10e509f65fcf1c77a0c2080f398897d3629c2/apertium/translation/__init__.py#L168

```python
cmds = list(self._get_commands(l1, l2, mark_unknown))
```

Also, the default value for mark_unknown everywhere is set to "True".
