Open dimitarsh1 opened 4 years ago
Yeah, this is a bug.
Yes, I see. Thank you.
On Wed, 6 May 2020, 07:01 Sushain Cherivirala, notifications@github.com wrote:
Yeah, this is a bug.
OK, so I took some time to work out what is going on in the code and what calls what. In lttoolbox there is no support for unknown words.
However, I made a quick fix as follows. In utils.py, in handle_command_with_wrapper:
`
....
text = end.decode()
input_file.write(text)
input_file.close()
if 'lt-proc' == command[0]:
    fst = initialized_wrappers[command]
    lt_proc_command, dictionary_path, arg = command[:-1], command[-1], command[1]
    fst.lt_proc(lt_proc_command, input_file.name, output_file.name)
`
replaced with
`
text = end.decode()
input_file.write(text)
input_file.close()
if 'lt-proc' == command[0] and "-n" != command[1]:  # <-- changed line
    fst = initialized_wrappers[command]
    lt_proc_command, dictionary_path, arg = command[:-1], command[-1], command[1]
    fst.lt_proc(lt_proc_command, input_file.name, output_file.name)
`
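To make the effect of that condition easier to see, here is a minimal sketch of the same check in isolation. The helper name `use_lttoolbox_wrapper` and the file name in the examples are hypothetical and not part of apertium-python; the length check is an extra guard that the original condition does not have.

```python
from typing import List


def use_lttoolbox_wrapper(command: List[str]) -> bool:
    """Return True if the command should be routed to the lttoolbox wrapper.

    Hypothetical helper mirroring the condition in the quick fix:
    only lt-proc goes to the wrapper, and never when its flag is "-n".
    """
    # Guard against one-element commands (an addition for this sketch;
    # the original condition would raise IndexError on them).
    if len(command) < 2:
        return False
    return command[0] == 'lt-proc' and command[1] != '-n'


# The file name below is made up for illustration.
print(use_lttoolbox_wrapper(['lt-proc', '-g', 'en-es.autogen.bin']))
print(use_lttoolbox_wrapper(['lt-proc', '-n', 'en-es.autogen.bin']))
```

With this check, an `lt-proc -n` step is not handed to the wrapper, which matches the intent of the changed line above.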
Then I also changed parse_mode_file, from

    def parse_mode_file(mode_path: str) -> List[List[str]]:
        """
        ....
        cmd = cmd.replace('$2', '').replace('$1', '-g')
        ....

to

    def parse_mode_file(mode_path: str, mark_unknown: bool = True) -> List[List[str]]:
        ....
        if not mark_unknown:
            cmd = cmd.replace('$2', '').replace('$1', '-n')
        else:
            cmd = cmd.replace('$2', '').replace('$1', '-g')
        ....
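The placeholder substitution in that change can be sketched on its own as follows. This is only an illustration: `fill_mode_placeholders` is a hypothetical name, and the command string in the example is made up rather than taken from a real mode file. In lt-proc, `-g` is generation and `-n` is generation without unknown-word marks.

```python
def fill_mode_placeholders(cmd: str, mark_unknown: bool = True) -> str:
    """Replace the $1/$2 placeholders of a mode-file command line.

    Hypothetical helper: same substitution as in the change above,
    with the flag chosen from mark_unknown.
    """
    # "-g" keeps the marks on unknown words, "-n" generates without them.
    flag = '-g' if mark_unknown else '-n'
    return cmd.replace('$2', '').replace('$1', flag)


# Made-up command line for illustration.
print(fill_mode_placeholders('lt-proc $1 en-es.autogen.bin', mark_unknown=False))
```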
Then, in translate/__init__.py, I changed _get_commands:
`
def _get_commands(self, l1: str, l2: str, mark_unknown: bool = True) -> List[List[str]]:
    """
    Args:
        l1 (str)
        l2 (str)
    Returns:
        List[List[str]]
    """
    if (l1, l2) not in self.translation_cmds:
        mode_path = apertium.pairs['%s-%s' % (l1, l2)]
        self.translation_cmds[(l1, l2)] = parse_mode_file(mode_path, mark_unknown)
    return self.translation_cmds[(l1, l2)]
`
and Translator.translate, from

    cmds = list(self._get_commands(l1, l2))

to

    cmds = list(self._get_commands(l1, l2, mark_unknown))

Also, the default value of mark_unknown is set to True everywhere.
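One thing that may be worth checking in the _get_commands change: the cache is keyed on (l1, l2) only, so once a pair has been parsed, a later call with a different mark_unknown would presumably get the cached commands back. A minimal sketch of a cache keyed on the flag as well, assuming a toy `CommandCache` class and a `fake_parse` stand-in (both hypothetical, not part of apertium-python):

```python
from typing import Callable, Dict, List, Tuple

Commands = List[List[str]]


class CommandCache:
    """Toy stand-in for the translation_cmds cache (hypothetical class)."""

    def __init__(self, parse: Callable[[str, bool], Commands]) -> None:
        self._parse = parse
        self._cache: Dict[Tuple[str, str, bool], Commands] = {}

    def get(self, l1: str, l2: str, mode_path: str,
            mark_unknown: bool = True) -> Commands:
        # Include mark_unknown in the key so both variants can coexist.
        key = (l1, l2, mark_unknown)
        if key not in self._cache:
            self._cache[key] = self._parse(mode_path, mark_unknown)
        return self._cache[key]


def fake_parse(mode_path: str, mark_unknown: bool) -> Commands:
    """Stand-in for parse_mode_file; returns a single made-up command."""
    return [['lt-proc', '-g' if mark_unknown else '-n', mode_path]]


cache = CommandCache(fake_parse)
marked = cache.get('en', 'es', 'en-es.mode', mark_unknown=True)
unmarked = cache.get('en', 'es', 'en-es.mode', mark_unknown=False)
print(marked[0][1], unmarked[0][1])
```

Whether this matters in practice depends on whether the same Translator instance is ever called with both values of mark_unknown.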
I don't know whether that's a good fix. I haven't had time to delve into lttoolbox and the FST code, but it seems to work for me. System is Ubuntu 18.04; Python is 3.6.5.
Kind regards, Dimitar
Could you send a proper diff/patch or PR? Your comment is really hard to read.
Reformatting the comment from @dimitarsh1 :
In the lttoolbox there is no support for unknown.
In https://github.com/apertium/apertium-python/blob/81b10e509f65fcf1c77a0c2080f398897d3629c2/apertium/utils.py#L110 in handle_command_with_wrapper:

    if 'lt-proc' == command[0] and "-n" != command[1]:

Then changed also parse_mode_file to:

    # Add parameter mark_unknown
    def parse_mode_file(mode_path: str, mark_unknown: bool = True) -> List[List[str]]:
        if not mark_unknown:
            cmd = cmd.replace('$2', '').replace('$1', '-n')
        else:
            cmd = cmd.replace('$2', '').replace('$1', '-g')

Then in the translate/__init__.py I changed _get_commands:

    # Add parameter mark_unknown
    def _get_commands(self, l1: str, l2: str, mark_unknown: bool = True) -> List[List[str]]:
        self.translation_cmds[(l1, l2)] = parse_mode_file(mode_path, mark_unknown)

and Translator.translate:

    cmds = list(self._get_commands(l1, l2, mark_unknown))

Also, the default value for mark_unknown everywhere is set to "True".
Hi, when translating, setting mark_unknown to False does not affect the translation at all: a "*" is always placed in front of unknown words, and "@", "#" and "/" in front of errors. Furthermore, in the translate function in __init__.py it seems that the mark_unknown argument does not do anything; it is not used anywhere. Any idea how to fix this?
Thanks in advance, Dimitar