Closed khannatanmai closed 4 years ago
I don't think this will work on ^arco<n><m><sg><sf:rainbow># iris$
There's also an issue if a secondary tag value contains an escaped $.
True. Will fix
I think this is correct unless unescaped # is allowed in secondary tags. That is, <loc:12#7> (location: sentence 12, word 7, or some such) is problematic unless we require it to be <loc:12\#7>. Dealing with this may require explicitly tracking tag boundaries within the ignore loop.
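For illustration only, here is a minimal Python sketch of what "tracking tag boundaries" could mean when looking for the # that separates the tags from the trailing part of an LU (as in # iris). It is not the lt-proc code, which is C++; the function name `find_multiword_marker` is hypothetical, and the rule that a backslash escapes the following character is an assumption.

```python
def find_multiword_marker(lu: str) -> int:
    """Return the index of the first '#' that lies outside <...> tags, or -1.

    Assumption: a backslash escapes the character that follows it, so an
    escaped '#', '<' or '>' never counts as a marker or a tag boundary.
    """
    in_tag = False
    i = 0
    while i < len(lu):
        ch = lu[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "<" and not in_tag:
            in_tag = True
        elif ch == ">" and in_tag:
            in_tag = False
        elif ch == "#" and not in_tag:
            return i  # first '#' that is not part of a tag
        i += 1
    return -1


if __name__ == "__main__":
    lu = "^word<n><loc:12#7># tail$"
    pos = find_multiword_marker(lu)
    print(lu[pos:])  # the text from the marker onwards
```

With a flag like `in_tag`, the # in <loc:12#7> is skipped without needing the escape, which is the case the comment above is worried about.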
Unescaped ^, $, and # should be allowed inside <>, I'd say. I don't think they currently are, but I see no reason they can't be.
@TinoDidriksen @mr-martian Almost all parsers treat '$' as the cue to process an LU, so not allowing unescaped special characters is simply consistent with the current state of the tool.
> Dealing with this may require explicitly tracking tag boundaries within the ignore loop.
Yeah. If we decide we really don't want to require escaping (only for # and $), then I can implement it.
@mr-martian @TinoDidriksen Now works with any unescaped characters inside secondary tags.
Tests:
Tanmais-MacBook-Pro:transfer khannatanmai$ echo "^Stroke<n><sg># of genius$" | lt-proc -g ../../apertium-eng-spa/spa-eng.autogen.bin
Stroke of genius
Tanmais-MacBook-Pro:transfer khannatanmai$ echo "^Stroke<n><sg><sf:4\#sabasa><id:2\#:># of genius$" | lt-proc -g ../../apertium-eng-spa/spa-eng.autogen.bin
Stroke of genius
Tanmais-MacBook-Pro:transfer khannatanmai$ echo "^Stroke<n><sg><sf:sabasa><id:2># of genius$" | lt-proc -g ../../apertium-eng-spa/spa-eng.autogen.bin
Stroke of genius
Tanmais-MacBook-Pro:transfer khannatanmai$ echo "^Stroke<n><sg><sf:4#sabasa><id:2#:># of genius$" | lt-proc -g ../../apertium-eng-spa/spa-eng.autogen.bin
Stroke of genius
Tanmais-MacBook-Pro:transfer khannatanmai$ echo "^Stroke<n><sg><sf:$$4#saba$sa><id:2#:$># of genius$" | lt-proc -g ../../apertium-eng-spa/spa-eng.autogen.bin
Stroke of genius
EDIT: Prefixes can have unescaped special characters as well:
echo "^Stroke<n><sg><$$s#^f:$$4#saba$sa><i#$$#^d:2#:$># of genius$" | lt-proc -g ../../apertium-eng-spa/spa-eng.autogen.bin
Stroke of genius
Works with compounds:
Tanmais-MacBook-Pro:lt_proc khannatanmai$ echo "^be<vblex><subs>+not<adv># sorry$" | lt-proc -g ../../apertium-eng-spa/spa-eng.autogen.bin
being not sorry
Tanmais-MacBook-Pro:lt_proc khannatanmai$ echo "^be<vblex><subs><sf:xyz>+not<adv><sf:abc># sorry$" | lt-proc -g ../../apertium-eng-spa/spa-eng.autogen.bin
being not sorry
Tanmais-MacBook-Pro:lt_proc khannatanmai$ echo "^be<vblex><subs><sf:xyz><id:++$$#>+not<adv><s$f:$+$a##bc># sorry$" | lt-proc -g ../../apertium-eng-spa/spa-eng.autogen.bin
being not sorry
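The tests above all give the same output with and without secondary tags, which suggests the generator sees each LU as if those tags were not there. A rough Python sketch of that idea follows, not the actual lt-proc change: `strip_secondary_tags` is a hypothetical name, and treating "any tag containing a colon" as a secondary tag is an assumption based on the <sf:...>, <id:...> and <loc:...> examples in this thread.

```python
def strip_secondary_tags(lu: str) -> str:
    """Drop every tag that contains ':' and keep everything else unchanged.

    Assumptions: a secondary tag is any <...> tag containing a colon, a tag
    runs up to the next unescaped '>', and a backslash escapes the next char.
    """
    out = []
    i = 0
    n = len(lu)
    while i < n:
        ch = lu[i]
        if ch == "\\" and i + 1 < n:
            out.append(lu[i:i + 2])  # copy escaped pair verbatim
            i += 2
        elif ch == "<":
            j = i + 1
            # Scan to the closing '>', ignoring any special characters inside.
            while j < n and lu[j] != ">":
                j += 2 if lu[j] == "\\" else 1
            tag = lu[i:j + 1]
            if ":" not in tag:
                out.append(tag)  # primary tag: keep
            i = j + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(strip_secondary_tags("^be<vblex><subs><sf:xyz>+not<adv><sf:abc># sorry$"))
```

Because the scan only looks for the closing >, characters such as $, #, ^ and + inside a secondary tag never reach the part of the code that handles LU boundaries or compounds.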
This is needed for generation. Input:
Earlier Output:
New Output: