Closed mr-martian closed 3 years ago
Do we actually need that whole m4 script? Can we just ask `pkg-config` about `icu-io` directly?
Agreed. Just do it like https://github.com/apertium/lexd/blob/master/configure.ac#L19-L20
And I see `/home/daniel/lttoolbox/lttoolbox/nft.nrm` in that diff.
Also, I don't think a normalization tool belongs in lttoolbox - that's something we probably want to adjust separately, so a repo of its own would be nice.
So far it looks like `UnicodeString` is suitable, but it will be interesting to see benchmarks. In CG-3 I use `typedef std::basic_string<UChar> UString;` for most strings, because it has a nicer interface and is movable.
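To illustrate the point about the interface, here is a minimal sketch of what the `UString` typedef gives you for free from the standard library. It assumes `UChar` is `char16_t` (ICU's default for C++ in recent versions) and aliases it locally so the example builds without ICU headers. `split` and `take` are hypothetical helpers written for this example, not CG-3 or lttoolbox functions.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Assumption: ICU defines UChar as char16_t for C++ by default (ICU 59+).
// Aliased here so the sketch compiles without ICU installed.
using UChar = char16_t;
typedef std::basic_string<UChar> UString;

// Hypothetical helper: split on a delimiter using only the
// std::basic_string members find() and substr().
std::vector<UString> split(const UString& s, UChar delim) {
  std::vector<UString> parts;
  UString::size_type start = 0;
  while (true) {
    UString::size_type pos = s.find(delim, start);
    if (pos == UString::npos) {
      parts.push_back(s.substr(start));  // last piece, possibly empty
      break;
    }
    parts.push_back(s.substr(start, pos - start));
    start = pos + 1;
  }
  return parts;
}

// Hypothetical helper: take ownership of a string's contents via the
// move constructor instead of copying them.
UString take(UString&& src) {
  UString dst = std::move(src);
  return dst;
}
```

Because `UString` is an ordinary `std::basic_string`, it also works directly with standard containers and algorithms, which is part of what "a nicer interface" can mean in practice.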
LGTM
ICU changes (closes #81)
- `std::wstring` with `UString` (= `std::basic_string<UChar>`)
- `InputFile` wrapper to handle UTF-8 streams with nulls

efficiency, readability, and code style changes
- `Ltstr` and `string_to_wostream`
- `int32_t` rather than `int`
- `Transducer`: `std::vector` to `std::list`
- `.clear()` and `.empty()` to `= ""` and `== ""`
- `regex_compiler`: iterate over the input string rather than modifying it
- `Transducer::determinize()`

helper function and dependency changes
- `StringUtils` here from apertium
- `XMLParseUtil` functions more specific to their typical use cases
- `xml_walk_util.h` for cleanly iterating over children of `xmlNode*`
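
As a rough illustration of the "UTF-8 streams with nulls" item above, the sketch below decodes a byte stream one code point at a time and treats a NUL byte as an ordinary character (U+0000) rather than as end of input. This is not the `InputFile` class from this PR: `Utf8Reader`, `kEndOfInput`, and `kReplacement` are hypothetical names, and the decoder is simplified (it does not reject overlong encodings or surrogates).

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical sentinel for end of input; deliberately distinct from U+0000.
constexpr int32_t kEndOfInput = -1;
// U+FFFD REPLACEMENT CHARACTER, returned for malformed sequences.
constexpr int32_t kReplacement = 0xFFFD;

// Hypothetical wrapper: reads UTF-8 from a std::istream byte by byte, so
// embedded NUL bytes never terminate the input the way they would with
// C-string based reading.
class Utf8Reader {
public:
  explicit Utf8Reader(std::istream& in) : in_(in) {}

  // Returns the next code point, kEndOfInput at end of stream,
  // or kReplacement for a malformed sequence.
  int32_t get() {
    const int eof = std::char_traits<char>::eof();
    int first = in_.get();
    if (first == eof) return kEndOfInput;
    unsigned char b = static_cast<unsigned char>(first);
    if (b < 0x80) return b;  // ASCII, including NUL (0x00)

    int extra = 0;   // number of continuation bytes expected
    int32_t cp = 0;  // code point accumulated so far
    if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
    else return kReplacement;  // stray continuation or invalid lead byte

    for (int i = 0; i < extra; ++i) {
      int next = in_.peek();
      // Leave a non-continuation byte unread so the next call starts on it.
      if (next == eof || (next & 0xC0) != 0x80) return kReplacement;
      in_.get();
      cp = (cp << 6) | (next & 0x3F);
    }
    return cp;
  }

private:
  std::istream& in_;
};
```

Returning a sentinel outside the Unicode range for end of input is the design choice that matters here: callers can tell a NUL in the data apart from the stream actually ending.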