Open Phaqui opened 1 year ago
For reference, these are currently the different pipelines used by smi.cgi:
echo input | hfst-lookup lang/generator-gt-norm.hfstol
echo input | hfst-lookup lang/analyser-gt-desc.hfstol
echo input | hfst-lookup lang/hyphenator-gt-desc.hfstol
echo input | hfst-lookup lang/txt2ipa.compose.hfst
echo input | hfst-lookup lang/oldorthography2norm.compose.hfst
only for Kalaallisut (Greenlandic), ISO code kal
echo input | hfst-lookup Latn-to-Cans.compose.hfs
only for Plains Cree, ISO code crk (possibly not even in use)
echo input | hfst-lookup Cans-to-Latn.compose.hfst
only for Plains Cree, ISO code crk (possibly not even in use)
echo input | hfst-tokenize -cg lang/tokeniser-disamb-gt-desc.pmhfst | vislcg3 -g lang/disambiguator.cg3
or for some languages (fao, sma, sme, smj, nob):
echo input | hfst-tokenize -cg lang/tokeniser-disamb-gt-desc.pmhfst | vislcg3 -g lang/korp.cg3 | vislcg3 -g lang/disambiguator.cg3
echo input | hfst-tokenize -cg lang/tokeniser-disamb-gt-desc.pmhfst | vislcg3 -g lang/disambiguator.cg3 | vislcg3 -g lang/dependency.cg3
or for some languages (fao, sma, sme, smj, nob):
echo input | hfst-tokenize -cg lang/tokeniser-disamb-gt-desc.pmhfst | vislcg3 -g lang/disambiguator.cg3 | vislcg3 -g lang/korp.cg3 | vislcg3 -g lang/dependency.cg3
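The four tokenize/vislcg3 variants above differ only in whether the korp.cg3 step is included and whether dependency.cg3 is appended, so the selection could be sketched as a small shell function. This is only a sketch: the function names, the mode names "disambiguate" and "dependency", and the assumption that the `lang/` directory is named after the language code are all hypothetical. The function only prints the pipeline; it does not run it.

```shell
#!/bin/sh
# Sketch only: print (do not run) the tokenize/vislcg3 pipeline for a language.
# Function names and mode names are hypothetical, not taken from smi.cgi.

# Languages listed above as getting the extra korp.cg3 step.
uses_korp() {
    case "$1" in
        fao|sma|sme|smj|nob) return 0 ;;
        *) return 1 ;;
    esac
}

# build_pipeline LANG MODE, where MODE is "disambiguate" or "dependency".
# Assumes the language directory is named after the language code.
build_pipeline() {
    lang=$1
    mode=$2
    tok="hfst-tokenize -cg $lang/tokeniser-disamb-gt-desc.pmhfst"
    dis="vislcg3 -g $lang/disambiguator.cg3"
    korp="vislcg3 -g $lang/korp.cg3"
    dep="vislcg3 -g $lang/dependency.cg3"

    case "$mode" in
        disambiguate)
            # Step order mirrors the listing above: korp.cg3 before disambiguator.cg3.
            if uses_korp "$lang"; then
                echo "$tok | $korp | $dis"
            else
                echo "$tok | $dis"
            fi
            ;;
        dependency)
            # Step order mirrors the listing above: korp.cg3 after disambiguator.cg3.
            if uses_korp "$lang"; then
                echo "$tok | $dis | $korp | $dep"
            else
                echo "$tok | $dis | $dep"
            fi
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

With the tools and language files installed, the printed pipeline could then be run with something like `echo input | eval "$(build_pipeline sme dependency)"`.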
"paradigm" Performs "generate" for the different inflections of the input word and presents the resulting forms to the user.
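As a rough illustration of the "paradigm" idea, the sketch below builds one generator input per inflection and leaves the actual lookup to the generator pipeline listed earlier. The function name, the example lemma, and the tag strings are hypothetical placeholders; the real tag sets depend on the language and are not given in this issue.

```shell
#!/bin/sh
# Sketch only: print one generator input line per inflection of a lemma.
# The tag strings below are illustrative placeholders, not the list smi.cgi uses.
paradigm_inputs() {
    lemma=$1
    for tags in "+N+Sg+Nom" "+N+Sg+Gen" "+N+Pl+Nom"; do
        printf '%s%s\n' "$lemma" "$tags"
    done
}

# Possible usage, assuming the generator file from the list above is present:
#   paradigm_inputs guolli | hfst-lookup lang/generator-gt-norm.hfstol
```

Each printed line is one analysis string for the generator, so the whole paradigm goes through a single hfst-lookup call rather than one call per form.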
"placenames"
echo input | hfst-lookup geo.hfst
In addition, there are the number conversion programs:
echo input | hfst-lookup lang/transcriptor-numbers-digit2text.filtered.lookup.hfstol
echo input | hfst-lookup lang/transcriptor-clock-digit2text.filtered.lookup.hfstol
echo input | hfst-lookup lang/transcriptor-date-digit2text.filtered.lookup.hfstol
The actual language tool programs are native binaries, and they also require large datasets to work. The simplified conclusion is therefore that they cannot run in the user's browser, but must run on a server. This server will be "the API", and the initial assumption was that this project is tangential to the API. However:
*) Will be updated with more info