-
Until today (10.05.17), the analyzer (tokeniser-disamb-gt-desc.pmhfst) produced output containing word forms and their morphological analyses. Since then, the output has changed to the following:
```
$ echo "о…
```
-
Similar to #10, Kazakh has the issue of two `neg.ifi` paradigms.
First-person singular (`neg.ifi.p1.sg`) looks like this:
- мен барған жоқпын
- мен бармадым
The question is whether there is a …
-
This issue was created automatically with bugzilla2github
# Bugzilla Bug 1803
Date: 2014-01-25T20:52:52+01:00
From: Trond Trosterud <>
To: Sjur Nørstebø Moshagen <>
CC: tomi.k.pieski, …
-
This issue was created automatically with bugzilla2github
# Bugzilla Bug 2585
Date: 2019-05-20T15:58:44+02:00
From: Børre Gaup <>
To: Linda Wiechetek <>
CC: linda.wiechetek, sjur.n.mos…
-
This issue was created automatically with bugzilla2github
# Bugzilla Bug 1248
Date: 2012-01-04T12:40:30+01:00
From: Trond Trosterud <>
To: Børre Gaup <>
CC: ciprian.gerstenberger, lene…
-
This issue was created automatically with bugzilla2github
# Bugzilla Bug 2356
Date: 2017-03-13T14:51:21+01:00
From: Trond Trosterud <>
To: Sjur Nørstebø Moshagen <>
CC: lene.antonsen, …
-
The problem seems to pertain only to compound nouns or clitics.
Commands used to generate the output:
$ ./autogen.sh && make -j
$ lt-expand apertium-rus.rus.dix
![image](https://user-images.github…
-
This issue was created automatically with bugzilla2github
# Bugzilla Bug 1502
Date: 2012-11-05T19:41:03+01:00
From: Jack Rueter <>
To: Tommi A Pirinen <>
CC: sjur.n.moshagen, trond.tro…
-
This issue was created automatically with bugzilla2github
# Bugzilla Bug 1641
Date: 2013-04-02T20:18:47+02:00
From: Trond Trosterud <>
To: Sjur Nørstebø Moshagen <>
CC: lene.antonsen, …
-
## What I want to do
When learning a BPE vocabulary for a language that is not space-separated, such as Japanese, I think it is usual to use raw text.
However, I often want to build a tokenizer using pr…