OpenPecha / Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
https://botok.readthedocs.io/
Apache License 2.0
58 stars, 15 forks
Issues
#57 pybo 0.6.0 tokenizer failed for འིའོ (10zinten, closed 5 years ago, 5 comments)
#56 Huge memory cost when initializing the tokenizer (BLKSerene, closed 4 years ago, 3 comments)
#55 Sentencize a list of tokens that have been manually tokenized by adding spaces (BLKSerene, opened 5 years ago, 1 comment)
#54 Failed to tokenize text with pybo 0.6.3 (BLKSerene, closed 5 years ago, 3 comments)
#53 update (drupchen, closed 5 years ago, 0 comments)
#52 Missing character when updating from pybo 0.4.0 to pybo 0.6.0, BoTokenizer to WordTokenizer (aninrusimha, closed 5 years ago, 4 comments)
#51 oops (drupchen, closed 5 years ago, 1 comment)
#50 Add multiple words per entry (drupchen, closed 5 years ago, 1 comment)
#49 test (drupchen, closed 5 years ago, 1 comment)
#48 finding sentence limits (eroux, opened 5 years ago, 11 comments)
#47 Trie's handling of a word list that contains both པར་ (photo) and པར་ (particle) (evanyerburgh, closed 5 years ago, 1 comment)
#46 Update README.md (evanyerburgh, closed 5 years ago, 0 comments)
#45 Unicode normalisation (ngawangtrinley, closed 5 years ago, 4 comments)
#44 Add FoLiA output to pybo (ngawangtrinley, closed 5 years ago, 0 comments)
#43 Sentences and Paragraphs as Token attributes (drupchen, opened 5 years ago, 0 comments)
#42 Warning issued after upgrading PyYAML to 5.1 (BLKSerene, closed 5 years ago, 1 comment)
#41 syllable boundary bug (drupchen, closed 5 years ago, 0 comments)
#40 Oops! On the wrong branch (drupchen, closed 5 years ago, 1 comment)
#39 refactor parsing of resource files to be directory-based (10zinten, closed 5 years ago, 1 comment)
#38 POS-tagging a list of tokens that have already been tokenized (BLKSerene, closed 5 years ago, 6 comments)
#37 Sentence tokenization and detokenization (BLKSerene, closed 5 years ago, 6 comments)
#36 Add Read the Docs-style documentation (10zinten, closed 5 years ago, 1 comment)
#35 How to initialize the tokenizer without the POS tagging feature? (BLKSerene, closed 5 years ago, 3 comments)
#34 Cache and reuse temporary files to speed up initialization (BLKSerene, closed 5 years ago, 6 comments)
#33 Remove trailing whitespace in tokens (BLKSerene, closed 5 years ago, 4 comments)
#32 Bopipeline (drupchen, closed 5 years ago, 1 comment)
#31 What's the tagset used by pybo? (BLKSerene, closed 5 years ago, 2 comments)
#30 CQLMatcher cannot match the last token (kevinhuangtw, closed 5 years ago, 2 comments)
#29 change toadd_filenames and todel_filenames to a folder path (drupchen, closed 5 years ago, 1 comment)
#28 Sanskrit entries don't seem to be inflected (drupchen, closed 5 years ago, 1 comment)
#27 How to add my own dictionary? (CrystalWLH, closed 5 years ago, 7 comments)
#26 add: adjust rule for ལ་ལ་ལ་ལ་ (10zinten, closed 6 years ago, 0 comments)
#25 additional affix combinations (eroux, closed 6 years ago, 1 comment)
#24 using Unicode data (eroux, closed 5 years ago, 2 comments)
#23 The frequency resources are not in the package (thubtenrigzin, closed 6 years ago, 1 comment)
#22 integrate tests in setup.py (eroux, closed 6 years ago, 0 comments)
#21 Missing syllables and punctuation (thubtenrigzin, closed 6 years ago, 3 comments)
#20 default value for Token#pos (drupchen, closed 6 years ago, 1 comment)
#19 symbol considered as token content (drupchen, closed 6 years ago, 1 comment)
#18 tokenizer gives IndexError (mikkokotila, closed 6 years ago, 5 comments)
#17 word2vec implementation in Tibetan (mikkokotila, closed 5 years ago, 9 comments)
#16 colibri for gramm'n (mikkokotila, closed 5 years ago, 3 comments)
#15 handling genitive case (and maybe other cases too) (mikkokotila, closed 6 years ago, 5 comments)
#14 Travis, README.md, etc. update (mikkokotila, closed 6 years ago, 2 comments)
#13 tests failing because of LemmatizeTokens().lemmatize(tokens) (mikkokotila, closed 6 years ago, 2 comments)
#12 suggestion for token conventions (mikkokotila, closed 6 years ago, 2 comments)
#11 int and bool (ngawangtrinley, closed 6 years ago, 0 comments)
#10 NONE error when trying to match int or bool token attributes (ngawangtrinley, opened 6 years ago, 4 comments)
#9 yaml fails to import (mikkokotila, closed 6 years ago, 2 comments)
#8 tokenizer fails (mikkokotila, closed 6 years ago, 6 comments)