OpenPecha Botok issues - Githubissues

OpenPecha / Botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

https://botok.readthedocs.io/

Apache License 2.0

58 stars 15 forks

issues


pybo 0.6.0 tokenizer failed for འིའོ

#57 10zinten closed 5 years ago
5
Huge memory cost when initializing the tokenizer

#56 BLKSerene closed 4 years ago
3
Sentencize a list of tokens that have been manually tokenized by adding spaces

#55 BLKSerene opened 5 years ago
1
Failed to tokenize text with pybo 0.6.3

#54 BLKSerene closed 5 years ago
3
update

#53 drupchen closed 5 years ago
0
Missing character when updating from pybo 0.4.0 to pybo 0.6.0, BoTokenizer to WordTokenizer

#52 aninrusimha closed 5 years ago
4
oops

#51 drupchen closed 5 years ago
1
Add multiple words per entry

#50 drupchen closed 5 years ago
1
test

#49 drupchen closed 5 years ago
1
finding sentence limits

#48 eroux opened 5 years ago
11
Trie's handling of word list that contains both པར་(photo) and པར་(particle)

#47 evanyerburgh closed 5 years ago
1
Update README.md

#46 evanyerburgh closed 5 years ago
0
Unicode normalisation

#45 ngawangtrinley closed 5 years ago
4
Add folia output to pybo

#44 ngawangtrinley closed 5 years ago
0
Sentences and Paragraphs as Token attributes

#43 drupchen opened 5 years ago
0
Warning issued after upgrading PyYAML to 5.1

#42 BLKSerene closed 5 years ago
1
syllable boundary bug

#41 drupchen closed 5 years ago
0
Oops ! on the wrong branch

#40 drupchen closed 5 years ago
1
refactor parsing resource files to directory based

#39 10zinten closed 5 years ago
1
POS-tagging a list of tokens that have already been tokenized

#38 BLKSerene closed 5 years ago
6
Sentence tokenization and detokenization

#37 BLKSerene closed 5 years ago
6
Add readthedocs style documentation

#36 10zinten closed 5 years ago
1
How to initialize the tokenizer without the POS tagging feature?

#35 BLKSerene closed 5 years ago
3
Cache and reuse temporary files to speed up initialization

#34 BLKSerene closed 5 years ago
6
Remove trailing whitespace in tokens

#33 BLKSerene closed 5 years ago
4
Bopipeline

#32 drupchen closed 5 years ago
1
What's the tagset used by pybo?

#31 BLKSerene closed 5 years ago
2
CQLMatcher cannot match last token

#30 kevinhuangtw closed 5 years ago
2
change toadd_filenames and todel_filenames to a folder path

#29 drupchen closed 5 years ago
1
Sanskrit entries don't seem to be inflected

#28 drupchen closed 5 years ago
1
How to add my own dictionary

#27 CrystalWLH closed 5 years ago
7
add: adjust rule for ལ་ལ་ལ་ལ་

#26 10zinten closed 6 years ago
0
additional affix combinations

#25 eroux closed 6 years ago
1
using unicode data

#24 eroux closed 5 years ago
2
The resources for the frequency is not in the package

#23 thubtenrigzin closed 6 years ago
1
integrate tests in setup.py

#22 eroux closed 6 years ago
0
Missing syllables and punctuation

#21 thubtenrigzin closed 6 years ago
3
default value for Token#pos

#20 drupchen closed 6 years ago
1
symbol considered as token content

#19 drupchen closed 6 years ago
1
tokenizer gives IndexError

#18 mikkokotila closed 6 years ago
5
word2vec implementation in Tibetan

#17 mikkokotila closed 5 years ago
9
colibri for gramm'n

#16 mikkokotila closed 5 years ago
3
handling genitive case (and maybe other cases too)

#15 mikkokotila closed 6 years ago
5
Travis, README.md, etc update

#14 mikkokotila closed 6 years ago
2
tests failing because of LemmatizeTokens().lemmatize(tokens)

#13 mikkokotila closed 6 years ago
2
suggestion for token conventions

#12 mikkokotila closed 6 years ago
2
int and bool

#11 ngawangtrinley closed 6 years ago
0
NONE error when trying to match int or bool token attributes

#10 ngawangtrinley opened 6 years ago
4
yaml fails to import

#9 mikkokotila closed 6 years ago
2
tokenizer fails

#8 mikkokotila closed 6 years ago
6
