JuliaText
/
WordTokenizers.jl
High performance tokenizers for natural language processing and other related tasks
Other
96
stars
25
forks
issues
HTML_Entities dependency doesn't work with PackageCompiler
#65
AbrJA
opened
2 months ago
0
Unable to install WordTokenizers.jl
#64
ablaom
closed
6 months ago
2
Optimize statistical unigram tokenizer `decode_forward`
#63
aria42
opened
2 years ago
2
Sentence Splitters: no sentence break in between two words with no punctuation
#62
dhruvil410
opened
3 years ago
2
Adding GPT2 Tokenizer for WordTokenizers' Pretrained tokenizers
#61
shikhargoswami
opened
3 years ago
1
Sentence tokenization must ignore newline as whitespace in the default mode.
#60
sambitdash
opened
3 years ago
0
Interest in Improving Sentence Tokenization
#59
TheCedarPrince
opened
3 years ago
2
Fix Typos and Indentation
#58
SambhawDrag
closed
3 years ago
1
Lowercasing each token in tokenize function
#57
shikhargoswami
closed
3 years ago
3
use a normal function in __init__ to initialize the data deps
#56
KristofferC
closed
4 years ago
1
InitError on julia 1.5
#55
chengchingwen
closed
4 years ago
3
Adopt ColPrac?
#54
oxinabox
closed
4 years ago
1
Update to version 0.5.5.
#53
Ayushk4
closed
4 years ago
1
Release latest version
#52
tejasvaidhyadev
closed
4 years ago
0
Adding support for unigram sentencepiece model
#51
tejasvaidhyadev
closed
4 years ago
14
[WIP] Update README with JOSS Badge and Citation
#50
Ayushk4
opened
4 years ago
0
Install TagBot as a GitHub Action
#49
JuliaTagBot
closed
4 years ago
0
Update paper.md
#48
kthyng
closed
4 years ago
0
Update paper.bib
#47
kthyng
closed
4 years ago
0
Benchmark against Rust library
#46
oxinabox
opened
4 years ago
0
Update paper based on JOSS review
#45
oxinabox
closed
4 years ago
2
Add statistical tokenization algorithms
#44
Ayushk4
closed
4 years ago
20
Add installation guide to README
#43
Ayushk4
closed
4 years ago
1
Change example setting tokenizer to TinySegmenter.jl's tokenizer
#42
Ayushk4
closed
4 years ago
1
Fixing a number of typos in paper and readme
#41
leios
closed
4 years ago
1
Minor Fixes in JOSS paper
#40
Ayushk4
closed
4 years ago
1
very minor grammar fixes in README
#39
danielskatz
closed
4 years ago
1
Sentence splitting of sentences without whitespace after period
#38
oxinabox
opened
5 years ago
2
Filtering the empty strings from substring array
#37
RohitPingale
opened
5 years ago
4
Add plot comparing speeds of tokenizers to JOSS paper.
#36
Ayushk4
closed
5 years ago
2
Support and Contribution guidelines
#35
Ayushk4
closed
5 years ago
1
JOSS paper update
#34
Ayushk4
closed
5 years ago
7
Handle final periods
#33
Ayushk4
closed
5 years ago
3
split_sentences - handling spaces after "."
#32
Ayushk4
opened
5 years ago
7
Toktok fix patch
#31
Ayushk4
closed
5 years ago
3
Update for Julia-1.1
#30
Ayushk4
closed
5 years ago
1
Fix TokTok.jl
#29
Ayushk4
closed
5 years ago
4
Tokenize begins with full stop.
#28
haampie
closed
5 years ago
1
Julia 1.1
#27
Ayushk4
closed
5 years ago
5
Make a release
#26
oxinabox
closed
5 years ago
0
Fix inconsistency between tabs and spaces
#25
Ayushk4
closed
5 years ago
2
Fix sentence splitter: sentences ending with acronyms
#24
nickto
closed
5 years ago
5
appveyor badge fix
#23
aquatiko
closed
5 years ago
1
Fix indentation in nltk_word.jl
#22
Ayushk4
closed
5 years ago
1
Fix indentation.
#21
Ayushk4
closed
5 years ago
2
Minor doc fixes in fast.jl
#20
Ayushk4
closed
5 years ago
2
some cleanup, including changing TokenBuffer to use replaces rather than splits
#19
oxinabox
closed
5 years ago
0
add toktok tokenizer
#18
aquatiko
closed
5 years ago
17
fix names
#17
aquatiko
closed
5 years ago
2
Minor Fix in docs
#16
rsdel2007
closed
5 years ago
1