fnl/syntok
Text tokenization and sentence segmentation (segtok v2)
MIT License
201 stars, 34 forks
Issues (newest first)
#28 German ordinal numbers lead to oversplitting | nickluger | opened 2 years ago | 8 comments
#27 (Remove?) Support for Windows builds | fnl | opened 2 years ago | 1 comment
#26 Missed abbreviations (Min., Sen.) | peter-lang-dealogic | closed 2 years ago | 1 comment
#25 Undersplitting sentence ending in URL | peter-lang-dealogic | opened 2 years ago | 0 comments
#24 Fix false positive month abbreviation | peter-lang-dealogic | closed 2 years ago | 1 comment
#23 Fix "no. 1", "No. 1", "NO. 1" type abbreviation filtering | peter-lang-dealogic | closed 2 years ago | 1 comment
#22 Under-splitting on "set." and "ago." | leitneratselerity | closed 2 years ago | 0 comments
#21 Oversplitting on "No. X" abbreviations | leitneratselerity | closed 2 years ago | 1 comment
#20 Add license file to source | BastianZim | closed 2 years ago | 3 comments
#19 Parenthesis at the end of input causes IndexError | windreamer | closed 2 years ago | 4 comments
#18 Zero-width Unicode characters | arjenpdevries | closed 2 years ago | 9 comments
#17 fix not-contraction offsets + add test (resolves #15) | KDercksen | closed 2 years ago | 3 comments
#16 Do not segment inside parentheses | fnl | closed 2 years ago | 2 comments
#15 Bug in not-contraction handling code | KDercksen | closed 2 years ago | 5 comments
#14 Splitting on single \n for sentence tokenization | divyeshlad18 | closed 4 years ago | 2 comments
#13 Fix references to segtok | svenski | closed 4 years ago | 1 comment
#12 Adding all Bible book names as abbreviations | jakepoz | closed 2 years ago | 8 comments
#11 git tag for 1.3.1 | wimmuskee | closed 4 years ago | 1 comment
#10 Uppercase letters in tokens | severinsimmler | closed 4 years ago | 4 comments
#9 Segmenting sentences at colons | fhamborg | opened 4 years ago | 6 comments
#8 Best way to get sentence spans | fhamborg | opened 4 years ago | 4 comments
#7 git tag for 1.2.2 | wimmuskee | closed 4 years ago | 1 comment
#6 Wrong offset with nonword-prefix | Lingepumpe | closed 4 years ago | 2 comments
#5 There's no direct way of getting offset of a sentence w.r.t. a document | zeeshanalipanhwar | closed 5 years ago | 1 comment
#4 Issues with paragraph identification | jspalink | closed 5 years ago | 2 comments
#3 A large diff between syntok and segtok on Web of Science datasets | newtover | closed 5 years ago | 3 comments
#2 Benchmark against pragmatic segmenter | Immortalin | opened 5 years ago | 4 comments
#1 Splitting two words joined by ellipsis (...) and no spaces into different tokens | nth-attempt | closed 5 years ago | 2 comments
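Issues #5 and #8 both ask how to get a sentence's character span within the original document. A minimal sketch follows, assuming each token carries its verbatim source text as `value` and its start position as `offset` (issue #6 shows that syntok tokens do carry offsets). The `Token` class and `sentence_span` helper below are hypothetical stand-ins written for illustration, not syntok's API. The span runs from the first token's offset to the end of the last token's value, which only holds while token values are unmodified copies of the input.

```python
from typing import NamedTuple, Sequence, Tuple


class Token(NamedTuple):
    """Hypothetical stand-in for a tokenizer token (not syntok's own class)."""
    spacing: str  # whitespace preceding the token in the source text
    value: str    # token text, assumed to be copied verbatim from the source
    offset: int   # character index where the token starts in the source text


def sentence_span(sentence: Sequence[Token]) -> Tuple[int, int]:
    """Return the (start, end) character offsets of a sentence in its document.

    Assumes token values are unmodified source text; if a tokenizer rewrites
    values (e.g. rejoins hyphenated word breaks), the end offset will be off.
    """
    if not sentence:
        raise ValueError("an empty sentence has no span")
    first, last = sentence[0], sentence[-1]
    return first.offset, last.offset + len(last.value)


if __name__ == "__main__":
    text = "Hello world. Bye."
    # Tokens written out by hand for illustration only.
    s1 = [Token("", "Hello", 0), Token(" ", "world", 6), Token("", ".", 11)]
    s2 = [Token(" ", "Bye", 13), Token("", ".", 16)]
    for sent in (s1, s2):
        start, end = sentence_span(sent)
        print(repr(text[start:end]))  # prints 'Hello world.' then 'Bye.'
```

Slicing the document with the returned pair recovers each sentence exactly, including inner whitespace, without re-joining tokens.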