BitFunnel/Workbench
Java and Lucene based tools for BitFunnel corpus preparation
http://bitfunnel.org
MIT License
19 stars, 4 forks
Issues
#18 Many n-grams in corpus (danluu; opened 7 years ago; 2 comments)
#17 *, =, etc. being removed (danluu; opened 7 years ago; 0 comments)
#16 Stopwords being removed (danluu; opened 7 years ago; 0 comments)
#15 Terms prefixed with numbers (danluu; closed 7 years ago; 1 comment)
#14 i, ii, iii, etc. (danluu; closed 7 years ago; 1 comment)
#13 Possibly spurious single letters (danluu; closed 7 years ago; 1 comment)
#12 Dates don't appear to be normalized (danluu; closed 7 years ago; 1 comment)
#11 Some terms appear to have possibly meaningless numbers at the end (danluu; closed 7 years ago; 1 comment)
#10 Many terms appear to be filenames (danluu; closed 7 years ago; 1 comment)
#9 Many terms have underscores in them (danluu; opened 7 years ago; 2 comments)
#8 The vast majority of documents are tiny (danluu; opened 7 years ago; 11 comments)
#7 We're not stemming some (any?) terms (danluu; opened 7 years ago; 0 comments)
#6 Resolve tokenization issues causing BitFunnel parser crashes (hausdorff; opened 8 years ago; 0 comments)
#5 ProcessDocumentHeader() in WikipediaDumpProcessor should use analyzer (MikeHopcroft; opened 8 years ago; 0 comments)
#4 README.md shows wrong output (MikeHopcroft; opened 8 years ago; 0 comments)
#3 Wikipedia extraction seems to be giving bigrams (MikeHopcroft; opened 8 years ago; 1 comment)
#2 JDK installation instructions missing (danluu; opened 8 years ago; 1 comment)
#1 Image links are broken (danluu; closed 8 years ago; 0 comments)