DigitalPebble / behemoth
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Other
281 stars, 60 forks
issues
#13 · Add option to com.digitalpebble.behemoth.util.CorpusReader to hide binaryContent · jnioche · closed · 13 years ago · 1
#12 · Add module for OpenNLP components · jnioche · opened · 13 years ago · 2
#11 · UIMAMapper should use the UIMAProcessor · jnioche · closed · 13 years ago · 1
#10 · Tika processing · gsingers · closed · 13 years ago · 2
#9 · Use Commons CLI for command line processing · gsingers · opened · 13 years ago · 1
#8 · Tika Components · gsingers · closed · 13 years ago · 2
#7 · Provide Interface Packaging · gsingers · closed · 13 years ago · 3
#6 · Implement a BehemothCorpusLoader for GATE · jnioche · closed · 13 years ago · 1
#5 · organise components as separate plugins · jnioche · closed · 13 years ago · 2
#4 · use slf4j for logging · jnioche · closed · 14 years ago · 1
#3 · TikaProcessor should generate annotations for representing the markup · jnioche · closed · 13 years ago · 1
#2 · convert Behemoth annotations into native GATE / UIMA annotations for input · jnioche · closed · 13 years ago · 2
#1 · Refactor Tika as a DocumentProcessor · jnioche · closed · 14 years ago · 2