DigitalPebble/behemoth
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
WARC converter to allow custom metadata #63
Closed: jnioche closed this issue 6 years ago
jnioche commented 6 years ago:
Similar to what is done by the CorpusGenerator.
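The issue does not describe how the custom metadata would be supplied. As a rough illustration only, the Java sketch below assumes the user passes metadata as semicolon-separated `key=value` pairs. That input format is an assumption and is not taken from the Behemoth source. The sketch parses such a string once and merges the pairs into each converted record's metadata map. The class and method names (`CustomMetadata`, `parse`, `attach`) are hypothetical and are not part of Behemoth's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CustomMetadata {

    // Hypothetical helper: parses "key1=value1;key2=value2" into an insertion-ordered map.
    // Blank segments (e.g. from "a=b;;c=d") are skipped; malformed pairs are rejected.
    static Map<String, String> parse(String spec) {
        Map<String, String> md = new LinkedHashMap<>();
        if (spec == null || spec.isBlank()) {
            return md;
        }
        for (String pair : spec.split(";")) {
            if (pair.isBlank()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + pair);
            }
            md.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return md;
    }

    // Hypothetical step: copy the user-supplied pairs onto one converted record's metadata.
    // Existing keys with the same name are overwritten by the custom values.
    static void attach(Map<String, String> recordMetadata, Map<String, String> custom) {
        recordMetadata.putAll(custom);
    }

    public static void main(String[] args) {
        // Parse once, then reuse the same map for every record the converter emits.
        Map<String, String> custom = parse("source=internet;label=public");
        Map<String, String> record = new LinkedHashMap<>();
        record.put("url", "http://example.com/");
        attach(record, custom);
        System.out.println(record);
    }
}
```

Parsing the string once up front keeps the per-record work to a single `putAll`. That cost matters little for one document, but it adds up across the many records in a WARC file.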