DigitalPebble/behemoth
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Other
281 stars, 60 forks
issues
Issue | Title | Author | Status | Age | Comments
#63 | WARC converter to allow custom metadata | jnioche | closed | 6 years ago | 0
#62 | Upgrade to Mahout 0.12.2, Java 1.7 | smarthi | closed | 8 years ago | 1
#61 | Multiple code improvements - squid:S1213, squid:S1854, squid:S1118 | georgekankava | closed | 8 years ago | 2
#60 | Multiple code improvements - common-java:DuplicatedBlocks, squid:S00112, squid:S134, squid:MethodCyclomaticComplexity, squid:HiddenFieldCheck, squid:S1172 | georgekankava | closed | 8 years ago | 0
#59 | squid:S1149 - Synchronized classes Vector, Hashtable, Stack and StringBuffer should not be used | georgekankava | closed | 8 years ago | 0
#58 | squid:S1192 - String literals should not be duplicated | georgekankava | closed | 8 years ago | 0
#57 | CorpusGenerator never invokes document.setText | ppedemon | closed | 9 years ago | 2
#56 | Make Annotation's serializable and initial implementation of adding annotations to the Exporter output | lewismc | opened | 9 years ago | 2
#55 | Upgrade to Mahout 0.10.0 #54 | lewismc | closed | 8 years ago | 6
#54 | Upgrade to Mahout 0.10.0 | lewismc | closed | 8 years ago | 3
#53 | Upgrade to Mahout 0.9 #52 | lewismc | closed | 9 years ago | 3
#52 | Upgrade to Mahout 0.9 | lewismc | closed | 9 years ago | 4
#51 | Use warc-hadoop library | jnioche | opened | 9 years ago | 0
#50 | Elasticsearch module | lewismc | opened | 9 years ago | 5
#49 | Upgrade to Apache Tika 1.7 | lewismc | closed | 9 years ago | 1
#48 | CTakes modules for Behemoth | lewismc | opened | 10 years ago | 2
#47 | Upgrade hadoop to 1.2.1 and add override method to upgrade | kiranchitturi | closed | 10 years ago | 8
#46 | Tests cant be run by more than one person | alexmc6 | opened | 11 years ago | 1
#45 | Unable to Index Tika file to Solr using behemoth | nikeshsingh | closed | 11 years ago | 9
#44 | Remove reference to CountersExceededException for compatibility with CDH 4.1 | mumrah | closed | 11 years ago | 1
#43 | Solr 4.1.0 and various other updates | gsingers | closed | 11 years ago | 4
#42 | Add negative filter for mimetype | jnioche | opened | 11 years ago | 0
#41 | CorpusReader generic parameter for annotations | ghost | opened | 11 years ago | 0
#40 | UIMAMapper to use UIMAProcessor | jnioche | closed | 8 years ago | 1
#39 | Fix field mappings for Solr and new unit test | mumrah | closed | 12 years ago | 4
#38 | Warn when input is not available for CorpusGenerator | cklaussner | closed | 12 years ago | 1
#37 | Timings + SolrJ/LucidWorks | gsingers | closed | 12 years ago | 1
#36 | Output to LucidWorks 2.1 | gsingers | closed | 11 years ago | 3
#35 | Common crawl | gsingers | closed | 12 years ago | 5
#34 | Language Identification | gsingers | closed | 12 years ago | 7
#33 | Bad Counter | gsingers | closed | 12 years ago | 0
#32 | Conversion of Sequence Files | gsingers | closed | 12 years ago | 6
#31 | Solr 3.5 update | gsingers | closed | 12 years ago | 0
#30 | Upgrade to Mahout 0.6 | gsingers | closed | 12 years ago | 2
#29 | ClassNotFoundException org.apache.mahout.math.Vector | telekoma | closed | 12 years ago | 5
#28 | Updates to Behemoth | butlermh | closed | 12 years ago | 2
#27 | Classloader problems with job files that include behemoth.core.jar | butlermh | closed | 12 years ago | 3
#26 | Unnecessary jars being included in .job files | butlermh | closed | 12 years ago | 4
#25 | Exception when calling DistributedCache.purgeCache(job) in GATEDriver.java | butlermh | closed | 13 years ago | 3
#24 | Ingest times with CorpusGenerator | butlermh | closed | 12 years ago | 5
#23 | Versioning BehemothDocument | butlermh | closed | 12 years ago | 1
#22 | Write Tutorial on processing Enron corpus with Tika | jnioche | closed | 13 years ago | 1
#21 | Use regular expressions for annotation and type filters | jnioche | opened | 13 years ago | 0
#20 | Mahout : add Lucene Tokenisation | jnioche | closed | 12 years ago | 1
#19 | Mahout : fix the vocabulary size | jnioche | closed | 8 years ago | 3
#18 | switch to new Hadoop API | jnioche | opened | 13 years ago | 3
#17 | CorpusReader to ignore _logs subdirectory | jnioche | closed | 13 years ago | 3
#16 | Improve BehemothDocument.ToString() to display key/values in metadata | jnioche | closed | 13 years ago | 1
#15 | Exception in thread "main" java.io.IOException: can't find class: com.digitalpebble.behemoth.tika.TextArrayWritable | jnioche | opened | 13 years ago | 0
#14 | Options to replace input with output of job | jnioche | opened | 13 years ago | 1