OpenSourceMasters / hbase-writer

HBase-Writer is a Java extension to the Heritrix open source crawler. Heritrix is written by the Internet Archive, and HBase-Writer enables Heritrix to store crawled content directly in HBase tables running on the Hadoop Distributed File System. By default, HBase-Writer writes crawled URL content into an HBase table as individual records: each fetched URL is represented by a "rowkey" in the table. HBase-Writer can also be extended for custom behavior, such as writing to multiple tables. These HBase tables are in turn directly supported by the MapReduce framework via Hadoop. HBase-Writer's goal is to facilitate fast, large, distributed crawls using Heritrix and to save and manage Web-scale content using HBase.
http://opensourcemasters.org/

HBaseAdmin call failed in newer version from HBase #13

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Download hbase-writer-0.9-20100330.003614-1.jar to heritrix-3.0.0/lib
2. Start Heritrix
3. Modify the profile for HBaseWriter, following the hbase-writer documentation
4. Launch a crawl job

What is the expected output? What do you see instead?
java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.HBaseAdmin.<init>(Lorg/apache/hadoop/hbase/HBaseConfiguration;)V
        at org.archive.io.hbase.HBaseWriter.initializeCrawlTable(HBaseWriter.java:121)
        at org.archive.io.hbase.HBaseWriter.<init>(HBaseWriter.java:107)
        at org.archive.io.hbase.HBaseWriterPool$1.makeObject(HBaseWriterPool.java:55)
        at org.apache.commons.pool.impl.FairGenericObjectPool.borrowObject(FairGenericObjectPool.java:262)
        at org.archive.io.WriterPool.borrowFile(WriterPool.java:135)
        at org.archive.modules.writer.HBaseWriterProcessor.write(HBaseWriterProcessor.java:313)
        at org.archive.modules.writer.HBaseWriterProcessor.innerProcessResult(HBaseWriterProcessor.java:182)
        at org.archive.modules.Processor.process(Processor.java:144)
        at org.archive.modules.ProcessorChain.process(ProcessorChain.java:131)
        at org.archive.modules.DispositionChain.process(DispositionChain.java:55)
        at org.archive.crawler.framework.ToeThread.run(ToeThread.java:150)

What version of the product are you using? On what operating system?

hbase-writer-0.9-20100330
hbase-writer-0.20.3.jar
hadoop-0.20.2
hbase-0.89.0-r980101
zookeeper-3.3.1
SLES10-2 32bit

Please provide any additional information below.

In HBase 0.89 and later versions the HBaseAdmin constructor was changed: the constructor taking an HBaseConfiguration, which hbase-writer calls, no longer exists (see http://hbase.apache.org/docs/r0.89.20100621/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html).
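
To illustrate why a jar compiled against the old API fails at runtime, here is a minimal, self-contained sketch. It does not use the real HBase classes: `Configuration`, `HBaseConfiguration`, and `NewAdmin` are hypothetical stand-ins, and the sketch assumes the newer HBaseAdmin declares a constructor taking the parent `Configuration` type. The JVM resolves a constructor by its exact descriptor, so a binary that asks for `<init>(HBaseConfiguration)` is not satisfied by `<init>(Configuration)`, even though recompiling the same source would work. Reflection lookup follows the same exact-match rule and is used here to show it:

```java
public class ConstructorDescriptorDemo {
    // Hypothetical stand-ins for the Hadoop/HBase configuration classes.
    // Assumption: the HBase-specific class extends the general one.
    static class Configuration {}
    static class HBaseConfiguration extends Configuration {}

    // Hypothetical stand-in for the newer HBaseAdmin: it declares only
    // a constructor that takes the parent Configuration type.
    static class NewAdmin {
        public NewAdmin(Configuration conf) {}
    }

    /**
     * Returns true only if target declares a public constructor whose single
     * parameter type is exactly paramType (no subtype widening is applied).
     */
    static boolean hasExactConstructor(Class<?> target, Class<?> paramType) {
        try {
            target.getConstructor(paramType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // What the old hbase-writer binary asks for: not declared any more.
        System.out.println("(HBaseConfiguration) declared: "
                + hasExactConstructor(NewAdmin.class, HBaseConfiguration.class));
        // What the newer class actually declares.
        System.out.println("(Configuration) declared: "
                + hasExactConstructor(NewAdmin.class, Configuration.class));
        // Source-level call still compiles, because the compiler widens the
        // argument and emits the (Configuration) descriptor.
        new NewAdmin(new HBaseConfiguration());
    }
}
```

If that assumption holds for the real classes, rebuilding hbase-writer against the newer HBase jar may be enough to clear this particular error, although other API changes could still need source edits.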

Original issue reported on code.google.com by eu...@arcor.de on 8 Aug 2010 at 9:24

GoogleCodeExporter commented 9 years ago
Support for hbase-0.89 has not been added to hbase-writer yet.  It will start 
from this ticket.  Thank you.

Original comment by ryan.justin.smith@gmail.com on 8 Aug 2010 at 9:36

GoogleCodeExporter commented 9 years ago
A patch has been made to support Heritrix 3.1.0 and HBase 0.90.4.
Please check whether you find it useful.

Original comment by karthik...@gmail.com on 16 Nov 2011 at 3:48

GoogleCodeExporter commented 9 years ago

Original comment by ryan.justin.smith@gmail.com on 16 Nov 2011 at 9:58