OpenSourceMasters / hbase-writer

HBase-Writer is a Java extension to the Heritrix open source crawler. Heritrix is written by the Internet Archive, and HBase-Writer enables Heritrix to store crawled content directly in HBase tables running on the Hadoop Distributed File System (HDFS). By default, HBase-Writer writes the content of each fetched URL into an HBase table as an individual record, identified by its own "rowkey". HBase-Writer can also be extended for custom behavior, such as writing to multiple tables. In turn, these HBase tables are directly supported by the MapReduce framework via Hadoop. HBase-Writer's goal is to facilitate fast, large, distributed crawls using Heritrix and to save and manage Web-scale content using HBase.
http://opensourcemasters.org/

Find a home for hadoop and hbase jars so this module can pull them from someplace. #2

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Currently, the dependent jars are hardcoded in a properties file. Instead, give
each one a pom entry so that we pull it from a remote repo.

Original issue reported on code.google.com by saint....@gmail.com on 10 Oct 2008 at 6:08

GoogleCodeExporter commented 9 years ago
Jars are now being pulled in from the Maven repository hosted at
http://repo1.opensourcemasters.org:8081/nexus

This is configured in the project pom and has been tested.

If you check out the project and have Maven installed, you should be able to
type:
mvn clean install

All the hadoop/hbase dependencies will then be pulled in and used for the
build.
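
For illustration only, a pom configuration of the kind described in this comment might look roughly like the fragment below. This is a sketch, not the project's actual pom: the repository id, the groupId/artifactId coordinates, and the version numbers are all assumptions, and the real repository path under the Nexus URL may differ from the base URL given above.

```xml
<!-- Sketch only: the id, coordinates, and versions are assumed placeholders. -->
<repositories>
  <repository>
    <!-- Hypothetical id; any unique name works. -->
    <id>opensourcemasters-nexus</id>
    <!-- Base URL as given in this comment; the exact repository path is not stated. -->
    <url>http://repo1.opensourcemasters.org:8081/nexus</url>
  </repository>
</repositories>

<dependencies>
  <!-- One entry per jar that was previously hardcoded in the properties file. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>   <!-- assumed coordinates -->
    <artifactId>hadoop-core</artifactId>
    <version>0.18.1</version>              <!-- assumed version -->
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>   <!-- assumed coordinates -->
    <artifactId>hbase</artifactId>
    <version>0.18.1</version>              <!-- assumed version -->
  </dependency>
</dependencies>
```

With entries like these inside the project element of the pom, mvn clean install resolves each jar from the listed repository instead of from a local hardcoded path.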

Original comment by ryan.justin.smith@gmail.com on 26 Nov 2008 at 3:53