OpenSourceMasters / hbase-writer

HBase-Writer is a Java extension to the Heritrix open source crawler. Heritrix is written by the Internet Archive, and HBase-Writer enables Heritrix to store crawled content directly into HBase tables running on the Hadoop Distributed File System (HDFS). By default, HBase-Writer writes crawled URL content into an HBase table as individual records: each fetched URL is represented by its own "rowkey" in the table. HBase-Writer can also be extended for custom behavior, such as writing to multiple tables. These HBase tables are in turn directly supported by the MapReduce framework via Hadoop. HBase-Writer's goal is to facilitate fast, large, distributed crawls using Heritrix and to save and manage Web-scale content using HBase.
http://opensourcemasters.org/

Little error in HBaseWriter.java with "via" #16

Closed LetMeR00t closed 7 years ago

LetMeR00t commented 8 years ago

Hello, I found a small error in "HBaseWriter.java". This is the current code:

addSerializedDataToPut(put, getHbaseParameters().getCuriColumnFamily(), getHbaseParameters().getViaColumnName(), curi.getVia() != null ? curi.toString() : null);

It should be (the value written to the via column has to come from curi.getVia(), not from curi itself):

addSerializedDataToPut(put, getHbaseParameters().getCuriColumnFamily(), getHbaseParameters().getViaColumnName(), curi.getVia() != null ? curi.getVia().toString() : null);

Thanks for the code, which is very clear.
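The difference between the two lines quoted above can be reproduced in isolation. The following is only a minimal sketch: FakeCrawlUri is a hypothetical stand-in for the Heritrix CrawlURI class, and it assumes that toString() on the crawl URI renders the fetched URL itself. The helper names viaValueBuggy and viaValueFixed are also made up for illustration and do not exist in HBase-Writer.

```java
import java.net.URI;

public class ViaColumnSketch {

    // Hypothetical stand-in for Heritrix's CrawlURI (assumption, not the real class).
    static final class FakeCrawlUri {
        private final URI uri; // the URL that was fetched
        private final URI via; // the URL that led to it; may be null for seeds

        FakeCrawlUri(URI uri, URI via) {
            this.uri = uri;
            this.via = via;
        }

        URI getVia() {
            return via;
        }

        // Assumption: toString() renders the fetched URL, not the via URL.
        @Override
        public String toString() {
            return uri.toString();
        }
    }

    // Mirrors the original expression: checks via, but serializes curi itself.
    static String viaValueBuggy(FakeCrawlUri curi) {
        return curi.getVia() != null ? curi.toString() : null;
    }

    // Mirrors the proposed fix: checks via and serializes via.
    static String viaValueFixed(FakeCrawlUri curi) {
        return curi.getVia() != null ? curi.getVia().toString() : null;
    }

    public static void main(String[] args) {
        FakeCrawlUri curi = new FakeCrawlUri(
                URI.create("http://example.com/page"),
                URI.create("http://example.com/"));

        System.out.println(viaValueBuggy(curi)); // prints "http://example.com/page"
        System.out.println(viaValueFixed(curi)); // prints "http://example.com/"
    }
}
```

Under these assumptions, the original expression stores the fetched URL under the via column whenever a via exists, so the referring URL never reaches the table. Both variants still return null when there is no via.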

OpenSourceMasters commented 7 years ago

Thanks for catching this, good eye! Changes have been made and are now pushed into the 'master' branch.