Closed by GoogleCodeExporter 9 years ago
Original comment by ryan.justin.smith@gmail.com
on 13 Feb 2009 at 4:34
Original comment by ryan.justin.smith@gmail.com
on 13 Feb 2009 at 4:35
This has been tested with the logic that now resides in shouldWrite() in
HbaseWriterProcessor.java.
If you crawl a brand new site with "only_new_records" set to "true", it
downloads all URLs that Heritrix is configured to fetch. If you run the exact
same Heritrix job configuration a second time, no new records are downloaded
or written to HBase.
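The behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the code in HbaseWriterProcessor.java: the class name, the shouldWrite(String) signature, the crawl() helper and the in-memory set standing in for the HBase table are all assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an "only_new_records" check; not the real processor.
public class OnlyNewRecordsSketch {
    // Assumption: an in-memory set stands in for the row keys stored in HBase.
    private final Set<String> storedUrls = new HashSet<>();
    private final boolean onlyNewRecords;

    public OnlyNewRecordsSketch(boolean onlyNewRecords) {
        this.onlyNewRecords = onlyNewRecords;
    }

    // Hypothetical signature: write unless the option is on and the URL is already stored.
    public boolean shouldWrite(String url) {
        if (!onlyNewRecords) {
            return true;
        }
        return !storedUrls.contains(url);
    }

    // Simulates one crawl over the configured URLs and returns how many records were written.
    public int crawl(Iterable<String> urls) {
        int written = 0;
        for (String url : urls) {
            if (shouldWrite(url)) {
                storedUrls.add(url);
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://example.com/", "http://example.com/a");
        OnlyNewRecordsSketch job = new OnlyNewRecordsSketch(true);
        System.out.println(job.crawl(urls)); // first run: every URL is new
        System.out.println(job.crawl(urls)); // second run: nothing is written
    }
}
```

With the option enabled, the first simulated run writes both URLs and the second writes none, which matches the test result reported here. With the option disabled, every run writes every URL.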
Original comment by ryan.justin.smith@gmail.com
on 16 Feb 2009 at 6:58
Original issue reported on code.google.com by
ryan.justin.smith@gmail.com
on 13 Feb 2009 at 4:34