Open ScottMansfield opened 9 years ago
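A server can return an ETag header, an opaque token identifying the current version of a page; the crawler can send that token back in If-None-Match on the next request so the server can compare against it. If-Modified-Since asks the server to send the page only if it has been modified since the given date, typically the Last-Modified value from the last access. In both cases an unchanged page yields a 304 Not Modified response with no body, which will reduce load on the servers during crawling.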
If-Modified-Since: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.25
ETag: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.19
crawler-commons may be usable here: https://github.com/crawler-commons/crawler-commons
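A rough sketch of the idea, in case it helps the discussion. This is not the project's code and not the crawler-commons API; `CacheEntry`, `conditional_headers`, and `apply_response` are hypothetical names, and the sketch assumes response header names arrive in their canonical capitalization (`ETag`, `Last-Modified`). It only shows how stored validators could turn into request headers and how a 304 could be handled:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    """Validators and body remembered from the previous fetch (hypothetical type)."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None
    body: bytes = b""


def conditional_headers(entry: Optional[CacheEntry]) -> Dict[str, str]:
    """Build the conditional request headers for a URL we may have seen before."""
    headers: Dict[str, str] = {}
    if entry is None:
        return headers  # first visit: plain, unconditional GET
    if entry.etag:
        # Send the stored ETag back so the server can compare tokens.
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        # Send the stored Last-Modified date back as the cutoff.
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def apply_response(
    entry: Optional[CacheEntry],
    status: int,
    headers: Dict[str, str],
    body: bytes,
) -> CacheEntry:
    """Return the cache entry to keep after a response arrives."""
    if status == 304 and entry is not None:
        # Not Modified: the server sent no body, so reuse what we have.
        return entry
    # Any other response: remember the new validators and body.
    return CacheEntry(
        etag=headers.get("ETag"),
        last_modified=headers.get("Last-Modified"),
        body=body,
    )
```

On the first crawl `conditional_headers(None)` is empty; on later crawls the stored entry supplies the headers, and a 304 means the page can be skipped without downloading it again.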