What steps will reproduce the problem?
1. Create a basic crawler.
2. Point the seed URL to any sitemap.gz file.
3. Start the crawler.
What is the expected output? What do you see instead?
Gzipped sitemaps are plain XML files compressed with gzip, so the crawler
should support them, but it does not.
When issue 317 gets fixed, this will happen even more often.
What version of the product are you using?
latest from trunk/master
Please provide any additional information below.
I have created a fix in my crawler by wrapping the content in a
GZIPInputStream (from java.util.zip):
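    // "is" is an InputStream declared earlier; GZIPInputStream may throw IOException
    if (page.getWebURL().getURL().endsWith(".gz")) {
        // Gzipped sitemap: decompress the raw bytes on the fly
        is = new GZIPInputStream(new ByteArrayInputStream(page.getContentData()));
    } else {
        // Plain sitemap: read the bytes as-is
        is = new ByteArrayInputStream(page.getContentData());
    }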
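
The check above relies on the URL ending in ".gz". A possible variant, offered only as a sketch, is to look at the first two bytes of the content for the gzip magic number (0x1F 0x8B) instead, which would also cover gzipped sitemaps served from URLs without that suffix. The class name SitemapStreams and its methods are hypothetical helpers, not part of the crawler; the byte array is assumed to be whatever page.getContentData() returns.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration; not part of the crawler's API.
public class SitemapStreams {

    // Returns true if the content starts with the gzip magic bytes 0x1F 0x8B.
    public static boolean isGzip(byte[] content) {
        return content != null
                && content.length >= 2
                && (content[0] & 0xFF) == 0x1F
                && (content[1] & 0xFF) == 0x8B;
    }

    // Wraps the raw bytes in a decompressing stream only when they look gzipped.
    public static InputStream open(byte[] content) throws IOException {
        InputStream raw = new ByteArrayInputStream(content);
        if (isGzip(content)) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }
}
```

With this in place, the branch above could become a single call such as is = SitemapStreams.open(page.getContentData()); whether that is preferable to the suffix check depends on how the crawler decides which pages to treat as sitemaps.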
Original issue reported on code.google.com by panthro....@gmail.com on 16 Nov 2014 at 5:24