dfabulich / sitemapgen4j

SitemapGen4j is a library to generate XML sitemaps in Java.
Apache License 2.0

Error with gzip(true) and autoValidate(true) #19

Open mariodeci opened 9 years ago

mariodeci commented 9 years ago

CODE:

WebSitemapGenerator wsg;
// generate foo sitemap
wsg = WebSitemapGenerator.builder("http://www.example.com", new File(OUT_DIR))
        .fileNamePrefix("sitemap_big")
        .gzip(true)
        .autoValidate(true)
        .build();
for (int i = 0; i < 49999; i++) {
    wsg.addUrl("http://www.example.com/foo" + i + ".html");
}
wsg.write();
wsg.writeSitemapsWithIndex(); // generate the sitemap_index.xml

Exception:

Exception in thread "main" java.lang.RuntimeException: Sitemap file failed to validate (bug?)
    at com.redfin.sitemapgenerator.SitemapGenerator.writeSiteMap(SitemapGenerator.java:248)
    at com.redfin.sitemapgenerator.SitemapGenerator.write(SitemapGenerator.java:169)
    at com.redfin.sitemapgenerator.WebSitemapGenerator.write(WebSitemapGenerator.java:12)
    at com.nttdata.sitemap.test.SitemapBuilderTest.main(SitemapBuilderTest.java:19)
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1388)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:998)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:607)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)

aeromac commented 1 year ago

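It's trying to validate the gzipped file, so the parser sees compressed bytes instead of XML and fails at line 1, column 1. Maybe SitemapValidator.validateWebSitemap should take a boolean gzip param and read the file through a GZIPInputStream when it is set.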
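
A minimal sketch of that idea, using only JDK classes. This is not the library's code: the class name GzipAwareParseSketch and the helpers open, parses and writeGzipped are hypothetical names made up for illustration, and the check here is plain XML well-formedness rather than validation against the sitemap schema. It only shows that the same file fails when parsed raw and succeeds when read through GZIPInputStream.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not part of sitemapgen4j.
class GzipAwareParseSketch {

    // Opens the file, decompressing on the fly when gzip is true.
    static InputStream open(File file, boolean gzip) throws IOException {
        InputStream raw = new BufferedInputStream(new FileInputStream(file));
        if (!gzip) {
            return raw;
        }
        try {
            return new GZIPInputStream(raw);
        } catch (IOException e) {
            raw.close(); // do not leak the file handle if the gzip header is bad
            throw e;
        }
    }

    // Returns true if the (optionally gzipped) file parses as well-formed XML.
    // Assumption: a real fix would run schema validation here instead.
    static boolean parses(File file, boolean gzip) throws ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        try (InputStream in = open(file, gzip)) {
            factory.newSAXParser().parse(in, new DefaultHandler());
            return true;
        } catch (SAXException | IOException e) {
            return false; // not XML, not gzip, or unreadable
        }
    }

    // Writes the given XML text to the file as gzip, to stand in for gzip(true) output.
    static void writeGzipped(File file, String xml) throws IOException {
        try (OutputStream out = new GZIPOutputStream(new FileOutputStream(file))) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("sitemap_sketch", ".xml.gz");
        file.deleteOnExit();
        writeGzipped(file,
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
            + "<url><loc>http://www.example.com/foo0.html</loc></url>"
            + "</urlset>");
        System.out.println("parsed as plain XML: " + parses(file, false));
        System.out.println("parsed through GZIPInputStream: " + parses(file, true));
    }
}
```

Until something like this exists in the validator, a likely workaround is to leave autoValidate off when gzip(true) is set and check the decompressed output separately.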