evanderkoogh / node-sitemap-stream-parser

A streaming parser for sitemap files. It can handle deeply nested sitemaps containing 100+ million URLs.
Apache License 2.0
38 stars 18 forks

Error: Callback was already called #20

Open max-frai opened 5 years ago

max-frai commented 5 years ago

When I parse this sitemap: https://gazeta.ua/sitemaps/sitemapindex.xml it fails with:

...node_modules/async/dist/async.js:966
        if (fn === null) throw new Error("Callback was already called.");
                         ^

Error: Callback was already called.
    at ...node_modules/async/dist/async.js:966:32
    at SAXStream.parserStream.on (...node_modules/sitemap-stream-parser/index.js:98:16)
    at SAXStream.emit (events.js:197:13)
    at SAXParser.SAXStream._parser.onend (...node_modules/sax/lib/sax.js:190:10)
    at emit (...node_modules/sax/lib/sax.js:624:35)
    at end (...node_modules/sax/lib/sax.js:667:5)
    at SAXParser.end (...node_modules/sax/lib/sax.js:154:24)
    at SAXStream.end (...node_modules/sax/lib/sax.js:248:18)
    at Gzip.onend (_stream_readable.js:655:10)
    at Object.onceWrapper (events.js:285:13)
    at Gzip.emit (events.js:202:15)
    at endReadableNT (_stream_readable.js:1129:12)
    at processTicksAndRejections (internal/process/next_tick.js:76:17)

I'm not sure what causes this. Perhaps on(error) and on(end) are both called for a single parsed URL? But how could that be possible?
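To illustrate the hypothesis above, here is a minimal sketch of how a stream that emits both `error` and `end` could invoke the same completion callback twice, and how a guard avoids it. The `once` helper and `simulateParse` function are hypothetical names used only for illustration; they are not taken from the library's source, and a plain `EventEmitter` stands in for the real parser stream.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: lets only the first invocation of the callback through.
function once(callback) {
  let called = false;
  return (...args) => {
    if (called) return; // ignore any later invocation
    called = true;
    callback(...args);
  };
}

// Stand-in for parsing one sitemap URL: both handlers report completion,
// so a stream that errors and then still ends would call `done` twice.
function simulateParse(stream, done) {
  stream.on('error', (err) => done(err));
  stream.on('end', () => done(null));
}

const calls = [];
const stream = new EventEmitter();
simulateParse(stream, once((err) => calls.push(err ? err.message : 'ok')));

// Simulate a decompression failure followed by a normal end of stream.
stream.emit('error', new Error('bad gzip data'));
stream.emit('end');

console.log(calls); // only the first completion is recorded
```

Without the guard, `done` would run twice here, which is the situation the async library reports as "Callback was already called."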

lizaaard commented 5 years ago

I'm having the same issue! And I can't even catch the error; it just breaks the execution.

lizaaard commented 5 years ago

By the way, rolling back to 1.6.0 works.

max-frai commented 5 years ago

Hm, 1.7.0 was my change from unzip to gzip, made to auto-recognize the compression algorithm.
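For reference, a small sketch of header auto-detection in Node's built-in `zlib` module: `zlib.unzipSync` (and its streaming counterpart `zlib.createUnzip`) accepts both gzip and deflate input. Whether 1.7.0 relies on exactly these calls is an assumption, and the sample XML is made up for the example.

```javascript
const zlib = require('zlib');

// Made-up sitemap fragment used only as test data.
const xml = '<urlset><url><loc>https://example.com/</loc></url></urlset>';
const input = Buffer.from(xml, 'utf8');

// Compress the same payload with two different formats.
const gzipped = zlib.gzipSync(input);
const deflated = zlib.deflateSync(input);

// Unzip inspects the header and picks the matching decompressor.
const fromGzip = zlib.unzipSync(gzipped).toString('utf8');
const fromDeflate = zlib.unzipSync(deflated).toString('utf8');

console.log(fromGzip === xml, fromDeflate === xml);
```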

lizaaard commented 5 years ago

Yep, it happens only on sitemap indexes that contain nested sitemaps with the ".xml.gz" extension.

Non-compressed sitemaps work fine.

evanderkoogh commented 5 years ago

Hey @max-frai, @lizaaard. Thanks for reporting this. I have been meaning to rewrite this thing in a slightly more modern version with some upgraded dependencies. I've been wanting to migrate away from CoffeeScript and Request for a while anyway. Going to see how far I get with this today.

etairi commented 5 years ago

Any update on this?