evanderkoogh / node-sitemap-stream-parser

A streaming parser for sitemap files. Is able to deal with deeply nested sitemaps with 100+ million urls in them.
Apache License 2.0

Made related sitemap urls relative. #1

Closed: rolfsormo closed this 7 years ago

rolfsormo commented 8 years ago

Hey there,

Cool little library you have here!

I ran into a problem with one sitemap whose URLs were missing the leading "http", which made the library fail to load the related sitemaps.

In this commit I made use of Node's url.resolve.
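
For readers unfamiliar with it, here is a minimal sketch of how Node's legacy `url.resolve(from, to)` behaves for a few shapes of sitemap URL. The example URLs and the `resolveSitemapUrl` helper are made up for illustration and are not taken from the commit.

```javascript
// Sketch only: the URLs and the helper name below are hypothetical.
const url = require('url');

// Resolve a <loc> value against the URL of the sitemap that contains it.
function resolveSitemapUrl(parentSitemapUrl, loc) {
  return url.resolve(parentSitemapUrl, loc);
}

const parent = 'http://example.com/sitemaps/index.xml';

const resolved = [
  'products.xml',                           // relative to the parent's directory
  '/archive/sitemap.xml',                   // relative to the parent's host
  '//cdn.example.com/sitemap.xml',          // scheme missing, host present
  'https://other.example.org/sitemap.xml',  // already absolute, left untouched
].map((loc) => resolveSitemapUrl(parent, loc));

console.log(resolved);
```

Absolute URLs pass through unchanged, so running every `<loc>` through the resolver should be safe for well-formed sitemaps too.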

As you can see, I had to create a separate SitemapParser for each downloaded sitemap, so that URLs are resolved against the sitemap being loaded rather than against the original URL. This also required me to rename the internal class to avoid name clashes. I hope that's fine.
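
To illustrate why the base URL has to follow the sitemap currently being loaded, here is a rough sketch. `SitemapContext` is a hypothetical class written for this example; it is not the class from the commit or from the library.

```javascript
// Sketch only: SitemapContext is a hypothetical stand-in, not the library's code.
const url = require('url');

class SitemapContext {
  constructor(sitemapUrl) {
    // Each instance remembers the URL of the sitemap it was created for.
    this.sitemapUrl = sitemapUrl;
  }

  // Resolve a URL found inside this sitemap against this sitemap's own URL.
  resolve(loc) {
    return url.resolve(this.sitemapUrl, loc);
  }

  // Create a new context for a nested sitemap referenced from this one.
  child(loc) {
    return new SitemapContext(this.resolve(loc));
  }
}

const root = new SitemapContext('http://example.com/sitemap_index.xml');
const nested = root.child('/maps/nested_index.xml');

// The same relative URL resolves differently depending on which sitemap it came from.
const leafFromNested = nested.resolve('leaf.xml');
const leafFromRoot = root.resolve('leaf.xml');

console.log(leafFromNested); // http://example.com/maps/leaf.xml
console.log(leafFromRoot);   // http://example.com/leaf.xml
```

With a single shared base URL, `leaf.xml` from the nested sitemap would be resolved against the root and point at the wrong location.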

Anyways, I hope you pull this so I can return to using your version.

Cheers,

Rolf

evanderkoogh commented 7 years ago

This problem has been solved differently (and more in line with what I am looking for) in #5