evanderkoogh / node-sitemap-stream-parser

A streaming parser for sitemap files. It can handle deeply nested sitemaps containing 100+ million URLs.
Apache License 2.0

return name of sitemap that url was returned from? #16

Open AlJohri opened 5 years ago

AlJohri commented 5 years ago

I'm working with a recursive sitemap, and it would be helpful to get the URL of the sitemap that each item was returned from. Is this possible?

Ideally, I would like something like this returned: {sitemap: "...", url: "..."}

Or, perhaps even better, a way to prevent it from going down certain paths of the sitemap tree based on a regex. My sitemap is date-based, and I don't want it to traverse further back than a few days.

evanderkoogh commented 5 years ago

Hey @AlJohri, apologies it has taken a while, been super busy with my startup the last few weeks.

But I am pretty sure I have implemented both of your suggestions.

The URL callback now also receives a second argument: the URL of the sitemap currently being parsed. And the parseSitemaps method takes a 4th argument, a function that receives the URL of each sitemap it finds. That sitemap will be added to the list to parse if the function returns true.
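To make those two hooks concrete, here is a minimal sketch of the functions you might pass in. Everything named here (`makeCollector`, `makeRecentSitemapFilter`, and the `YYYY-MM-DD` date embedded in the sitemap URL) is a hypothetical assumption for illustration, not part of the library. The block deliberately does not call the library, so check the argument order of parseSitemaps against the version you have installed.

```javascript
// Hypothetical helpers: the names and the date-in-URL layout are assumptions,
// not something the library provides.

// Returns a URL callback that records which sitemap each URL came from.
// It relies on the callback receiving (url, sitemapUrl), as described above.
function makeCollector(records) {
  return function onUrl(url, sitemapUrl) {
    records.push({ sitemap: sitemapUrl, url: url });
  };
}

// Returns a predicate for nested sitemaps. It accepts a sitemap when the
// YYYY-MM-DD date found in its URL is at most `maxAgeDays` older than `now`.
// URLs without a recognisable date are accepted so that index files
// are still traversed.
function makeRecentSitemapFilter(maxAgeDays, now) {
  const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
  const msPerDay = 24 * 60 * 60 * 1000;
  return function shouldParse(sitemapUrl) {
    const match = datePattern.exec(sitemapUrl);
    if (!match) {
      return true;
    }
    // Date.UTC takes a zero-based month, hence the "- 1".
    const sitemapTime = Date.UTC(
      Number(match[1]),
      Number(match[2]) - 1,
      Number(match[3])
    );
    const ageDays = (now.getTime() - sitemapTime) / msPerDay;
    return ageDays <= maxAgeDays;
  };
}
```

The collector would be passed as the URL callback and the predicate as the sitemap filter function. Taking `now` as a parameter, rather than reading the clock inside the predicate, keeps the filter easy to test.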

They are published to npm as version 1.5.1. Let me know if they work!

As I am really busy, could I ask you to add an example to the README with these 2 features?