Open: fdammassa opened this issue 6 years ago
Hey,
I picked a default of 4 sitemaps parsed in parallel (https://github.com/evanderkoogh/node-sitemap-stream-parser/blob/33ba4d9d958783e6f4598ab64e6ad0644da3d22f/index.coffee#L64).
I would experiment with that setting. If a different value improves things for you, it would be great to make it configurable. Let me know if you need more pointers on that.
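One way to make that limit configurable is to pass it to a small bounded-concurrency runner. The sketch below is a minimal illustration, not the library's code. `runWithLimit` and its signature are hypothetical. It assumes each sitemap fetch-and-parse is wrapped in a function that returns a Promise.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit`
// calls in flight at once. `limit` plays the role of the hard-coded 4.
async function runWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item that has not been started yet

  // Each lane keeps pulling the next unstarted item until none remain.
  // Reading and incrementing `next` happens synchronously, so two lanes
  // never claim the same index.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results; // results keep the input order
}
```

Node runs JavaScript on a single thread. A higher limit therefore mainly overlaps network waits; it does not spread XML parsing across CPU cores.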
How about calling process.nextTick() inside the loop callback?
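There is a caveat with process.nextTick(). Its callbacks run before the event loop moves on, so rescheduling work recursively with it can still starve timers and I/O. setImmediate() queues the callback for the check phase, which runs after pending I/O callbacks get a turn.

Below is a minimal sketch of chunked processing that yields with setImmediate(). `processInChunks` is a hypothetical helper. Yielding keeps the process responsive but does not lower total CPU use. Since most of the time is reportedly spent inside the XML parser rather than in the url callbacks, this alone may not change much.

```typescript
// Hypothetical helper: call `onItem` for every item, `chunkSize` items at a
// time, yielding to the event loop between chunks, then call `done`.
function processInChunks<T>(
  items: readonly T[],
  chunkSize: number,
  onItem: (item: T) => void,
  done: () => void,
): void {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  let i = 0;
  function step(): void {
    const end = Math.min(i + chunkSize, items.length);
    for (; i < end; i++) onItem(items[i]);
    if (i < items.length) {
      // setImmediate lets timers and I/O callbacks run before the next chunk;
      // process.nextTick here would not yield to them.
      setImmediate(step);
    } else {
      done();
    }
  }
  step();
}
```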
Hey @fdammassa, thanks for opening this issue. I finally had some time to investigate, and almost all of the time is spent parsing XML. Unfortunately, XML parsing is expensive and CPU-intensive, and these sitemaps are many MBs of XML.
If you can give me a bit more context about what you are trying to do, I might be able to help better.
I'm seeing very high CPU utilization (100%) with large nested sitemaps.
The url callback is trivial: it only increments a counter.
Could this be related to the "blocking" nature of the url (and sitemap) callbacks? If you point me in the right direction, I can contribute to the project.
As an example, you could try this sitemap: https://www.walmart.com/sitemap_ip.xml
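When the url callback only counts entries, a full XML parse may be more than the task needs. Below is a minimal sketch of a different technique, not what the library does. It scans streamed text for literal <loc> tags and carries a short tail between chunks, so a tag split across a chunk boundary is still counted.

It makes several assumptions:
- <loc> elements are unprefixed.
- No <loc> appears inside comments or CDATA.
- No XML validation is needed.

`LocCounter` is a hypothetical name. For a sitemap index, the count is the number of child sitemaps, not URLs.

```typescript
// Hypothetical counter: counts literal "<loc>" tags in text fed chunk by
// chunk. Assumes unprefixed tags and does not validate the XML.
class LocCounter {
  private static readonly TAG = "<loc>";
  private carry = ""; // tail of the previous chunk that may start a tag
  count = 0;

  push(chunk: string): void {
    const text = this.carry + chunk;
    let from = 0;
    for (;;) {
      const at = text.indexOf(LocCounter.TAG, from);
      if (at === -1) break;
      this.count++;
      from = at + LocCounter.TAG.length;
    }
    // Keep at most TAG.length - 1 trailing characters that were not part of
    // a counted tag; a split tag is completed by the next chunk.
    const keepFrom = Math.max(from, text.length - (LocCounter.TAG.length - 1));
    this.carry = text.slice(keepFrom);
  }
}
```

To use it with an HTTP response, call res.setEncoding("utf8") and pass each "data" chunk to push(). setEncoding keeps multi-byte characters from being split between chunks. Gzipped sitemaps would need to be decompressed first.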