evanderkoogh / node-sitemap-stream-parser

A streaming parser for sitemap files. Is able to deal with deeply nested sitemaps with 100+ million urls in them.
Apache License 2.0
38 stars 18 forks source link

promise? #17

Open AlJohri opened 5 years ago

AlJohri commented 5 years ago

my code isn't really set to do stream based processing right now and I wanted a version that returned a promise so I wrote this:

async function parseSitemap(sitemapURL) {
  const urls = [];
  return await new Promise((resolve, reject) => {
    sitemaps.parseSitemaps(sitemapURL, url => urls.push(url), (err, sitemaps) => {
      if (err) reject(err);
      resolve(urls);
    })
  })
}

could you expose a promise version directly? happy to submit a PR

AlJohri commented 5 years ago

nevermind, this is pretty easy with utils.promisify.

const { promisify } = require('util');
const parseSitemaps = promisify(sitemaps.parseSitemaps);
async function parseSitemap(sitemapURL, keys) {
  const urls = [];
  const sitemaps = await parseSitemaps(sitemapURL, url => urls.push(url));
  return urls;
}
evanderkoogh commented 5 years ago

Hey @AlJohri, if you are adding an example for the other features (sitemap url and filter for subsequent sitemaps), could you add this as well? Might help more people in the future.

Thanks!

AlJohri commented 5 years ago

@evanderkoogh sweet, I'll give this a shot. one small note is that since the done callback is no longer the last argument I believe promisify will no longer work.

promisify() assumes that original is a function taking a callback as its final argument in all cases. If original is not a function, promisify() will throw an error. If original is a function but its last argument is not an error-first callback, it will still be passed an error-first callback as its last argument.

https://github.com/nodejs/node/blob/master/doc/api/util.md

since you just cut the release <24 hours ago, is there any easy solution you can think of here?

evanderkoogh commented 5 years ago

Hey, that is a very good catch. I was trying to avoid any sort of backward incompatibility..

But I can just do a check and shuffle some arguments around, which I have done. I have indeed yanked version 1.5.2 and released version 1.6.0 where the sitemap_test argument is optional.

The other thing I have added is a promise version of the parseSitemaps method. I have C&Ped the example I used to test it below. Would love some documentation update.

const test = async () => {
  await parser.parseSitemapsPromise(
    "https://calibreapp.com/sitemap.xml",
    console.log,
    sitemap => {
      console.log({ sitemap });
      return true;
    }
  );
};
AlJohri commented 5 years ago

Thanks for the speedy fixes @evanderkoogh. Let me implement this in my application and then I'll take a stab at updating the docs.

AlJohri commented 5 years ago

@evanderkoogh just looked at your implementation. I'm not sure its exactly right?

https://github.com/evanderkoogh/node-sitemap-stream-parser/commit/03fc2a706a582703e53e88e03de8647c5a813a75

I think you need to call reject in the event of an error. it would be something closer to

exports.parseSitemapsPromise = (urls, url_cb, sitemap_test) => 
    new Promise (resolve, reject) => {
            const done = (err, sitemaps) => {
                if (err) reject(err);
                resolve(sitemaps);
            }
            return exports.parseSitemaps(urls, url_cb, sitemap_test, done)
        }

or much more simply since we can rely on the convention of callback being the last argument and callback taking (err, result):

const { promisify } = require('util');
exports.parseSitemapsPromise = promisify(exports.parseSitemaps);
pittersnider commented 5 years ago

This definitely needs to get easier than it is.
My suggestion is to keep it promisified and objective:

const { urls, sitemaps } = await sitemap.parse('https://github.com/sitemap.xml')
fotoflo commented 3 months ago

Hello, only 5 years later... Is one of these solutions working?