danmactough / node-feedparser

Robust RSS, Atom, and RDF feed parsing in Node.js

Allow async iteration in environments where streams implementation supports it #292

Open lostfictions opened 3 years ago

lostfictions commented 3 years ago

Hi there, thanks for this library! It seems to work great so far.

One small issue I'm having is that I'd rather write code using some of the more ergonomic APIs that have been introduced into Node over the past few years. This includes async iteration support for readable streams, which has been available experimentally since Node 10.0.0 and stable since 11.14.0. (Incidentally, that means it's now available in all non-EOL versions of Node.)

Here's a short example of using feedparser with async iteration support:

import fetch from "node-fetch";
import Feedparser from "feedparser";

async function main() {
  const res = await fetch("https://news.google.com/rss/search?q=whatever");

  if (res.ok) {
    // node-fetch exposes the response body as a Node.js readable stream,
    // so we can pipe it straight into the FeedParser transform stream
    const parser = new Feedparser({ addmeta: false });
    res.body.pipe(parser);

    // Consume each parsed feed item via async iteration
    for await (const item of parser) {
      console.log(item.title);
    }
  }
}

main().then(() => { console.log("Done!"); });

Pretty nice! Unfortunately, this library's use of the userland readable-stream package prevents this, since the version it depends on doesn't seem to support async iteration. Trying to run the above sample fails with an obscure error.

However, if I patch feedparser to require stream instead of readable-stream:

diff --git a/node_modules/feedparser/lib/feedparser/index.js b/node_modules/feedparser/lib/feedparser/index.js
index 916356f..71344d3 100644
--- a/node_modules/feedparser/lib/feedparser/index.js
+++ b/node_modules/feedparser/lib/feedparser/index.js
@@ -13,7 +13,7 @@ var sax = require('sax')
   , addressparser = require('addressparser')
   , indexOfObject = require('array-indexofobject')
   , util = require('util')
-  , TransformStream = require('readable-stream').Transform
+  , TransformStream = require('stream').Transform
   , _ = require('../utils');

The above sample works fine!

Is readable-stream being used to facilitate browser usage of this library? I think Webpack and Browserify already shim the stream module when bundling for browser environments, so it might not be necessary. Alternatively, this library could offer a separate browser entrypoint in package.json (or maybe there's a polyfill package that already does this); see the sketch below.
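For illustration, a minimal sketch of that package.json idea, assuming the code itself were switched to require('stream'): the "browser" field maps the built-in module back to readable-stream for bundlers, while Node keeps using the native implementation. (This is just a partial sketch, not the library's actual package.json.)

    {
      "browser": {
        "stream": "readable-stream"
      }
    }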

feedparser version: 2.2.10
Node version: 16.6.2

Cobertos commented 2 years ago

> I think Webpack and Browserify already shim the streams module

Webpack dropped that automatic shimming in version 5, and the general recommendation now is that you're supposed to polyfill it yourself with resolve.fallback and a webpack.ProvidePlugin (for the global process object).
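For reference, a rough sketch of that webpack 5 setup; the stream-browserify and process/browser packages are just the usual choices here, not requirements:

    // webpack.config.js
    const webpack = require("webpack");

    module.exports = {
      resolve: {
        fallback: {
          // route require("stream") to a userland implementation
          stream: require.resolve("stream-browserify"),
        },
      },
      plugins: [
        // readable-stream expects a global `process` object
        new webpack.ProvidePlugin({
          process: "process/browser",
        }),
      ],
    };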

There shouldn't be any specific need to support browsers except to interop with the browser's versions of Readable and Writable streams, which are making their way into all browsers.

Currently I'm making this work in the browser using something like:

    // Eventually, we will want to use .pipeTo or .pipeThrough in browsers when
    // they implement that; for now we have to kind of do it ourselves
    const readable = resp.body as any;
    if (!readable.pipe) {
      async function pipePolyfill(this: any, writable: Writable) {
        // Lock the WHATWG ReadableStream and pull chunks out manually
        const lockedReader = this.getReader();

        for (;;) {
          const { value, done } = await lockedReader.read();
          if (done) {
            break;
          }
          writable.write(value);
        }
        // Signal EOF so the downstream parser emits 'end'
        writable.end();
      }
      readable.pipe = pipePolyfill;
    }

    readable.pipe(feedParser);
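As an aside, here's roughly what the .pipeTo route could look like once it's usable: wrap the Node-style writable in a WHATWG WritableStream. This is just a sketch; nodeWritableToWebWritable is a name I made up, and it ignores backpressure:

    import type { Writable } from "stream";

    // Adapt a Node-style Writable (like a FeedParser instance) into a
    // WHATWG WritableStream so the browser's native piping can drive it
    function nodeWritableToWebWritable(writable: Writable): WritableStream<Uint8Array> {
      return new WritableStream<Uint8Array>({
        write(chunk) {
          writable.write(chunk);
        },
        close() {
          writable.end();
        },
        abort(err) {
          writable.destroy(err);
        },
      });
    }

    // usage: await resp.body.pipeTo(nodeWritableToWebWritable(feedParser));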