bradvogel closed this 10 years ago
Thanks for the suggestion. That's a reasonable idea, but I'm worried about making the API too complex when most people use this library just for the full-text functionality. I don't want to add a lot of flags and stuff. Let me think about a simple way to implement this.
You're right and I agree that most people probably use it for text(). But the other functions are really useful also. What about just exposing them on the exports, e.g.
var extractor = require('unfluff');
var everything = extractor(html);
var justTitle = extractor.title(html);
Something like that is pretty reasonable. PRs welcome or I'll take a look when I have a few minutes free.
Thanks again for the feedback! :)
Thanks for doing this! Unfortunately I won't have time for a PR this week.
I would like to contribute. Mind if I take a look at this?
Sure @franza, you are welcome to take a pass at it. Also feel free to share a work in progress even if it's not totally done and tested.
Thanks! Released in v0.7.0. It is called extractor.lazy().
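For anyone curious how a lazy extractor can avoid the slow full-text pass, here is a minimal sketch of the general idea. It is not unfluff's actual implementation: `lazyExtractor`, `memo`, and the regex-based parsing are hypothetical stand-ins chosen to keep the example self-contained. Each field is computed only when its function is first called, and the result is cached for later calls.

```javascript
// Hypothetical sketch of a lazy, memoized extractor.
// NOT unfluff's real code; names and parsing logic are assumptions.
function lazyExtractor(html) {
  const cache = {};

  // Wrap `compute` so it runs only the first time `key` is requested.
  const memo = (key, compute) => () => {
    if (!(key in cache)) {
      cache[key] = compute();
    }
    return cache[key];
  };

  return {
    // Cheap: a single regex match on the <title> element.
    title: memo('title', () => {
      const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
      return match ? match[1].trim() : '';
    }),

    // Stand-in for the expensive step: walks the whole document,
    // dropping scripts and tags and collapsing whitespace.
    text: memo('text', () =>
      html
        .replace(/<script[\s\S]*?<\/script>/gi, ' ')
        .replace(/<[^>]+>/g, ' ')
        .replace(/\s+/g, ' ')
        .trim()
    ),
  };
}

// Usage: only the title regex runs here; text() is never computed.
const data = lazyExtractor('<html><head><title>Apple Inc.</title></head><body><p>Hi</p></body></html>');
console.log(data.title());
```

With this shape, a caller who only needs the title never pays for the text pass, and the top-level API does not grow any extra flags.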
It'd be nice to be able to get the title, image, and description for a page without getting the full text. Parsing the text can be very slow for long pages (e.g. http://en.wikipedia.org/wiki/Apple_Inc takes 2 seconds on my MacBook).
Perhaps, something like:
or perhaps just change the API to expose the functions separately on the exports.