codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
https://goo.gl/VX41yK
MIT License

API Improvement with Lazy Loading #159

Open rothnic opened 9 years ago

rothnic commented 9 years ago

Something I noticed with the API is that you require, by convention, that someone call download/parse/nlp before accessing properties. This is a lazy evaluation problem: the common approach is to have the object ready to use as soon as it is instantiated, but we don't want to hammer the server each time an object is created when the data a property depends on isn't needed yet.

Instead of enforcing the convention and throwing errors, a more user-friendly approach is to perform the operations as needed.

One library that helps with this is lazy, but this doesn't solve the whole problem.

You need to do something like the following pseudocode:

class Article(object):
    def __init__(self, url):
        self.url = url
        self._raw_article = None
        self._parsed_article = None

    @property
    def raw_article(self):
        if self._raw_article is None:
            self.download()
        return self._raw_article

    @property
    def parsed_article(self):
        if self._parsed_article is None:
            self.parse()
        return self._parsed_article

    @property
    def title(self):
        # going through the parsed_article property (not _parsed_article)
        # is what triggers the lazy load of the requirement;
        # other requirements can be checked here as well
        return self.parsed_article.title
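
To make the pseudocode above concrete, here is a minimal runnable sketch of the same property-based lazy loading. The LazyArticle class, its hard-coded HTML, and the call counters are all hypothetical stand-ins for illustration; none of this is newspaper's actual implementation, and a real download() would perform an HTTP request.

```python
class LazyArticle:
    """Hypothetical stand-in for Article, used only to illustrate lazy loading."""

    def __init__(self, url):
        self.url = url
        self._raw_article = None
        self._parsed_article = None
        # Counters exist only to make the laziness observable.
        self.download_calls = 0
        self.parse_calls = 0

    def download(self):
        # Placeholder for a real HTTP fetch (assumption: it fills _raw_article).
        self.download_calls += 1
        self._raw_article = "<title>Example headline</title>"

    def parse(self):
        # Placeholder parser; reading raw_article triggers download() if needed.
        self.parse_calls += 1
        raw = self.raw_article
        start = raw.index("<title>") + len("<title>")
        end = raw.index("</title>")
        self._parsed_article = {"title": raw[start:end]}

    @property
    def raw_article(self):
        if self._raw_article is None:
            self.download()
        return self._raw_article

    @property
    def parsed_article(self):
        if self._parsed_article is None:
            self.parse()
        return self._parsed_article

    @property
    def title(self):
        # Accessing the property, not the private attribute, chains the loads.
        return self.parsed_article["title"]


article = LazyArticle("https://example.com/story")
calls_before_access = article.download_calls  # nothing fetched at construction
first = article.title   # triggers parse(), which triggers download()
second = article.title  # served from the cached attributes
```

Nothing is fetched when the object is created, and each of download() and parse() runs at most once however many times title is read.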

Alternatively, something like lazy makes this simpler:

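from lazy import lazy  # third-party package

class Article(object):
    def __init__(self, url):
        self.url = url

    @lazy
    def raw_article(self):
        # here download() is assumed to return the raw content
        return self.download()

    @lazy
    def parsed_article(self):
        # here parse() is assumed to return the parsed result
        return self.parse()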

    @property
    def title(self):
        # lazy captures and caches the lazy loaded properties
        return self.parsed_article.title
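
If adding a dependency is undesirable, a similar effect can be sketched with functools.cached_property from the standard library (Python 3.8+). This swaps in a different tool from the lazy package suggested above. The CachedArticle class, its _fetch helper, and the hard-coded HTML are hypothetical and only illustrate the caching behavior.

```python
from functools import cached_property


class CachedArticle:
    """Hypothetical example class; not part of newspaper."""

    def __init__(self, url):
        self.url = url
        self.fetch_count = 0  # only here to show when fetching happens

    def _fetch(self):
        # Placeholder for a real network download (assumption for illustration).
        self.fetch_count += 1
        return "<title>Example headline</title>"

    @cached_property
    def raw_article(self):
        # Computed on first access, then stored on the instance.
        return self._fetch()

    @cached_property
    def parsed_article(self):
        raw = self.raw_article
        title = raw.split("<title>", 1)[1].split("</title>", 1)[0]
        return {"title": title}

    @property
    def title(self):
        return self.parsed_article["title"]


article = CachedArticle("https://example.com/story")
titles = [article.title, article.title]  # second read hits the cache

# Deleting a cached attribute clears it, so the next access fetches again.
del article.raw_article
_ = article.raw_article
```

One design point worth noting: because the cached value lives on the instance, a caller who wants a fresh download can clear it explicitly with del rather than constructing a new object.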
codelucas commented 9 years ago

This has never occurred to me, good idea!

baby5 commented 5 years ago

@codelucas a lazy article is helpful