AKST / Australian-Address-Boundaries-Land-Property-Price-Database

This is a database of geographic boundaries and addresses, as well as land and property data (mostly NSW).
MIT License

Cache HTML used during data discovery #2

Closed: AKST closed this issue 1 month ago

AKST commented 2 months ago

What is this

When you use an instance of a data discovery class, it should have the option to reuse HTML fetched by previous requests instead of downloading the page again.
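
A minimal sketch of the reuse idea, assuming a plain on-disk cache keyed by a hash of the URL. `HtmlCache`, `fetch_over_network` and the injectable `fetch` parameter are hypothetical names chosen for illustration; they are not part of the project's code. The fetcher is injectable so the cache can be exercised without network access.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_over_network(url: str) -> bytes:
    # Default fetcher: download the page with the standard library.
    with urlopen(url) as response:
        return response.read()


class HtmlCache:
    """Hypothetical on-disk cache that reuses HTML saved by earlier requests."""

    def __init__(
        self,
        cache_dir: Path = Path(tempfile.gettempdir()) / "html_cache",
        fetch: Callable[[str], bytes] = fetch_over_network,
    ):
        self._dir = Path(cache_dir)
        self._dir.mkdir(parents=True, exist_ok=True)
        self._fetch = fetch

    def _path_for(self, url: str) -> Path:
        # Hash the URL so every URL maps to a stable, filesystem-safe file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self._dir / f"{digest}.html"

    def read(self, url: str) -> bytes:
        path = self._path_for(url)
        if path.exists():
            # An earlier request already saved this page, so skip the download.
            return path.read_bytes()
        html = self._fetch(url)
        path.write_bytes(html)
        return html
```

This sketch never expires a cached page; the `RemoteFile` proposal layers an expiry cadence and error fallbacks on top of the same basic idea.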

How this should work

Example of use

Initial example

from urllib.request import urlopen
from bs4 import BeautifulSoup

class MyResourceFetcher:
    def __init__(self, url: str):
        self._url = url

    def get_list_items(self):
        # Every call downloads the page again; nothing is reused between calls.
        with urlopen(self._url) as response:
            soup = BeautifulSoup(response.read(), 'html.parser')
        return soup.find_all('li')

    @staticmethod
    def create():
        return MyResourceFetcher(url='https://www.valuergeneral.nsw.gov.au/land_value_summaries/lv.php')

Updated example

from bs4 import BeautifulSoup
from lib.remote_file import RemoteFile, CacheCadence, CacheCountDownDate

class MyResourceFetcher:
    def __init__(self, html: RemoteFile):
        self._html = html

    def get_list_items(self):
        soup = BeautifulSoup(self._html.read(), 'html.parser')
        return soup.find_all('li')

    @staticmethod
    def create():
        lv_summaries_html = RemoteFile(
            id='nswvg_lv_directory',
            extension='html',
            url='https://www.valuergeneral.nsw.gov.au/land_value_summaries/lv.php',

            # how much time must elapse before the cached copy is considered stale
            cache_cadence=CacheCadence.month(1),

            # when the countdown starts: for example, if you fetched the resource on the 5th,
            # 13th or 30th of May 2012, the countdown would start from the 1st of May 2012,
            # so with a one-month cadence any request from the 1st of June 2012 onwards
            # would fetch a new copy of the resource
            cache_cadence_start=CacheCountDownDate.start_of_the_month(),

            # fall back to the cached copy if the request fails with any of these
            # HTTP status codes (404 or any 5xx)
            use_cache_on_error=[404, *range(500, 600)],

            # fall back to the cached copy when there is no network connection
            use_cache_offline=True,
        )

        return MyResourceFetcher(html=lv_summaries_html)
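
The cache rules configured above can be sketched as three small, hypothetical helpers. `cache_expires_on` and `is_stale` reproduce the worked example in the `cache_cadence_start` comment (a one-month cadence counted from the 1st of the month the resource was fetched in). `read_with_fallback` shows one possible reading of `use_cache_on_error` and `use_cache_offline` using the standard library's `urllib.error` exceptions; treating any non-HTTP `URLError` as "offline" is an assumption. None of these names come from `lib.remote_file`.

```python
import datetime as dt
from collections.abc import Callable, Container
from typing import Optional
from urllib.error import HTTPError, URLError


def cache_expires_on(fetched_on: dt.date, months: int = 1) -> dt.date:
    # The countdown starts on the 1st of the month the resource was fetched in.
    start = fetched_on.replace(day=1)
    # Move forward `months` calendar months, rolling into the next year if needed.
    index = start.month - 1 + months
    return dt.date(start.year + index // 12, index % 12 + 1, 1)


def is_stale(fetched_on: dt.date, today: dt.date, months: int = 1) -> bool:
    # Any request on or after the expiry date should download a new copy.
    return today >= cache_expires_on(fetched_on, months)


def read_with_fallback(
    fetch: Callable[[], bytes],
    cached: Optional[bytes],
    use_cache_on_error: Container[int] = (404, *range(500, 600)),
    use_cache_offline: bool = True,
) -> bytes:
    try:
        return fetch()
    except HTTPError as err:
        # The server responded, but with a status code listed in use_cache_on_error.
        if cached is not None and err.code in use_cache_on_error:
            return cached
        raise
    except URLError:
        # No HTTP response at all; this sketch treats that as being offline.
        if cached is not None and use_cache_offline:
            return cached
        raise
```

With a one-month cadence, a copy fetched on 13 May 2012 expires on 1 June 2012, which matches the comment in the example. If there is no cached copy, or the status code is not listed, the original error is re-raised.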

Questions to answer