GreenScheduler / cats

CATS: the Climate-Aware Task Scheduler :cat2: :tiger2: :leopard:
https://greenscheduler.github.io/cats/
MIT License

Implement caching of carbon intensity forecast #25

Closed · tlestang closed this issue 1 year ago

tlestang commented 1 year ago

Currently, a new request to carbonintensity.org.uk is made each time cats is run. In cats/__init__.py:

def findtime(postcode, duration):
  tuples = get_tuple(postcode)  # API request
  result = writecsv(tuples, duration)  # write intensity data to disk
                                       # as csv timeseries
  # ...

Although the carbon intensity data obtained from the API is written to disk, this is not taken advantage of. If the relevant carbon intensity data is already on disk, we'd like to reuse it instead of making a new request each time.

The local carbon intensity forecast data is reusable if the last forecast datetime is beyond the expected finish datetime of the application, i.e. forecast_end > now() + runtime.
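
As a minimal sketch of that check (the function name is illustrative, not from the codebase, and it assumes the duration is given in minutes as elsewhere in cats):

from datetime import datetime, timedelta

def forecast_covers_job(forecast_end: datetime, duration: int) -> bool:
  # Reuse the cached forecast only if it extends past the job's
  # expected finish time.
  return forecast_end > datetime.now() + timedelta(minutes=duration)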

A possible approach is to reshuffle the responsibilities of both top-level functions api_query.get_tuple and parsedata.writecsv.

# cats/__init__.py
def findtime(postcode, duration):
  tuples = get_tuple(postcode)
  result = writecsv(tuples, duration)

then becomes

# cats/__init__.py
def findtime(postcode, duration):
  # Check if cached carbon intensity data goes beyond
  # now() + duration, download new forecast if not
  # formerly `get_tuple()`
  ensure_cached_intensity_data(postcode, duration)
  # Then -- assuming data is available on disk -- compute
  # the best time to start the job.
  # formerly `writecsv()`
  result = get_best_start_time(duration)
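
A rough sketch of what ensure_cached_intensity_data could look like, assuming a fixed CSV cache path and helpers for reading the last forecast timestamp and writing the cache (all three are hypothetical names, not existing code):

import os
from datetime import datetime, timedelta

CACHE_PATH = "carbon_intensity.csv"  # assumed cache location

def ensure_cached_intensity_data(postcode, duration):
  # Reuse the cached forecast if it covers now() + duration...
  if os.path.exists(CACHE_PATH):
    forecast_end = read_last_forecast_datetime(CACHE_PATH)  # hypothetical helper
    if forecast_end > datetime.now() + timedelta(minutes=duration):
      return
  # ...otherwise download a fresh forecast and cache it to disk.
  tuples = get_tuple(postcode)  # existing API request
  write_cache(CACHE_PATH, tuples)  # hypothetical helper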

This approach has the benefit of maintaining a good separation between talking to the API (and caching intensity data) and the calculation of the start time. We currently almost have this, except that the function returning the start time is also responsible for writing the intensity data to disk.

Another possible approach is to push the API query and data caching down to the current writecsv function:

def writecsv(postcode: str, data_path: str, method, duration=None) -> dict[str, int]:
    try:
        # Use the carbon intensity data already cached on disk, if any.
        return cat_converter(data_path, method, duration)
    except MissingIntensityDataError:
        # No (or stale) cached data: query the API, cache the forecast,
        # then retry the calculation.
        cache_latest_intensity_forecast(postcode)
        return cat_converter(data_path, method, duration)

andreww commented 1 year ago

If we want, we could cache at the HTTP layer where we call the API. This seems quite easy (see #30).
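
For reference, HTTP-layer caching can be done with the requests-cache library in a couple of lines (the cache name and expiry below are illustrative; see #30 for the actual change):

import requests_cache

# Transparently cache responses from the requests library on disk;
# entries expire after 30 minutes, matching the API's half-hourly
# forecast windows.
requests_cache.install_cache("cats_cache", expire_after=1800)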

tlestang commented 1 year ago

Yes - I actually like your approach a lot better. It maybe doesn't allow for as much data reuse, but it's much simpler. And I guess it covers most of the caching use case (i.e. several requests in the same first or second half of the hour).

andreww commented 1 year ago

Merged the requests caching stuff. I think this is resolved.