GreenScheduler / cats

CATS: the Climate-Aware Task Scheduler :cat2: :tiger2: :leopard:
https://greenscheduler.github.io/cats/
MIT License

Refactor the code for more modularity + self-explanatory function names #26

Closed Llannelongue closed 1 year ago

Llannelongue commented 1 year ago

Looking at the code, I think it would be worth refactoring it to clean up a bit after the hack-day rush and keep each module separate: in particular it would make it easier to address communication between functions, as raised by #25, and to facilitate future expansion to new countries by replacing the API part for #22. It would also facilitate asynchronous tasks when some modules need to be bypassed.

An example of what can be confusing at the moment: tracking back where the optimal start time is computed. In __init__.py, the optimal time to run the job is provided by writecsv from the parsedata.py file (not immediately clear from the name). writecsv itself calls cat_converter from timeseries_conversion, which in turn calls the function get_lowest_carbon_intensity. The last function makes sense, but the steps in between would be really difficult to guess!

My suggestion is that each component becomes a separate class in a separate file; for now these are:

And each component is called directly from the main function, something like:

```python
def main(arguments=None):
    parser = parse_arguments()
    args = parser.parse_args(arguments)

    # ... some stuff about config file

    CI = APIcarbonIntensity_UK(args).getCI()

    optimal_time = runtimeOptimiser(args).findRuntime(CI)

    print(f"Best job start time: {optimal_time['start_time']}")

    if carbonFootprint:
        estim = greenAlgorithmsCalculator(...).get_footprint()
        print(f"carbon footprint: {estim}")
```
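For concreteness, here is a minimal runnable sketch of that shape, reusing the class and method names from the snippet above. The `CarbonIntensityEstimate` container, the constructor arguments and the CI values are all made up for illustration; this is not the actual cats API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CarbonIntensityEstimate:
    # Hypothetical container: forecast carbon intensity (gCO2/kWh) at a time.
    datetime: datetime
    value: float


class APIcarbonIntensity_UK:
    """Placeholder for the module wrapping the carbon intensity API."""

    def __init__(self, postcode):
        self.postcode = postcode

    def getCI(self):
        # The real class would query the API; here we return fake data
        # at 30-minute intervals.
        now = datetime(2023, 3, 1, 12, 0)
        fake_values = [250.0, 180.0, 120.0, 160.0]
        return [
            CarbonIntensityEstimate(now + timedelta(minutes=30 * i), v)
            for i, v in enumerate(fake_values)
        ]


class runtimeOptimiser:
    """Placeholder for the module choosing the greenest start time."""

    def findRuntime(self, CI):
        # Pick the forecast point with the lowest carbon intensity.
        best = min(CI, key=lambda e: e.value)
        return {"start_time": best.datetime, "carbon_intensity": best.value}


CI = APIcarbonIntensity_UK("OX1").getCI()
optimal_time = runtimeOptimiser().findRuntime(CI)
print(f"Best job start time: {optimal_time['start_time']}")
```

The point of the split is that `runtimeOptimiser` never knows where the CI data came from, so it can be fed from any source, not just the UK API.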

I'm keen to start working on a branch in this direction, but would like to hear people's thoughts on that as I've probably forgotten some aspects! :smiley:

andreww commented 1 year ago

I like it. I think a useful way to think about the design is probably to consider different consumers of the (3) classes / modules. How would this look if we wanted to use the same code to post-process some SLURM logs, for example? (I guess that is a good argument for separating the CI API stuff from the calculation of the optimum run time). What other example use cases could we plan for?

tlestang commented 1 year ago

Sounds good! I agree that a lot of the names are now quite misleading and there is a risk of us getting too comfortable with them! :) But obviously naming is hard and this was perfectly acceptable in the context of the hack day. I'm thinking that fine-grained changes like renamings can happen progressively as we add missing features and reshuffle the overall design a bit.

Llannelongue commented 1 year ago

Yes, completely. Actually, the names of individual functions are clear enough taken in isolation :) (writecsv does indeed write out the carbon intensities), but I think things got a bit muddled when integrating all the modules together at the last minute. Not a big deal overall.

andreww commented 1 year ago

Mostly a note so I don't forget. Our current output time format isn't accepted by at on my laptop (a mac). The fix is to change print(f"Best job start time: {starttime}") to print(f"{starttime:%m%d%H%M}") in __init__.py. I also needed a slightly different invocation than we have in the readme:

```shell
echo "I'm running" | at -t `python -m cats -d 10 --loc OX1`
```

Anyway, I should look at this a bit more and turn it into sensible ideas. Maybe we need multiple executable python files installed into bin, and tests for this stuff.
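For reference, at -t expects a [[CC]YY]MMDDhhmm timestamp, which strftime-style formatting produces directly. A quick check (the start time here is made up):

```python
from datetime import datetime

# Hypothetical optimal start time as returned by cats
starttime = datetime(2023, 3, 1, 14, 30)

# The fix above drops the year entirely: month, day, hour, minute only
print(f"{starttime:%m%d%H%M}")

# An unambiguous variant that keeps the year, also accepted by at -t
print(f"{starttime:%Y%m%d%H%M}")
```

Including the year (`%Y%m%d%H%M`) might be safer if a forecast window ever crosses New Year.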

tlestang commented 1 year ago

Hi - as discussed today during our meeting there are currently a few PRs open in this direction. I've drawn a little sketch of the high-level structure (I think) we are converging to and where these PRs stand with regard to it.

[sketch: the three modules and where the open PRs sit relative to them]

Three modules (square nodes). I've kept the current file and function names, although they will likely change for clarity.

The thing I'd like to do now is have get_tuple return a list of CarbonIntensityPointEstimate and find_time return a tuple of CarbonIntensityTotalEstimate. I think that'll complete the picture.
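A sketch of what those two containers might look like as dataclasses; the field names and the ordering trick are my guesses, not the merged implementation:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(order=True)
class CarbonIntensityPointEstimate:
    """Forecast carbon intensity (gCO2/kWh) at a single point in time."""

    value: float  # listed first so instances compare by intensity
    datetime: datetime


@dataclass
class CarbonIntensityTotalEstimate:
    """Integrated carbon intensity over a candidate job window."""

    start: datetime
    end: datetime
    value: float


points = [
    CarbonIntensityPointEstimate(250.0, datetime(2023, 3, 1, 12, 0)),
    CarbonIntensityPointEstimate(120.0, datetime(2023, 3, 1, 12, 30)),
]
# With order=True and value first, the greenest point is just min()
print(min(points).value)
```

Typed containers like these would also pin down the interface between the API module and the optimiser, which is the communication problem #25 raises.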

@Llannelongue if you're keen to help with the clean-up, there is plenty to do in terms of renaming, getting rid of obsolete functions and potentially reshuffling the location of some others. For instance, since #30 I don't think there is any reason to write the forecast data as a csv on disk, which would make most if not all of parsedata.py obsolete.

tlestang commented 1 year ago

> The thing I'd like to do now is implement the returned data of both get_tuple and find_time module as list of CarbonIntensityPointEstimate and tuple of CarbonIntensityTotalEstimate. I think that'll complete the picture.

#41 does the first bit.

tlestang commented 1 year ago

#31 #37 #41 #36 #41 #49 #50 #21