Closed Llannelongue closed 1 year ago
I like it. I think a useful way to think about the design is probably to consider different consumers of the (3) classes / modules. How would this look if we wanted to use the same code to post-process some SLURM logs, for example? (I guess that is a good argument for separating the CI API stuff from the calculation of the optimum run time). What other example use cases could we plan for?
Sounds good! I agree that a lot of the names are now quite misleading and there is a risk of us getting too comfortable with them! :) But obviously naming is hard and this was perfectly acceptable in the context of the hack day. I'm thinking that fine-grained changes like renamings can happen progressively as we add missing features and reshuffle the overall design a bit.
Yes completely, and actually the naming of individual functions is pretty clear and completely fine taken individually :) (`writecsv` indeed writes out the carbon intensities), but I think it got a bit muddled when integrating all the modules together last minute. Not a big deal overall.
Mostly a note so I don't forget. Our current output time format isn't accepted by `at` on my laptop (a Mac). The fix is to change:

```python
print(f"Best job start time: {starttime}")
```

to

```python
print(f"{starttime:%m%d%H%M}")
```

in `__init__.py`. I also needed a slightly different invocation than the one we have in the readme (note the double quotes, since the message itself contains an apostrophe):

```shell
echo "I'm running" | at -t `python -m cats -d 10 --loc OX1`
```
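As a quick sanity check on why that format string works: POSIX `at -t` accepts a `[[CC]YY]MMDDhhmm` timestamp, and Python's `strftime`-style format codes produce exactly that shape. A minimal example (the datetime value here is made up):

```python
from datetime import datetime

# Hypothetical best start time returned by cats
start_time = datetime(2023, 4, 6, 15, 30)

# %m%d%H%M gives the MMDDhhmm form accepted by `at -t`
print(f"{start_time:%m%d%H%M}")  # → 04061530
```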
Anyway, I should look at this a bit more and turn it into sensible ideas. Maybe we need multiple executable python files installed into bin, and tests for this stuff.
Hi - as discussed today during our meeting there are currently a few PRs open in this direction. I've drawn a little sketch of the high-level structure (I think) we are converging to and where these PRs stand with regard to it.
Three modules (square nodes). I've kept the current file and function names, although they will likely change for clarity.
- `api_query.py` provides a function `get_tuple` (name likely to be changed for something more meaningful). This function is responsible for returning the forecast data, for instance as a list of point estimates. Caching and API requests can be handled within this module. Additionally, the entry point function can be given strategies for reaching a specific API and parsing its response (see #37).
- `timeseries_conversion.py` could provide an entry point function `find_time` that, given forecast data (e.g. a list of point estimates), returns the job start time and the associated total carbon intensity budget. `find_time` is currently implemented in the `__init__.py` script, but it should probably be merged with `timeseries_conversion.get_lowest_carbon_intensity`. In my world this function uses the `WindowedForecast` helper class (see #36), but that's up for debate. Additionally, `find_time` can also return the carbon intensity budget associated with running the job right now.
- `carbonFootprint.py` provides the `greenAlgorithmsCalculator` class, which is responsible for computing and making available the carbon footprint for the job and the savings relative to running the job right now.

The thing I'd like to do now is to implement the data returned by `get_tuple` and `find_time` as a list of `CarbonIntensityPointEstimate` and a tuple of `CarbonIntensityTotalEstimate`, respectively. I think that'll complete the picture.
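To make that data flow a bit more concrete, here's a rough sketch of what those return types could look like. The class names come from the discussion above; the field names and the `find_time` signature are my assumptions, not a settled API:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CarbonIntensityPointEstimate:
    """Forecast carbon intensity (gCO2/kWh) at a single point in time."""
    value: float
    datetime: datetime


@dataclass
class CarbonIntensityTotalEstimate:
    """Total carbon intensity accumulated over a job's runtime."""
    start: datetime
    end: datetime
    value: float


def find_time(data: list[CarbonIntensityPointEstimate],
              duration: int) -> CarbonIntensityTotalEstimate:
    """Hypothetical entry point: pick the window of `duration` consecutive
    point estimates with the lowest average intensity (a naive stand-in
    for whatever WindowedForecast ends up doing)."""
    best = min(
        (data[i:i + duration] for i in range(len(data) - duration + 1)),
        key=lambda window: sum(p.value for p in window) / len(window),
    )
    return CarbonIntensityTotalEstimate(
        start=best[0].datetime,
        end=best[-1].datetime,
        value=sum(p.value for p in best),
    )
```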
@Llannelongue if you're keen to help with the clean-up, there is plenty to do in terms of renaming, getting rid of obsolete functions and potentially reshuffling the location of some others. For instance, since #30 I don't think there is any reason to write the forecast data as a CSV on disk, which would make most if not all of `parsedata.py` obsolete.
Looking at the code, I think it would be worth refactoring it to clean up a bit after the hack day rush and keep each module separate: in particular it would make it easier to address communication between functions, as raised by #25, and facilitate future expansion to new countries by replacing the API part for #22. It would also facilitate asynchronous tasks when some modules need to be bypassed.
An example of what can be confusing at the moment: tracking back where the optimal start time is being computed. In `__init__.py`, the optimal time to run the job is provided by `writecsv` from the `parsedata.py` file (not immediately clear from the name). `writecsv` itself calls `cat_converter` from `timeseries_conversion`, which in turn calls the function `get_lowest_carbon_intensity`. The last function makes sense, but the steps in between would be really difficult to guess!

My suggestion is that each component is a separate class in a separate file; for now these are:
And each component is called directly from the `main` function, something like:

I'm keen to start working on a branch in this direction, but would like to hear people's thoughts on that as I've probably forgotten some aspects! :smiley:
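For what it's worth, the "each component called directly from `main`" shape could look roughly like this. The module and function names are taken from the thread above; the signatures and the stubbed bodies are entirely made up, just to show the intended control flow:

```python
from datetime import datetime, timedelta

# --- Stubs standing in for the three proposed modules ---------------------

def get_tuple(location: str) -> list[tuple[datetime, float]]:
    """api_query: fetch forecast data for `location` (stubbed here)."""
    now = datetime(2023, 4, 6, 12, 0)
    intensities = [90.0, 40.0, 25.0, 60.0]  # gCO2/kWh, fabricated values
    return [(now + timedelta(minutes=30 * i), v)
            for i, v in enumerate(intensities)]


def find_time(forecast, duration_minutes: int):
    """timeseries_conversion: return (start_time, carbon_budget)."""
    start, intensity = min(forecast, key=lambda point: point[1])
    return start, intensity * duration_minutes / 60


# --- The entry point just wires the components together -------------------

def main(duration_minutes: int, location: str) -> None:
    forecast = get_tuple(location)                          # 1. fetch forecast
    start, budget = find_time(forecast, duration_minutes)   # 2. optimal start
    print(f"Best job start time: {start:%m%d%H%M}")         # 3. report to user


main(60, "OX1")
```

The point is that `main` reads as a linear pipeline, so tracking where the start time is computed takes one glance instead of a hunt through `writecsv` and friends.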