Estimate monetary cost of executing plan

TomNicholas commented 9 months ago

Cubed arguably has enough information to give a rough estimate of the monetary cost of executing the plan before starting execution.

I'm imagining a new method .estimate_cost(executor), similar to .compute(executor). When it is called, we would know:

from the Plan object, how many arrays are to be processed, how big they all are, and which NumPy functions will be used to process them,
from the Executor passed, which serverless executor the functions are to be run on,
from the Spec object, the temporary intermediate bucket information.

It would just print an estimate of the cost back to the user without running anything, and maybe raise warnings if they are planning to do something that seems obviously expensive (e.g. having their temporary bucket for intermediate data in AWS but their executor on GCF).
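To make the warning idea concrete, here is a rough sketch of a locality check. It is not Cubed code: `EXECUTOR_CLOUD`, `storage_cloud` and `check_locality` are hypothetical names, and guessing the provider from the URL scheme of the intermediate store is an assumption made only for illustration.

```python
import warnings
from typing import Optional

# Hypothetical mapping from an executor label to its cloud provider.
# These labels are illustrative, not Cubed's real executor names.
EXECUTOR_CLOUD = {"lambda": "aws", "gcf": "gcp"}

# Assumption: the provider of the intermediate store can be guessed from its URL scheme.
SCHEME_CLOUD = {"s3": "aws", "gs": "gcp"}


def storage_cloud(work_dir: str) -> Optional[str]:
    """Return the guessed cloud provider for a storage URL, or None if unknown."""
    if "://" not in work_dir:
        return None
    scheme = work_dir.split("://", 1)[0]
    return SCHEME_CLOUD.get(scheme)


def check_locality(executor_name: str, work_dir: str) -> bool:
    """Warn and return False if compute and intermediate storage are on different clouds."""
    compute = EXECUTOR_CLOUD.get(executor_name)
    storage = storage_cloud(work_dir)
    if compute is not None and storage is not None and compute != storage:
        warnings.warn(
            f"Executor runs on {compute} but intermediate data is stored on {storage}; "
            "cross-cloud transfer is likely to be expensive."
        )
        return False
    return True
```

Something like `check_locality("gcf", "s3://my-bucket/tmp")` would emit the warning before any money is spent, while unknown executors or local paths pass silently.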

This means that if we had a little table somewhere of e.g. AWS Lambda and S3 prices, Cubed could consult those numbers and sum them. It would require an idea of e.g. how long it takes to run np.mean() on a chunk of a certain size on a certain container, but this seems like something that can be measured fairly straightforwardly.
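A minimal sketch of the "consult a table and sum" idea follows. Everything in it is an assumption for illustration: `Prices`, `Op` and `estimate_compute_cost` are hypothetical names, the billing model (per GB-second plus per request) is a simplification, and the numbers are placeholders rather than real quotes from any provider.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Prices:
    """Hypothetical price table entry for one serverless executor."""
    per_gb_second: float  # compute price per GB of memory per second
    per_request: float    # price per function invocation


@dataclass(frozen=True)
class Op:
    """One step of a plan: how many tasks it has and how long each is expected to run."""
    name: str
    num_tasks: int
    seconds_per_task: float  # assumed to come from prior measurement


def estimate_compute_cost(ops: Iterable[Op], memory_gb: float, prices: Prices) -> float:
    """Sum the estimated compute cost over every task in every op."""
    total = 0.0
    for op in ops:
        per_task = op.seconds_per_task * memory_gb * prices.per_gb_second + prices.per_request
        total += op.num_tasks * per_task
    return total


# Placeholder prices, NOT real provider pricing.
example_prices = Prices(per_gb_second=2e-5, per_request=2e-7)
example_ops = [Op(name="mean", num_tasks=1000, seconds_per_task=1.5)]
example_cost = estimate_compute_cost(example_ops, memory_gb=2.0, prices=example_prices)
```

Storage costs (reads and writes to the intermediate bucket) could presumably be added as further terms in the same sum.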

Obviously there is a long tail of cases where this wouldn't work, but often you might still be able to provide a lower-bound cost estimate. For example, if your plan had a step that applied some arbitrary function with apply_gufunc, Cubed would not know whether that was some super expensive function that would run forever, but it would still be possible to estimate the minimum cost by assuming that the function was very light.
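The lower-bound idea could look roughly like this. It is only a sketch: `total_task_seconds` is a hypothetical helper, each step is a plain `(name, num_tasks, seconds_per_task)` tuple, and a step whose runtime is unknown (such as an arbitrary function passed to apply_gufunc) is represented by `None` and charged at an assumed floor.

```python
from typing import Optional, Sequence, Tuple

# (name, num_tasks, seconds_per_task); seconds_per_task is None when unknown.
Step = Tuple[str, int, Optional[float]]


def total_task_seconds(steps: Sequence[Step], floor_seconds: float = 0.0) -> Tuple[float, bool]:
    """Return (total task-seconds, is_exact).

    Steps with unknown runtime are charged at floor_seconds, so when
    is_exact is False the total is only a lower bound.
    """
    total = 0.0
    is_exact = True
    for _name, num_tasks, seconds in steps:
        if seconds is None:
            # Unknown function: assume it is as light as possible.
            seconds = floor_seconds
            is_exact = False
        total += num_tasks * seconds
    return total, is_exact
```

The returned flag lets the caller report "at least X" rather than "about X" whenever some step could not be costed.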

tomwhite commented 9 months ago

Duplicate of #219?

TomNicholas commented 9 months ago

I had forgotten about #219, but actually I don't think this is a duplicate - I'm suggesting warning users of estimated costs before execution, whereas #219 seems to be about actual cost after execution. Though I imagine you could reuse much of the code when calculating both numbers.

Basically I think it would be useful for users to be able to see "hang on, this isn't supposed to cost that much, maybe I've not expressed the analysis I meant to..." before they actually waste that money.

tomwhite commented 9 months ago

Sounds good - let's keep both open.

cubed-dev / cubed

Estimate monetary cost of executing plan #334