rpanai opened this issue 4 years ago
Thanks for raising this @rpanai.
This project grew from this notebook that I created a few years ago. The original notebook did include a cost estimate section based on Fargate metrics.
Fargate costs are easier to estimate because the service is billed per second, the API gives exact billing times and costs, and there is no multi-tenancy.
When using a service like ECS, where you are billed for container instances independently, it is much harder to provide cost information. When packing work onto instances there will be wastage; do we include that in our estimates or not?
I decided not to include this feature initially because it felt complex to implement consistently and I didn't want to give users false information. However, for services like Fargate, perhaps we could introduce it.
Hi @jacobtomlinson, I understand your point. I'd say that where the estimate is easy and reliable, it would be a nice thing to have. Let me know if I can help somehow.
If you want to raise a PR where you take the cost estimate logic from the notebook I linked and add it to the `FargateCluster` class as a method with a name like `estimate_cost()`, I think that would be great.
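Something along these lines, perhaps (a sketch only: the rates below are illustrative placeholders rather than current Fargate pricing, and the real method would read the vCPU/memory configuration from the cluster object):

```python
# Illustrative rates only; look up current Fargate pricing for your region.
VCPU_HOUR = 0.04048  # $/vCPU-hour (placeholder)
GB_HOUR = 0.004445   # $/GB-hour (placeholder)

def estimate_cost(n_workers, vcpu, memory_gb, runtime_seconds):
    """Rough Fargate cost: billing is per second, so scale the
    cluster's combined hourly rate by the elapsed time."""
    hourly = n_workers * (vcpu * VCPU_HOUR + memory_gb * GB_HOUR)
    return hourly * runtime_seconds / 3600

# e.g. 5 workers with 1 vCPU / 4 GB each, running for 10 minutes
print(f"${estimate_cost(5, 1, 4, 600):.4f}")
```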
Hi Jacob, I finally tried to work on this and I found it's not working as I wished/expected, in particular when I'm using an adaptive cluster. I think this is something that should be moved into dask itself.
My final goal is to have a cost estimate for every single run. For example, if I have an adaptive cluster and I connect to it from 2 different scripts, I'd like to know how much each given operation costs.
Let's say I have:

```python
import time

import dask.bag as db

def fun(x):
    time.sleep(1)
    return x**2

npartitions = 5
b = db.from_sequence(
    list(range(200)),
    npartitions=npartitions,
).map(lambda x: fun(x))
out = b.compute()
```
I'd like to know how much this costs. But here I'm not sure whether `db` somehow knows how many workers I am using and for how long. Do you think it's possible to achieve something in this direction?
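As a back-of-envelope (assuming a hypothetical per-worker hourly rate):

```python
# 200 tasks x ~1 s each = ~200 worker-seconds of compute, however
# many workers the adaptive cluster spreads them across.
worker_hourly_cost = 0.05  # hypothetical $/worker-hour
ideal_cost = 200 / 3600 * worker_hourly_cost
print(f"~${ideal_cost:.4f}")  # excludes idle time and scale-up overhead
```

The hard part is attributing the idle and scale-up time an adaptive cluster adds on top of that.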
The scheduler should know this information, rather than `db`. I wonder if we could capture that and estimate the costs from it?
Who do you think is the best person to ask? I made a decorator to get the duration and max RAM usage of a function but, as you said, the scheduler has all this information.
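For reference, such a decorator can be sketched with the standard library alone (not necessarily the approach used here; note `tracemalloc` only sees Python-level allocations, so it understates true RSS):

```python
import functools
import time
import tracemalloc

def profiled(func):
    """Print wall-clock duration and peak (Python-level) memory."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            duration = time.perf_counter() - start
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            print(f"{func.__name__}: {duration:.2f} s, peak {peak / 1e6:.1f} MB")
    return wrapper
```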
I recommend you explore the `performance_report` code in `distributed`, because that records a lot of what is going on in the cluster. That could be a good place to get the value for $T$.
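For example (reusing the bag `b` from above; `performance_report` is the public context manager in `dask.distributed`, while the worker count here is only a crude proxy for $T$ since an adaptive cluster resizes mid-run):

```python
import time

from dask.distributed import Client, performance_report

client = Client()  # connect to the running cluster

with performance_report(filename="dask-report.html"):
    start = time.perf_counter()
    out = b.compute()
    elapsed = time.perf_counter() - start

# Crude proxy: wall time x current worker count. The report itself
# records the task stream, so mining it would give a better T.
n_workers = len(client.scheduler_info()["workers"])
T = elapsed * n_workers
```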
With SageMaker, two lines showing the training seconds and billable seconds are printed at the end of training.
I think it would be nice to have something similar after we run a computation with dask. What I mean is:

```python
T / 60**2 * worker_hourly_cost
```

(with `T` the total worker time in seconds).
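So the two printed lines could look something like this (hypothetical values):

```python
T = 200                    # total worker-seconds, from the scheduler
worker_hourly_cost = 0.05  # hypothetical $/worker-hour

print(f"Total worker-seconds: {T}")
print(f"Estimated cost:       ${T / 60**2 * worker_hourly_cost:.4f}")
```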