Closed — KrishanBhasin closed this issue 4 years ago
cc @quasiben
It looks like it's possible for a task to arrive without any known compute/transfer times. Perhaps this comes about if the task is already computed? I'm not sure. My guess is that this can be resolved by replacing `d["startstops"]`
with `d.get("startstops", [])`
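A minimal sketch of the difference, using a hypothetical record `d` standing in for a task-stream entry (not the actual distributed data structure):

```python
# Hypothetical task-stream record that arrived without timing info,
# i.e. no "startstops" key is present.
d = {"status": "OK"}

# d["startstops"] would raise KeyError here; .get with a default
# returns an empty list instead, so downstream loops simply do no work.
for action in d.get("startstops", []):
    print(action)
```

This keeps the report code tolerant of records that never collected timings, rather than aborting the whole computation.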
Is this a change that you would be interested in making @KrishanBhasin ?
Thanks for the report @KrishanBhasin and for the ping @mrocklin. I've submitted a PR with the fix. @KrishanBhasin, if you have time, can you test whether it resolves your issue?
Yep it just completed without errors, thanks for fixing it!
I was hoping for this to be another excuse for me to contribute a PR but you beat me 😂
What happened: Computing a Dask Dataframe fails partway through with an error raised by `performance_report()`. The error raised is:

What you expected to happen: The performance report to not fail, or to fail without impacting the computation.
Minimal Complete Verifiable Example:
I have yet to craft one, as my use case involves creating a large Dask graph and submitting it all at once. I will keep trying to produce one, but I thought it might be helpful to file this early.
It looks like part of the timings information collected in https://github.com/dask/distributed/pull/3822 results in a lookup of a dict key that does not exist (see the error above).
Unfortunately I don't know enough about distributed's internals to understand whether every task stream's dictionaries should include a `startstops` key. I'd be happy to contribute a PR that adds a safety check for the `startstops` key, if that is the correct fix here.

Error stacktrace
I'm doubtful this adds more useful information, but I have included it for completeness.

```python
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
```

Anything else we need to know?:
Environment: