Closed MrRobot2211 closed 3 years ago
Hi @MrRobot2211 Thanks for pinging. Could you share a reproducer for the error that you encountered? I am having trouble understanding what's going on.
If you were to run `python prof.py -p -c benchmarks/bench_raw.py` without the fix, the plot would fail.
If you look at the resulting csv,
you will see that the GPU time entries contain a string of a whole array of times instead of a single time:
```
cupy | cupy-gpu | time_raw | 10 | [1.55520001e-05 1.44959996e-05 1.25439996e-05 ... | gpu | 0
```
Internally, the reason this happens is that the GPU timings are stored in the dataframe as a single array, and when that column is passed to seaborn for plotting it does not know what to do with it.
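Roughly what happens, as a minimal sketch (a hypothetical mini results frame, not the actual cupy_prof code):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the results frame: the GPU row stores the
# whole timing array in a single cell, while the CPU row stores a scalar.
df = pd.DataFrame({
    'backend': ['cupy-cpu', 'cupy-gpu'],
    'time': [1.2e-05, np.array([1.55e-05, 1.45e-05, 1.25e-05])],
})

# When written to csv, the array cell is stringified wholesale, which is
# the bracketed '[1.555...e-05 ...]' entry seen in the output; a column
# mixing scalars with arrays also cannot be plotted point by point.
print(df.to_csv(index=False))
```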
@MrRobot2211 Thanks, I see what you meant now. I encountered a `pandas.core.base.DataError` when running the benchmark, but didn't realize your output was taken from the csv file (with the `-c` flag).
@emcastillo I think a better fix is to "flatten" the benchmark result:
```diff
diff --git a/cupy_prof/measure.py b/cupy_prof/measure.py
index d58531b..131a69f 100644
--- a/cupy_prof/measure.py
+++ b/cupy_prof/measure.py
@@ -12,7 +12,11 @@ class Measure(object):
     def capture(self, name, key, times, xp_name):
         for dev in times:
-            for i, time in enumerate(times[dev]):
+            if dev == 'gpu':
+                dev_times = times[dev][0]
+            else:  # dev == 'cpu'
+                dev_times = times[dev]
+            for i, time in enumerate(dev_times):
                 self.df['xp'].append(xp_name)
                 self.df['backend'].append('{}-{}'.format(
                     xp_name, dev))
```
The reason is that `cupyx.time.repeat` collects results for each device, but if its `devices` argument is not set, as done here
https://github.com/cupy/cupy-performance/blob/cdcb205a07d228d6e0ac583bc1510f632dd768e9/cupy_prof/runner.py#L53-L54
the current device is used, so the array shape for GPU in this case is `(1, 10)` instead of `(10,)` as on CPU. @MrRobot2211 Could you apply my patch instead? Thanks.
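A minimal sketch of the shape mismatch and of the flattening the patch performs (hypothetical timing data, not real `cupyx.time.repeat` output):

```python
import numpy as np

# With a single current GPU, the 'gpu' entry carries an extra leading
# device axis, shape (1, n_repeat), while 'cpu' is flat, shape (n_repeat,).
times = {
    'cpu': np.array([1.0e-05, 1.1e-05, 1.2e-05]),
    'gpu': np.array([[2.0e-05, 2.1e-05, 2.2e-05]]),
}

# Mirror of the patch: take the first (only) GPU row so both devices
# yield a flat 1-D sequence of per-iteration timings.
flat = {}
for dev, arr in times.items():
    flat[dev] = arr[0] if dev == 'gpu' else arr

assert flat['cpu'].shape == flat['gpu'].shape == (3,)
```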
Thanks! will look at it during the day :)
Thanks, @MrRobot2211 @emcastillo!
great catch guys! thanks a lot!
Hi, I am currently working on using CuPy as a backend to QuTiP as part of my GSoC project. I am trying out different ways of benchmarking CPU and GPU times.
I found there was an error here where all GPU timings for an iteration would be crammed together; this PR makes a quick fix by calling `explode` on the df.
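For reference, a minimal sketch of that quick fix on a hypothetical mini frame:

```python
import pandas as pd

# Hypothetical miniature of the results frame: the GPU row holds a list
# of per-iteration timings crammed into a single cell.
df = pd.DataFrame({
    'backend': ['cupy-gpu'],
    'time': [[1.55e-05, 1.45e-05, 1.25e-05]],
})

# explode() turns the list cell into one row per timing, so downstream
# csv output and plotting see one scalar per row.
df = df.explode('time').reset_index(drop=True)
print(len(df))
```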
Is this package being maintained currently, or are you leveraging other tools?
Maybe you know something, @leofang?