Open slyubomirsky opened 1 week ago
When informally benchmarking, I did not find a large difference between the current implementation and this change, however. Perhaps the issue is how the benchmark was measured. Informal benchmark:
The function to test:
def func(t1, t2):
return tp.Tensor(list(t1.shape) != list(t2.shape), dtype=tp.bool)
I wrote it this way because the _is_compatible
helper in Parameter
returns a Python bool, but tp.compile
can only compile functions returning a Tripy tensor.
Comparison:
c = tp.compile(func, args=[tp.InputInfo(shape=[1, 3, 3], dtype=tp.int32), tp.InputInfo(shape=[1, 3, 3], dtype=tp.int32)])
t1 = tp.Tensor([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]])
t2 = tp.Tensor([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]])
def time_run():
start = time.perf_counter_ns()
for _ in range(1000):
c(t1, t2)
end = time.perf_counter_ns()
return (end - start) / 1.0e6
After warming up, both complete in about 25 microseconds. Should we do this comparison without compiling instead? I could also vary the input values.
@slyubomirsky most of the impact will be on the compile time I think.
Ah, okay. I'll profile compile time then.
FWIW, some cursory benchmarking also did not reveal much change in the compile time either (a difference about 100 microseconds over 1000 trials), though it might be a matter of the function that is compiled.
The function whose compilation I timed (the return value needed to be a tensor for it to be valid to submit to tp.compile
):
def func():
return tp.concatenate(tp.Tensor([1, 2, 3, 4].shape, dim=0)
If there's a specific benchmark we have in mind, that might better to measure.
@slyubomirsky I think a benchmark where it could make a difference is setting parameters on modules without involving any Tripy operations, e.g.:
module.param = tp.Parameter([1, 2, 3])
Previously, this would have triggered a compilation in order to compare the shape of the new parameter with that of the old one. With your changes, the compilation should be entirely bypassed.
Comparing the parameter update time directly does yield the expected difference 🙂 100x faster due to not compiling.
Addresses issue #360.