Nice work! Is there any way to mark the moved / renamed files as such? This would help condense the diffs.
> Nice work! Is there any way to mark the moved / renamed files as such? This would help condense the diffs.
Unfortunately, this may be difficult to do at this point, but I can list the files which have not changed substantially:

- `regular_shapes` is basically unchanged.
- `test_regular_shapes` is a slightly changed version of the original `imageruler_test`; it contains tests based on shapes from the `regular_shapes` module. The main difference is that we no longer check for duality between binary opening and closing, since closing is not implemented in the new `imageruler` module (it is not required).
- The `simple_shapes` notebook is renamed from the original `examples` notebook, and is only updated as needed for the new API.

This PR does not contain the function for computing the minimum lengthscale of void regions or the function for computing the overall minimum lengthscale. In the current main branch, these two functions are `minimum_length_void()` and `minimum_length()`. I do not think they should be omitted, especially the second one, which generally has lower computational cost than computing both solid and void minimum lengthscales and then taking the minimum. The corresponding lengthscale violation functions are also not included in this PR.
> This PR does not contain the function for computing the minimum lengthscale of void regions or the function for computing the overall minimum lengthscale. In the current main branch, these two functions are `minimum_length_void()` and `minimum_length()`. I do not think they should be omitted, especially the second one, which generally has lower computational cost than computing both solid and void minimum lengthscales and then taking the minimum. The corresponding lengthscale violation functions are also not included in this PR.
Yes, this was a judgement call: to eliminate redundant functionality, in service of a simpler API whose consistency and correctness are easier to ensure.

The overall minimum length scale can be computed by `min(minimum_length_scale(x))`, and length scale violations for void can be computed by, e.g., `length_scale_violations_solid(~x, length_scale)`. We could add functions with these implementations (i.e. trivial wrappers for existing functions), but I think it would be counter to the objective to add new functions that have different implementations.
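For concreteness, here is a minimal sketch of the two trivial wrappers just described. The API names are taken from this thread; the square test design is invented purely for illustration:

```python
import numpy as onp

import imageruler

# A simple square design, invented purely for illustration.
x = onp.zeros((50, 50), dtype=bool)
x[10:40, 10:40] = True

# Overall minimum length scale: the smaller of the (solid, void) pair
# returned by `minimum_length_scale`.
overall_minimum = min(imageruler.minimum_length_scale(x))

# Void violations: solid violations of the inverted design, here for a
# length scale of 5 pixels.
void_violations = imageruler.length_scale_violations_solid(~x, 5)
```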
> The overall minimum length scale can be computed by `min(minimum_length_scale(x))`, and length scale violations for void can be computed by, e.g., `length_scale_violations_solid(~x, length_scale)`. We could add functions with these implementations (i.e. trivial wrappers for existing functions), [...]
Yes, I understand these two functions can be trivially implemented in this way, and I do not insist on adding the trivial wrapper for void. But I prefer not to add the trivial wrapper for the overall minimum lengthscale as `min(minimum_length_scale(x))`. Instead, I still prefer a function for the overall minimum lengthscale based on the previous open-close approach, which has lower computational cost than computing the two lengthscales and taking the minimum (see the sketch below). This advantage is not obvious in our test examples, which only involve design patterns with small arrays. But for design patterns with large arrays, the difference may be noticeable, especially in cases where the solid and void minimum lengthscales require very different numbers of binary search iterations.
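To illustrate the cost argument, here is a rough sketch of a single-binary-search, open-close lengthscale estimate. This is not the actual imageruler implementation: the `scipy.ndimage` morphology calls, the integer search bounds, and the use of `imageruler.get_kernel` as the structuring element are all assumptions made for this sketch.

```python
import numpy as onp
from scipy import ndimage

import imageruler

def violates_either(x: onp.ndarray, diameter: int) -> bool:
    """Checks whether `x` has a solid OR void feature smaller than `diameter`.

    Opening removes solid features smaller than the kernel, while closing
    fills void features smaller than the kernel; if the two results differ,
    some feature (solid or void) violates the length scale.
    """
    kernel = imageruler.get_kernel(diameter)
    opened = ndimage.binary_opening(x, structure=kernel)
    closed = ndimage.binary_closing(x, structure=kernel)
    return bool(onp.any(opened != closed))

def minimum_length_open_close(x: onp.ndarray) -> int:
    # A single binary search over candidate diameters, rather than one search
    # for the solid lengthscale and another for the void lengthscale. Assumes
    # violations are monotonic in diameter and absent at diameter 1.
    lo, hi = 1, min(x.shape)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if violates_either(x, mid):
            hi = mid - 1  # violation at `mid`: minimum length scale is smaller
        else:
            lo = mid  # no violation at `mid`: minimum length scale is at least `mid`
    return lo
```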
> but I think it would be counter to the objective to add new functions that have different implementations.
I think a new function with a different implementation is not something that should be avoided if that implementation has an advantage.
> I think a new function with a different implementation is not something that should be avoided if that implementation has an advantage.
My philosophy here is that the additional code and complexity constitute a disadvantage, and we have to balance the pros against the cons. I think we may have an opportunity to discuss this on Thursday.

In any case, with regard to performance there are some improvements with the new code. Here is a benchmarking snippet:
```python
# Benchmark snippet (run in an IPython/Colab notebook, hence the %timeit magics).
import numpy as onp

import imageruler

def separated_circles(separation_distance: int) -> onp.ndarray:
    # Two circles of diameter 80 and 60, separated horizontally by
    # `separation_distance` pixels and padded with void on all sides.
    left_circle = imageruler.get_kernel(80)
    right_circle = imageruler.get_kernel(60)
    right_circle = onp.pad(right_circle, ((10, 10), (0, 0)))
    circles = onp.concatenate(
        [left_circle, onp.zeros((80, separation_distance)), right_circle],
        axis=1,
    )
    circles = onp.pad(circles, ((10, 10), (10, 10))).astype(bool)
    return circles

print("Using `minimum_length`")
%timeit imageruler.minimum_length(separated_circles(1))
%timeit imageruler.minimum_length(separated_circles(5))
%timeit imageruler.minimum_length(separated_circles(10))
%timeit imageruler.minimum_length(separated_circles(20))

print("Using `minimum_length_solid_void`")
%timeit imageruler.minimum_length_solid_void(separated_circles(1))
%timeit imageruler.minimum_length_solid_void(separated_circles(5))
%timeit imageruler.minimum_length_solid_void(separated_circles(10))
%timeit imageruler.minimum_length_solid_void(separated_circles(20))

print("")
for dim in [1, 5, 10, 20]:
    print(f"Separation distance = {dim}")
    print(f"  Absolute minimum length scale: {imageruler.minimum_length(separated_circles(dim))}")
    print(f"  Minimum (solid, void): {imageruler.minimum_length_solid_void(separated_circles(dim))}")
```
With the existing implementation (using a Colab CPU instance), this prints:
```
Using `minimum_length`
308 ms ± 90.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
225 ms ± 4.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
204 ms ± 20.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
319 ms ± 73.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Using `minimum_length_solid_void`
408 ms ± 5.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
448 ms ± 8.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
398 ms ± 80.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
504 ms ± 103 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Separation distance = 1
  Absolute minimum length scale: 3.513671875
  Minimum (solid, void): (59.974609375, 3.513671875)
Separation distance = 5
  Absolute minimum length scale: 5.447265625
  Minimum (solid, void): (59.974609375, 5.447265625)
Separation distance = 10
  Absolute minimum length scale: 10.474609375
  Minimum (solid, void): (59.974609375, 10.474609375)
Separation distance = 20
  Absolute minimum length scale: 20.529296875
  Minimum (solid, void): (59.974609375, 20.529296875)
```
As you said, there is some performance benefit for `minimum_length` (approaching 2x). However, running the same code using the new implementation, we get:
```
Using `minimum_length_scale`
12.3 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
7.65 ms ± 179 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
8.92 ms ± 239 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
17.8 ms ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Separation distance = 1
  Minimum (solid, void): (60, 1)
Separation distance = 5
  Minimum (solid, void): (60, 5)
Separation distance = 10
  Minimum (solid, void): (60, 10)
Separation distance = 20
  Minimum (solid, void): (60, 20)
```
So, there is a more than 10x performance improvement with the new implementation.
> there is a more than 10x performance improvement with the new implementation.
Nice test and great improvement! The new implementation indeed runs much faster than the old implementation, but the advantage of the open-close approach over the `min(minimum_length_scale(x))` approach needs to be gauged within the same implementation, as you did for the old implementation. I believe that, just like their counterparts in the old implementation, if the open-close approach were added to this new implementation, it would be faster than the new implementation's own `min(minimum_length_scale(x))` approach.
> Nice test and great improvement! The new implementation indeed runs much faster than the old implementation, but the advantage of the open-close approach over the `min(minimum_length_scale(x))` approach needs to be gauged within the same implementation, as you did for the old implementation. I believe that, just like their counterparts in the old implementation, if the open-close approach were added to this new implementation, it would be faster than the new implementation's own `min(minimum_length_scale(x))` approach.
Yes, I agree with this.
If performance is not too critical, then the extra code complexity of adding the open-close algorithm might not be worth it for a factor-of-two improvement, at least at the current scales where we expect to apply this code. So, the simplicity of leaving it out might be better.
However, if we leave out this optimization, it would be worth adding a comment to the code mentioning this possibility, in case we want to improve performance in the future.
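Such a comment might look like the following (hypothetical wording, placed wherever the overall minimum length scale is computed):

```python
# NOTE: The overall minimum length scale is currently computed as the minimum
# of the solid and void length scales, i.e. two binary searches. If
# performance ever becomes critical, a single binary search using the
# open-close test (opening != closing) could roughly halve the cost.
```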
I just talked with Steven and now agree with removing the open-close approach.
LGTM.
I noticed that the three notebooks (`to_designs.ipynb`, `simple_shapes.ipynb`, and `advanced.ipynb`) do not include the outputs from actually running them. Perhaps this was intentional?
Correct; to view the outputs one should look at the docs page, which you can preview here: https://mfschubert.github.io/imageruler/readme.html

Docs are automatically generated when there is a push to the main branch, and the stripping of notebook outputs is actually enforced by one of the pre-commit rules. This way, we avoid having large (code) files as part of the repo.
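For reference, a pre-commit rule of this kind typically looks something like the following. This is a hypothetical configuration using the common `nbstripout` hook, not necessarily the one this repo uses:

```yaml
# Hypothetical .pre-commit-config.yaml excerpt; the repo's actual hook may differ.
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.6.1
    hooks:
      - id: nbstripout  # strips outputs from .ipynb files before commit
```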
As discussed in #25 and agreed upon during the discussion on 3/21, it would be beneficial to simplify the imageruler API and extend testing, so that e.g. automated usage becomes viable. This PR is the follow-up to that discussion.

The changes in this PR include:
With these changes, #25, #24, #23, and #22 can be considered resolved.
For #22 specifically, I created a notebook which shows how the new ignore scheme addresses the issue.
@oskooi @stevengj @mawc2019 @ianwilliamson