JuliaGeometry / GeometryBasics.jl

Basic Geometry Types
MIT License
165 stars 54 forks source link

Cleanup Vec aliases #100

Closed juliohm closed 4 years ago

juliohm commented 4 years ago

This is a continuation of #93 . It supersedes #97. Addresses #91.

Now we have a single type for representing vectors in geometrical spaces, and it happens to be an alias for SVector (implementation detail). Additionally, we have type aliases for end users that often wish to materialize vectors with specific coordinate type and dimension. Here, we avoid partial instantiation. The aliases Vec2 and Vec3 are concrete vector types in 2D and 3D using double precision, and the aliases Vec2f and Vec3f are concrete vector types using single precision.

We deprecate unit in favor of euclidbasis, which is a more accurate name for the returned vectors. We add constant as a helper function to construct vectors with repeated coordinates. The name constant isn't ideal, please let me know if you have a better suggestion.

knuesel commented 4 years ago

I'd be happy to help with this refactoring, but I'm not quite clear what was the main problem with the vector types (and the discussion is spread out in other PRs). @juliohm could you expand a bit your summary with more context, maybe with a couple of before/after examples, to highlight the main goals? I think a complete list of all proposed changes (and their justifications) would be great to evaluate how much work it will take to adapt dependent packages. It would also be great to get feedback from the community (e.g. posting this "roadmap" to Discourse).

Here are some ideas/remarks regarding the names:

SimonDanisch commented 4 years ago

constant is definitely wrong. I'll try to find a better name...

fill is what base uses...We could scope it to apply to return Vecs/Points: Points.fill(0f0, 3)::Point3f This should even type infer with constant prop

SimonDanisch commented 4 years ago

Although I really feel, like nothing beats Vec3f(0) and I don't 100% get what would be confusing about it...

juliohm commented 4 years ago

Hi @knuesel , thanks for considering joining the effort. I am happy to elaborate on the plans, and share some of the motivations behind the refactoring of the code.

The first set of changes was initially discussed in #91. In that issue, you will find linked PRs that addressed it partially, including this one. The main problem before this set of changes is that Vectors and Points were being interchanged in the code incorrectly, and there were multiple vector types. By just sticking to const Vec = SVector as a single type for the concept of vectors in an embedding space, I was able to fix 3 bugs in GeometryBasics.jl alone. This is a big deal. We cannot move forward with these types of bugs. They are hard to track and a nightmare to fix.

The next PR in this set of changes will fix the Point type. Currently the Point type supports all kinds of operations that are only defined for vectors in the embedding space. We cannot add two points for example.

Regarding the Rectangle types, it will require another set of changes. The constructors are being abused, and all this fuss with single versus double precision is compromising the generality of the implementations. Rectangle are just a polygon with vertices. The type of the coordinates is naturally inherited from the vertices. So all we need is to have a single Rectangle type with the correct vertices upon construction. If we don't fix this pattern in the codebase, we will end up with FSphere, FLineString, FPyramid, ... and similar noise. The only place where one should specify the type of the coordinates is in the constructor of vectors or points. Everything else should follow.

Regarding your comments:

I prefer unit over euclidbasis: the name euclidbasis suggests that the return value is a basis, which is not the case. And the common name for these vectors is "standard basis vector" or "standard unit vector".

I can see that unit is a more elegant name. Maybe we can just call it unit as suggested and provide a clear docstring explaining that these are actually the versors of the Euclidean basis. I am aiming to support general manifolds in GeoStats.jl, so this helper function for constructing a global Euclidean basis is not always helpful. Ultimately, we would like to support multiple coordinate reference systems, and compute coordinates relatively as opposed to globally.

constant is definitely wrong. I'll try to find a better name...

Yes, I was really stuck on this one. Couldn't find a cleaner name. It should something that reminds us of the fact that we are repeating the same value for all the coordinates. I thought of using nval initially. Do you think it is a better option?

The plan is to change boundbox to boundingbox? That would be nice.

Sure, do you mind preparing a PR to the cleanup branch with this change? I was planning to address it after this more serious issue with vectors and points is solved.

What about renaming FRect->Rectf, FRect3D->Rect3f, etc.? It would be consistent with Vec3f, etc.

As I explained above, I have the opinion that we shouldn't be playing with specific precision all over the place. We should inherit the precision from the points or vectors like: Rect(rand(Vec3f), rand(Vec3f)) would construct what we call today FRect3D or something. Also, the name Rect isn't very good. Perhaps we should rename it to a name that translates the concept of a parallepiped: https://en.wikipedia.org/wiki/Parallelepiped This is important if one is considering transformations to the meshes that break the right angle assumption. I think we should keep a general parallepiped type, and add traits that check if the angles are 90 deg. Something like isrectangle(p::Parallepiped) = # check the vertices. We could even add a helper function that constructs parallepipeds satisfying the trait, and these would coincide with the current Rect assumption.

What do you think of this plan? I hope it makes sense to you all. I am doing my best to put the efforts here and work together, but if you feel that these changes aren't welcome, please let me know and I can continue working in a separate project.

juliohm commented 4 years ago

I really like the fill suggestion @SimonDanisch , just noticed it now after typing a long comment. Will update the PR accordingly.

juliohm commented 4 years ago

For consistency, and to distinguish with future point utility functions, I've renamed to vunit and vfill where "v" means that this is a vector utility function.

juliohm commented 4 years ago

I will merge this one by the end of the day in case there are no additional suggestions. 👍

juliohm commented 4 years ago

Although I really feel, like nothing beats Vec3f(0) and I don't 100% get what would be confusing about it...

It is mainly for consistency and clarity of intention. When we introduce vunit for unit vector and vfill for repeated coordinates, we signal maintainers about what we are trying to achieve very explicitly. Given that currently const Vec = SVector, we also save ourselves from reimplementing the full StaticArrays.jl behaviour manually, just for adding a new constructor with a single entry. The benefit is that things will work out of the box when devs from other packages plug their SVector vectors. There is no need to create a new vector type when StaticArrays.jl is so awesome, and keeps getting better.

knuesel commented 4 years ago

@juliohm thanks for the explanation, it's very helpful. Have consistent types, and keeping the implementation as generic as possible sound very good! However this is moving a bit fast for me (waiting less than a day for suggestions... no offense! but I can only work intermittently on this).

Anyway, here are my two cents 😊:

Regarding the type aliases for 2D, 3D, single and double precision: I think it's important to define these aliases because

I see no downside to these aliases (assuming we keep the GeometryBasics methods as general as possible as you say, so most methods should not use these specific aliases).

Good point on the parallelepipeds!

For the Euclidean basis: I think you actually mean "standard basis". It's the vector space that is (or isn't) Euclidean... I see a couple of results on Google for "euclidean basis" (well, more like 11'000) but AFAIK that's non-standard terminology. People could use Vec{5,Float64} for a non-Euclidean space, and [1,0,0,0,0] would still be the first vector of the standard basis.

I also like fill! But using different function names (vfill, pfill) seems unnecessary: fill for all types is more idiomatic and what's being filled is clear from the parameters...

juliohm commented 4 years ago

Thank you @knuesel , below are some follow up comments.

Regarding the type aliases for 2D, 3D, single and double precision: I think it's important to define these aliases because

  • They make the package more user-friendly (probably most users prefer to write Rect2f than Rect{2,Float32}).

Users won't write Rect{2,Float32}, they will create rectangles (or paralellepid) with actual points, so it will read like Rect(Point2f(...), Point2f(...)).

  • They make the user code more readable.

As pointed out, code becomes less readable with multiple aliases for each geometry. We are talking Rectangle here, imagine this pattern for all primitives. Take a look at the rectangle source code as it is currently, and you will realize the huge number of constructors just to provide these aliases: https://github.com/JuliaGeometry/GeometryBasics.jl/blob/4bc7ebcad7dafae1392ef2fabc3405fcd7c78033/src/primitives/rectangles.jl#L30-L156

The plan is to cleanup this source of variation in a future PR, after #101 is ready. It is been a good exercise to fix these issues. I could already solve at least a couple dozen bugs in this last PR.

  • They prevent unnecessary fragmentation in dependent packages: if packages that use GeometryBasics have to define their own aliases for convenience, the ecosystem will end up with different names for the same thing, and it won't be obvious that XXX.Rect3f and YYY.FRect3D are actually the same type!

As I mentioned, there is no need for aliases after we get the point type aliases.

For the Euclidean basis: I think you actually mean "standard basis". It's the vector space that is (or isn't) Euclidean... I see a couple of results on Google for "euclidean basis" (well, more like 11'000) but AFAIK that's non-standard terminology. People could use Vec{5,Float64} for a non-Euclidean space, and [1,0,0,0,0] would still be the first vector of the standard basis.

"Standard" is a function of the research community you are in. This basis may not be standard in other contexts. The better name for it is Cartesian. It is Euclidean because every point is obtained by simple addition and subtraction of coordinates, and it is Cartesian because of the orthonormality property.

I also like fill! But using different function names (vfill, pfill) seems unnecessary: fill for all types is more idiomatic and what's being filled is clear from the parameters...

Introducing new names is a good practice in general. After writing a lot of code in the GeoStats.jl stack and in Julia, I realized that life is much easier when we have our own names for a new functionality. The language doesn't provide a trait system and so extending behavior with Base names is impossible without macro hacks. The v prefix will turn out useful when we start having equivalents for the point type with a p prefix. It also helps maintainers to write a coherent API, and to read code later on.