Closed juliohm closed 4 years ago
I'd be happy to help with this refactoring, but I'm not quite clear what was the main problem with the vector types (and the discussion is spread out in other PRs). @juliohm could you expand a bit your summary with more context, maybe with a couple of before/after examples, to highlight the main goals? I think a complete list of all proposed changes (and their justifications) would be great to evaluate how much work it will take to adapt dependent packages. It would also be great to get feedback from the community (e.g. posting this "roadmap" to Discourse).
Here are some ideas/remarks regarding the names:
unit
over euclidbasis
: the name euclidbasis
suggests that the return value is a basis, which is not the case. And the common name for these vectors is "standard basis vector" or "standard unit vector".constant
is definitely wrong. I'll try to find a better name...boundbox
to boundingbox
? That would be nice.FRect
->Rectf
, FRect3D
->Rect3f
, etc.? It would be consistent with Vec3f
, etc.constant is definitely wrong. I'll try to find a better name...
fill is what base uses...We could scope it to apply to return Vecs/Points:
Points.fill(0f0, 3)::Point3f
This should even type infer with constant prop
Although I really feel, like nothing beats Vec3f(0)
and I don't 100% get what would be confusing about it...
Hi @knuesel , thanks for considering joining the effort. I am happy to elaborate on the plans, and share some of the motivations behind the refactoring of the code.
The first set of changes was initially discussed in #91. In that issue, you will find linked PRs that addressed it partially, including this one. The main problem before this set of changes is that Vectors and Points were being interchanged in the code incorrectly, and there were multiple vector types. By just sticking to const Vec = SVector
as a single type for the concept of vectors in an embedding space, I was able to fix 3 bugs in GeometryBasics.jl alone. This is a big deal. We cannot move forward with these types of bugs. They are hard to track and a nightmare to fix.
The next PR in this set of changes will fix the Point type. Currently the Point type supports all kinds of operations that are only defined for vectors in the embedding space. We cannot add two points for example.
Regarding the Rectangle types, it will require another set of changes. The constructors are being abused, and all this fuss with single versus double precision is compromising the generality of the implementations. Rectangle are just a polygon with vertices. The type of the coordinates is naturally inherited from the vertices. So all we need is to have a single Rectangle type with the correct vertices upon construction. If we don't fix this pattern in the codebase, we will end up with FSphere, FLineString, FPyramid, ... and similar noise. The only place where one should specify the type of the coordinates is in the constructor of vectors or points. Everything else should follow.
Regarding your comments:
I prefer unit over euclidbasis: the name euclidbasis suggests that the return value is a basis, which is not the case. And the common name for these vectors is "standard basis vector" or "standard unit vector".
I can see that unit
is a more elegant name. Maybe we can just call it unit
as suggested and provide a clear docstring explaining that these are actually the versors of the Euclidean basis. I am aiming to support general manifolds in GeoStats.jl, so this helper function for constructing a global Euclidean basis is not always helpful. Ultimately, we would like to support multiple coordinate reference systems, and compute coordinates relatively as opposed to globally.
constant is definitely wrong. I'll try to find a better name...
Yes, I was really stuck on this one. Couldn't find a cleaner name. It should something that reminds us of the fact that we are repeating the same value for all the coordinates. I thought of using nval
initially. Do you think it is a better option?
The plan is to change boundbox to boundingbox? That would be nice.
Sure, do you mind preparing a PR to the cleanup
branch with this change? I was planning to address it after this more serious issue with vectors and points is solved.
What about renaming FRect->Rectf, FRect3D->Rect3f, etc.? It would be consistent with Vec3f, etc.
As I explained above, I have the opinion that we shouldn't be playing with specific precision all over the place. We should inherit the precision from the points or vectors like: Rect(rand(Vec3f), rand(Vec3f))
would construct what we call today FRect3D
or something. Also, the name Rect
isn't very good. Perhaps we should rename it to a name that translates the concept of a parallepiped: https://en.wikipedia.org/wiki/Parallelepiped This is important if one is considering transformations to the meshes that break the right angle assumption. I think we should keep a general parallepiped type, and add traits that check if the angles are 90 deg. Something like isrectangle(p::Parallepiped) = # check the vertices
. We could even add a helper function that constructs parallepipeds satisfying the trait, and these would coincide with the current Rect
assumption.
What do you think of this plan? I hope it makes sense to you all. I am doing my best to put the efforts here and work together, but if you feel that these changes aren't welcome, please let me know and I can continue working in a separate project.
I really like the fill
suggestion @SimonDanisch , just noticed it now after typing a long comment. Will update the PR accordingly.
For consistency, and to distinguish with future point utility functions, I've renamed to vunit
and vfill
where "v" means that this is a vector utility function.
I will merge this one by the end of the day in case there are no additional suggestions. 👍
Although I really feel, like nothing beats
Vec3f(0)
and I don't 100% get what would be confusing about it...
It is mainly for consistency and clarity of intention. When we introduce vunit
for unit vector and vfill
for repeated coordinates, we signal maintainers about what we are trying to achieve very explicitly. Given that currently const Vec = SVector
, we also save ourselves from reimplementing the full StaticArrays.jl behaviour manually, just for adding a new constructor with a single entry. The benefit is that things will work out of the box when devs from other packages plug their SVector
vectors. There is no need to create a new vector type when StaticArrays.jl is so awesome, and keeps getting better.
@juliohm thanks for the explanation, it's very helpful. Have consistent types, and keeping the implementation as generic as possible sound very good! However this is moving a bit fast for me (waiting less than a day for suggestions... no offense! but I can only work intermittently on this).
Anyway, here are my two cents 😊:
Regarding the type aliases for 2D, 3D, single and double precision: I think it's important to define these aliases because
Rect2f
than Rect{2,Float32}
).GeometryBasics
have to define their own aliases for convenience, the ecosystem will end up with different names for the same thing, and it won't be obvious that XXX.Rect3f
and YYY.FRect3D
are actually the same type!I see no downside to these aliases (assuming we keep the GeometryBasics
methods as general as possible as you say, so most methods should not use these specific aliases).
Good point on the parallelepipeds!
For the Euclidean basis: I think you actually mean "standard basis". It's the vector space that is (or isn't) Euclidean... I see a couple of results on Google for "euclidean basis" (well, more like 11'000) but AFAIK that's non-standard terminology. People could use Vec{5,Float64}
for a non-Euclidean space, and [1,0,0,0,0]
would still be the first vector of the standard basis.
I also like fill
! But using different function names (vfill
, pfill
) seems unnecessary: fill
for all types is more idiomatic and what's being filled is clear from the parameters...
Thank you @knuesel , below are some follow up comments.
Regarding the type aliases for 2D, 3D, single and double precision: I think it's important to define these aliases because
- They make the package more user-friendly (probably most users prefer to write
Rect2f
thanRect{2,Float32}
).
Users won't write Rect{2,Float32}
, they will create rectangles (or paralellepid) with actual points, so it will read like Rect(Point2f(...), Point2f(...))
.
- They make the user code more readable.
As pointed out, code becomes less readable with multiple aliases for each geometry. We are talking Rectangle here, imagine this pattern for all primitives. Take a look at the rectangle source code as it is currently, and you will realize the huge number of constructors just to provide these aliases: https://github.com/JuliaGeometry/GeometryBasics.jl/blob/4bc7ebcad7dafae1392ef2fabc3405fcd7c78033/src/primitives/rectangles.jl#L30-L156
The plan is to cleanup this source of variation in a future PR, after #101 is ready. It is been a good exercise to fix these issues. I could already solve at least a couple dozen bugs in this last PR.
- They prevent unnecessary fragmentation in dependent packages: if packages that use
GeometryBasics
have to define their own aliases for convenience, the ecosystem will end up with different names for the same thing, and it won't be obvious thatXXX.Rect3f
andYYY.FRect3D
are actually the same type!
As I mentioned, there is no need for aliases after we get the point type aliases.
For the Euclidean basis: I think you actually mean "standard basis". It's the vector space that is (or isn't) Euclidean... I see a couple of results on Google for "euclidean basis" (well, more like 11'000) but AFAIK that's non-standard terminology. People could use Vec{5,Float64} for a non-Euclidean space, and [1,0,0,0,0] would still be the first vector of the standard basis.
"Standard" is a function of the research community you are in. This basis may not be standard in other contexts. The better name for it is Cartesian. It is Euclidean because every point is obtained by simple addition and subtraction of coordinates, and it is Cartesian because of the orthonormality property.
I also like fill! But using different function names (vfill, pfill) seems unnecessary: fill for all types is more idiomatic and what's being filled is clear from the parameters...
Introducing new names is a good practice in general. After writing a lot of code in the GeoStats.jl stack and in Julia, I realized that life is much easier when we have our own names for a new functionality. The language doesn't provide a trait system and so extending behavior with Base
names is impossible without macro hacks. The v
prefix will turn out useful when we start having equivalents for the point type with a p
prefix. It also helps maintainers to write a coherent API, and to read code later on.
This is a continuation of #93 . It supersedes #97. Addresses #91.
Now we have a single type for representing vectors in geometrical spaces, and it happens to be an alias for
SVector
(implementation detail). Additionally, we have type aliases for end users that often wish to materialize vectors with specific coordinate type and dimension. Here, we avoid partial instantiation. The aliasesVec2
andVec3
are concrete vector types in 2D and 3D using double precision, and the aliasesVec2f
andVec3f
are concrete vector types using single precision.We deprecate
unit
in favor ofeuclidbasis
, which is a more accurate name for the returned vectors. We addconstant
as a helper function to construct vectors with repeated coordinates. The nameconstant
isn't ideal, please let me know if you have a better suggestion.