JuliaMolSim / AtomsBase.jl

A Julian abstract interface for atomic structures.
https://juliamolsim.github.io/AtomsBase.jl/
MIT License
81 stars 16 forks source link

Should we enforce Unitful coordinates? #1

Open Gregstrq opened 2 years ago

Gregstrq commented 2 years ago

This is the place to discuss whether we want to enforce specific types for the coordinates and, by extension, for the velocities.

Right now, the proposed way is to assume that the coordinates must have a unitful type with dimension of the length, while the velocities must have a unitful type with the corresponding dimension of the velocity. As a consequence, all of the realizations of the abstract interface need to enforce this assumption.

What is the opinion of the community members about this restriction?

My take on it is that this is an unnecessary restriction. Although I agree that the majority of use cases pertain to the particles in real space, I can imagine doing dynamical simulations with the particles living in some abstract space. For example, in momentum space. As a result, I propose to relax the restriction and allow the coordinates and velocities to have arbitrary types. Instead, I propose that we should enforce the "homogeneity" of the used types: one should not be able to mix the unitless and unitful coordinates while describing the same system (one should not be able to use unitful quantities with different dimensions as well).

eahenle commented 2 years ago

In the case of crystalline systems, coordinates are often fractional, so I think allowing unitless is a good thing.

rkurchin commented 2 years ago

This is absolutely true, but our thought was that even in the case of fractional coordinates, there is a set of basis vectors to which those fractions refer, and those vectors have units. For the purposes of interoperability that we hope to facilitate by this package (e.g. passing between MD/DFT/ML, or doing file I/O, etc.), I think it makes sense that this function would return unitful Cartesian coordinates in that case.

(Of course, an individual implementation of the interface could have other functions for passing around fractional coordinates internally, and would almost certainly need to have them!)

Does this make sense?

jgreener64 commented 2 years ago

Initially I thought no units for the reasons given above. Now I have switched to the other view, for the reasons Rachel outlines.

We should keep in mind what an interface is intended to do: to provide functions with guarantees to downstream users. It doesn't stop people writing their own functions to return positions in momentum space or fractional coordinates. It just says that if you want to implement the interface, you need to have some function that converts to Cartesian space to help downstream users. There are a limited number of cases where this may be impossible (e.g. particles in momentum space), but the interface can't support everything and I feel Cartesian coordinates are important enough to be a first class citizen.

A couple of use cases where relaxing unit requirements would make the interface less powerful:

Gregstrq commented 2 years ago
jgreener64 commented 2 years ago

You are right that you can get round it with dispatch, but the point of the interface is to provide users with simple functions that have guarantees. If users have to implement one version of a function and throw a MethodError for others it would seem to harm interoperability, as their code would no longer work on everything that implements the interface.

It seems the underlying question here is whether absolute position in 3D space is a central enough concept to atomic systems that it should be part of the interface. Personally I think it is, but I can see that there are particular situations where it is not required.

Gregstrq commented 2 years ago

Well, not everyone always deals with 3D space. It is oftentimes useful to consider the space with reduced dimensionality as well, i.e., 2D or 3D space. At least this is the case in my field.

Regarding MethodError and interoperability. If your coordinates have units other than length and you are trying to save the data in the file designed for crystals, then you are probably doing something wrong. If you are using coordinates with units other than length, you are probably aware that you are doing something out of the box and some things would not work because they were not designed for your crazy thing in the first place.

Now, what about using unitless coordinates? Adding units or stripping them away is a pretty trivial thing. I think the guys dealing with the fast numerical implementations might want to have a function that strips the units anyway. So, we can actually add to the interface two functions: one would take a system with unitful coordinates and return an equivalent system with unitless coordinates; the other would transform in the opposite direction.

jgreener64 commented 2 years ago

Yes I misspoke about 3D space, I meant any dimensional space as long as the dimensions are length. That works in the current system and personally I think that it gives enough flexibility for most cases.

Regarding unit adding/stripping, as you say it is pretty easy so people can just use ustrip. I am hesitant to provide a function to add units to unitless quantities - that should be done with care and explicitly by the user since it assumes the input data is in a certain unit.

Gregstrq commented 2 years ago

In the case of adding units to unitless quantities the user of course needs to specify what units does he want to add. In the absence of velocities it may look like

add_units(system::AbstractSystem{D,T}, ::Type{T'})::AbstractSystem{D,T'}
rashidrafeek commented 2 years ago

The current implementation doesn't allow creating a system with coordinates which are not Unitful. The issue with doing this is that it would be very hard to work with the rest of the Julia ecosystem where Unitful compatibility is not always assured. As an example, I will point out this PR: https://github.com/JuliaStats/Distances.jl/pull/229#issue-1024837367 I submitted while using its pairwise() function to compute the "distance matrix" of a SimpleSystem by giving Uniful coordinates as input. While it was easy to fix this issue by submitting a pull request to it, this might not always be the case.

While I understand that having Unitful coordinates would be helpful for having a default interface for package developers using this package for various MD/DFT packages, enforcing them would be annoying for the users of this package due to less compatibility with many other packages in the Julia ecosystem which don't take care of Unitful coordinates. Having compatibility with these packages would help in the analysis of simulations or in extracting useful quantities from structures.

So I suggest that instead of enforcing them, a better way would be to have all the exported functions be Unitful by default but also allow the use of non Unitful coordinates when needed, instead of disallowing it completely. I have submitted a PR (#16) in line with this, which can be merged if decided in favor of this (or not).

jgreener64 commented 2 years ago

The issue with doing this is that it would be very hard to work with the rest of the Julia ecosystem where Unitful compatibility is not always assured.

But can't people run ustrip on the output of a function before putting it into packages that are not Unitful compatible? I appreciate this may mean taking on Unitful as a direct dependency but to me that is a price worth paying to be able to guarantee that position and velocity return things with built-in physical meaning.

It's good to hear more input on this topic, and hopefully other people will weigh in once we publicise the repo more.

rashidrafeek commented 2 years ago

But can't people run ustrip on the output of a function before putting it into packages that are not Unitful compatible?

Yes. With the downside of having to import Unitful for any small operation, as you say.

to be able to guarantee that position and velocity return things with built-in physical meaning.

If this is the case, It might be useful to export convenient functions to deal with units ourselves. Or even better, something like @reexport using Unitful using https://github.com/simonster/Reexport.jl. Just a suggestion.

jgreener64 commented 2 years ago

If this is the case, It might be useful to export convenient functions to deal with units ourselves. Or even better, something like @reexport using Unitful using https://github.com/simonster/Reexport.jl.

Yes this is a good idea, even if we only re-export ustrip that gives people something useful.

Kolaru commented 2 years ago

But can't people run ustrip on the output of a function before putting it into packages that are not Unitful compatible?

In the past I did it and realized that there can be a lot of ustripping to do (while using MultivariateStats.jl for some analysis for example), and it even quickly became the default for me to always strip units, just to be sure to have the code run first time. This cluttered the code quite a bit.

What I ended up doing was to store the units as property of the system object, doing all conversion during construction and never having to deal with Unitful.Quantity internally. Then for e.g. plotting I just added some convenience function to retrieve the units.

At the end of the day, it looks like a tradeoff between being sure units are propagated correctly and direct composability with the ecosystem.

mfherbst commented 2 years ago

Internally (i.e. in your implementation of the interface) you can still do this, so I don't see a restriction in the current interface.

However, one might argue it might sense to have a second set of function that are guaranteed to return unitless quantities. We discussed this at some point and left the decision on this for later if I remember correctly.

cortner commented 2 months ago

I would like to revive the idea to allow working unitless. I think this is a strange and unnecessary restriction that will put off potential users. Looking through the code, I think it can be done without breaking backward compatibility.

I think both working with and without units has advantages and disadvantages and we should not impose our own style on others.

rkurchin commented 2 months ago

I personally still lean towards having the "default" exposed interface functions require units to avoid any unpleasant surprises in "naïve" interoperability cases, but I can imagine that there would be cases where carrying units around could impede performance, so I'd be open to defining another set of functions (or keywords to existing ones, though that might cause some type stability issues) to get non-Unitful quantities. It would be easy enough to define default dispatches of these via @ustrip and obviously any implementation could dispatch them differently if they want to nondimensionalize differently...

cortner commented 2 months ago

I don't think it's really about performance. And I don't think there need to be different interface functions. All it needs is that definitions like

struct Atom{D, L<:Unitful.Length, V<:Unitful.Velocity, M<:Unitful.Mass}
end 

are replaced with

struct Atom{D, L, V, M}
end 

There is simply no reason to be restrictive in the type signature I think.

mfherbst commented 2 months ago

Just that I understand correctly: You would still return Unitful quantities when a user calls position, right ? Because at least in the current state of AtomsBase specifying the output unit is part of the interface definition. We can discuss this, but as Rachel said I think this avoids many issues.

I agree though, that what you do internally should not be restricted and enforcing the unit in the currently existing output functions can be realised by tests instead of type restrictions.

On top, as said before, I agree it might be a good idea to have an additional mechanics for data in atomic units without Unitful (both for constructing objects as well as for extracting data), then a call like position could just stick on a u"bohr" automatically.

cortner commented 2 months ago

No. I would not return unitful quantities, that's the entire point. And I am strongly opposed (for now) to introducing new interface functions that do the ustrip thing.

The point of this proposal is to allow positions, energies, forces, virials, masses, etc be stored as either plain old floating point numbers (or indeed any number type!) OR to be unitful and to make the entire interface and implementation agnostic to the choice.

jgreener64 commented 2 months ago

The power of the interface comes from what is expected of the return values of its functions. If position can return a unitful or unitless quantity, then every single package that uses the AtomsBase interface will have to have something like

coord = position(sys, 1)
if unit(coord) == NoUnits
    # Do something
else
    # Do something else
end

which seems like a problem.

As I understand it, the main benefit of the AtomsBase interface is that other packages can write code that works on all AtomsBase systems. You can still work unitless with your own AtomsBase-compatible structs, you just need to write position, atomic_mass etc. functions that add a unit.

cortner commented 2 months ago

I disagree with that. I don't see the problem. If your package requires units then it will simply fail with an error message if a unitless quantity is returned.

rashidrafeek commented 2 months ago

Currently, the docs of position say that: "Return type should be a vector of vectors each containing D elements that are <:Unitful.Length." So a user of position would have written code assuming that this condition is met. For example, it is expected that adding a quantity of length dimension with a quantity of time dimension will throw a DimensionError. And if we change this to return unitless quantities, the same code would then silently return wrong results. I know this is a worst case. But having the guarantee that these functions will always return Unitful quantities is very useful while carrying out some post processing or analyzing data using AtomsBase.

Instead, it would be better to have a different set of functions, which is always guaranteed to return values without units. It would also be useful if the values returned have a default unit, for example, Å for length. Then, defining, these functions would be as easy as,

position_ul(system) = map(position(system)) do pos
    ustrip.(u"Å", pos)
end

Similar version can be there for other functions, for example, velocity_ul, bounding_box_ul etc. The suffix "ul" (meaning unitless) is just a suggestion.

cortner commented 2 months ago

I'm sorry but I don't get it. That user should simply use units in their workflow.