Closed jaclark5 closed 2 years ago
An example of how to resolve this issue is shown here: https://github.com/jaclark5/mdanalysis/tree/empty_atomgroup
Would mind making a PR? Thanks.
I added a draft of a pull request, is that appropriate? The way I read the contributors page suggested that I need to have an issue with a conversation before making Pull Requests.
Thanks for the draft PR, that's great!
The way I read the contributors page suggested that I need to have an issue with a conversation before making Pull Requests.
I don't know if you're referring to https://userguide.mdanalysis.org/stable/contributing_code.html#contributing-to-the-main-codebase :
If your contribution is major, such as a bug fix or a new feature, start by opening an issue first. That way, other people can weigh in on the discussion before you do any work. If you also create a pull request, you should link it to the issue by including the issue number in the pull request’s description.
This is supposed to mean that we really appreciate any and all contributions but that we sometimes have a good idea if something fits into the core library or not or if a particular issue has a non-obvious solution, so we don't want you to waste your time doing a lot of work in advance. But at the end of the day, a PR is a clear way to show how you think a problem can be solved so it's always welcome.
I looked at the draft PR #3839 . At the moment, center_of_mass()
and center_of_charge()
return empty arrays of shape (0, 3) when applied to an empty AtomGroup. My initial reaction was that this seems to be reasonable behavior, even though it's not documented.
However, these empty arrays are somewhat peculiar objects. Let's make one:
import MDAnalysis as mda; from MDAnalysisTests import datafiles as data
u = mda.Universe(data.TPR, data.XTC)
empty = u.atoms[:0] # empty AtomGroup
x = u.atoms[:5].positions
e = empty.center_of_mass()
where x
and e
are
In [28]: x
Out[28]:
array([[52.02 , 43.560005, 31.550003],
[51.190002, 44.11 , 31.720001],
[51.550003, 42.83 , 31.04 ],
[52.47 , 43.170006, 32.370003],
[53.07 , 44.21 , 30.75 ]], dtype=float32)
In [29]: e
Out[29]: array([], shape=(0, 3), dtype=float32)
What's weird is how e
behaves: It does not broadcast:
In [30]: x + e
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In [30], line 1
----> 1 x + e
ValueError: operands could not be broadcast together with shapes (5,3) (0,3)
whereas x + x[0]
works (x[0].shape == (1, 3)
).
But using this shaped, literally empty array in any kind of operations creates another empty one
In [32]: x[0] + e
Out[32]: array([], shape=(0, 3), dtype=float32)
In [33]: e * x[0]
Out[33]: array([], shape=(0, 3), dtype=float32)
In [34]: 42 * e
Out[34]: array([], shape=(0, 3), dtype=float32)
It's kind of NaN value.
It's a fair question if we want the existing behavior or if we want to replace it, e.g., with raising an exception. More broadly, what should be the behavior of a method of an AtomGroup if the method cannot provide a sensible answer?
Opinions welcome, @MDAnalysis/coredevs .
I would argue an exception is the right behaviour.
I am also tending towards exception.
Will we consider switching to an exception a fix (then it can go into release 2.4) or is it a behavior change (then it might have to wait for 3.0). Any opinions, especially @IAlibay ?
Sorry there's a lot of threads here and it's hard to see what the impact is. May I ask for a bit of a ELI5 on which bits of the codes are we looking to specifically change?
Wait which function are we talking about? The original issue is that the anisotropy methods fail (and soon multipole moments ;) )with an obscure error when the group is empty. Couldn't I add an exception there? Originally I suggested a change to centerof* because they were referenced in all those other features.
I had to look up ELI5... The question that @hmacdope and I have been moving towards extends beyond this particular issue: What should our AtomGroups do when a calculation is requested of an empty AtomGroup?
This question is a bit of an edge case because normally AtomGroups only have methods attached that are suitable for the attributes held by the group and so you typically don't run into a situation where a calculation will become non-sensical (incorrectly guessed masses being a glaring exception for things like center_of_mass()
etc). This issue showed that it can happen that as part of normal calculations, empty AtomGroups are generated and they return weird numpy stuff (https://github.com/MDAnalysis/mdanalysis/issues/3837#issuecomment-1253033215), namely shaped empty-set arrays that function as NaNs in the sense that they turn all calculations that they participate in also into these weird arrays (maybe @tylerjereddy can say more about these arrays??). We have been moving towards the following view :
ValueError
(or maybe exceptions.NoDataError
https://github.com/MDAnalysis/mdanalysis/blob/fa8b03b210827e288bcdf5c77898f6a4faca1385/package/MDAnalysis/exceptions.py#L35 which is derived from ValueError
(and AttributeError
)?).One can make the case that the mass or charge of an empty group should be 0. But I have no idea what the center of mass or centroid of an empty group should be — certainly, [0, 0, 0]
is not the correct answer because for certain molecules it could just be that. In order to be consistent (and following the Python Zen "fail early and often"), I'd suggest that any calculation on an empty AtomGroup raises an exception.
I voted for an exception or NaN in the Discord poll, because that seems more consistent with numpy:
>>> import numpy as np
>>> np.mean([])
/Users/lily/anaconda3/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/Users/lily/anaconda3/lib/python3.9/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
nan
We had a discussion and a straw-poll in the public #developer channel on Discord. Here's the outcome of the discussion:
My fear is that with NaN we get more obscure error reports that we have to help track down to an empty AtomGroup somewhere, thus creating more support burden. Therefore, I am very much inclined to go with raising an exception. (I'd be following Errors should never pass silently. (from the Zen of Python) and "fail early" (a programming principle dating back to at least 2004).)
Based on the characterization of ValueError
Raised when an operation or function receives an argument that has the right type but an inappropriate value [...]
I'd say we raise ValueError
.
I was actually thinking NoDataError would make more sense? If it's a method then you're error-ing on a lack of data not the fact that it's the wrong value (because technically there are no values?)
Re: API break, I can live with this being a 2.4.0 change, but I would ask that we be very loud about it.
I always thought NoDataError was more like missing topology attribute, rather than zero sized input.
On Fri, Sep 23, 2022 at 19:53, Irfan Alibay @.***> wrote:
I was actually thinking NoDataError would make more sense? If it's a method then you're error-ing on a lack of data not the fact that it's the wrong value (because technically there are no values?)
Re: API break, I can live with this being a 2.4.0 change, but I would ask that we be very loud about it.
— Reply to this email directly, view it on GitHub https://github.com/MDAnalysis/mdanalysis/issues/3837#issuecomment-1256557676, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGSGB2TRR26LECUIG4NZN3V7X4AZANCNFSM6AAAAAAQQSXAOE . You are receiving this because you are on a team that was mentioned.Message ID: @.***>
I always thought NoDataError was more like missing topology attribute, rather than zero sized input.
On Fri, Sep 23, 2022 at 19:53, Irfan Alibay @.***> wrote:
I was actually thinking NoDataError would make more sense? If it's a method then you're error-ing on a lack of data not the fact that it's the wrong value (because technically there are no values?)
Re: API break, I can live with this being a 2.4.0 change, but I would ask that we be very loud about it.
— Reply to this email directly, view it on GitHub https://github.com/MDAnalysis/mdanalysis/issues/3837#issuecomment-1256557676, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGSGB2TRR26LECUIG4NZN3V7X4AZANCNFSM6AAAAAAQQSXAOE . You are receiving this because you are on a team that was mentioned.Message ID: @.***>
It's a bit of an academic discussion for not a big deal either way tbh, but that falls within my understanding of the chain of failure.
Empty atomgroup -> no masses -> can't use masses
The attribute is still there but only by virtue that our default is that we default to nothing instead of raising an error when there's nothing?
Similar to https://github.com/MDAnalysis/mdanalysis/issues/3837#issuecomment-1256579470 I'd also seen NoDataError
more like to missing something from the topology. However, after a quick git grep NoDataError
it looks as if we are using it quite liberally to mean "you haven't done something that would provide data that I need" or "this timestep doesn't have what I need right now (for whatever reasons)". Based on the current usage practice, I'd also be ok with NoDataError
.
NoDataError
is derived from ValueError
(and AttributeError
) so we can treat it as ValueError
when catching exceptions. If we start out with ValueError
then it's not a problem to make it a NoDataError
later because code will still work. Turning a NoDataError
into ValueError
is a breaking change, though.
(RuntimeError
is also a possibility but I'd prefer one of the other, more specific errors.)
Do we want NoDataError
instead of ValueError
?
Expected behavior
When calculating the shape_parameter of an AtomGroup, provide error if the group is empty. This is best placed in
center_of_mass
andcenter_of_charge
since all other functions reference these two methods first.Actual behavior
Obscure error that can eventually be determined to result from an empty AtomGroup:
numpy.linalg.LinAlgError: Eigenvalues did not converge
Code to reproduce the behavior
Current version of MDAnalysis