Amber-MD / pytraj

Python interface of cpptraj
https://amber-md.github.io/pytraj
170 stars 38 forks source link

TrajectoryIterator: boolean index arraylike does not work #1512

Closed pindakaas42 closed 4 years ago

pindakaas42 commented 4 years ago

Giving an array (or a list) with boolean data as an index to a TrajectoryIterator object does not work. It just returns a Trajectory object with all the frames, regardless of the content of the index array.

Passing an array with boolean dtype as index to a Trajectory object does work though and it only returns the frames that correspond to True in the index array, so I think this is a bug.

pindakaas42 commented 4 years ago

if any more info is needed feel free to ask

hainm commented 4 years ago

Hi @pindakaas42, sorry for late feedback. I am not sure it's a good idea to support boolean indexing. Can you please provide a use case for it? thanks.

pindakaas42 commented 4 years ago

You are already supporting it for Trajectory objects.

Use case is my daily work, here is one example: I am looking at a trajectory of a dna, the backbone angles take certain conformations, that I would like to filter for, so i compute the dihedrals with pt.dihedral, the result is a numpy array so I do :filter1 = result[result <50] for all these dihedrals to get a bool array with the frames that fullfill this condition, now doing different combinations of filtering is easy: combined_filter = (filter1 | filter2 ) & ~filter4 and so on will give me different frames with different content very quickly, if I pass them as an index.

I think you can see why this is useful. But the use case is not really relevant as supporting it for Trajectory and not TrajectoryIterator would be very inconsistent, especially since I want to write my scripts to be agnostic about which kind of trajectory I provide to them.

hainm commented 4 years ago

I think supporting boolean is side effect and it was not in my design. Thanks for example.

On Wed, Feb 5, 2020 at 4:34 AM pindakaas42 notifications@github.com wrote:

You are already supporting it for Trajectory objects.

Use case is my daily work, here is one example: I am looking at a trajectory of a dna, the backbone angles take certain conformations, that I would like to filter for, so i compute the dihedrals with pt.dihedral, the result is a numpy array so I do :filtered_result = result[result <50] for all these dihedrals, now doing different combinations of filtering is easy: (res1 | res2 ) & ~ res4 and so on will give me different frames with different content very quickly.

I think you can see why this is useful. But the use case is not really relevant as supporting it for Trajectory and not TrajectoryIterator would be very inconsistent, especially since I want to write my scripts to be agnostic about which kind of trajectory I provide to them.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Amber-MD/pytraj/issues/1512?email_source=notifications&email_token=ABB645IEB5O4GJHNXQPMV7DRBKB25A5CNFSM4KNCUB3KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEK2X6OI#issuecomment-582319929, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABB645KCDSDMTSRN3XSSR6LRBKB25ANCNFSM4KNCUB3A .

pindakaas42 commented 4 years ago

Maybe also this error message is something you'd want to reconsider,: pytraj-2.0.5-py2.7-linux-x86_64.egg/pytraj/trajectory/trajectory.pyc in __getitem__(self, index) 303 """ 304 if self.n_frames == 0: --> 305 raise IndexError("Your Trajectory is empty, how can I index it?") 306 307 if is_int(index): in numpy for example if you create an array that is empty like a =np.arange(0,0) and then index it a[Fasle] we get, as expected again an empty array in return, I think this makes the resulting interface a lot more consistent. At the moment I have to catch empty trajectories and just not give an index to them to word around this.

hainm commented 4 years ago

thanks for your suggestion.

hainm commented 4 years ago

[x] TrajectoryIterator: boolean index arraylike does not work -> Fixed in #1551

In [6]: traj = pt.iterload('tz2.nc', 'tz2.parm7')                                                                                                   

In [7]: traj[[True]*100 + [False]]                                                                                                                  
Out[7]: 
pytraj.Trajectory, 100 frames: 
Size: 0.000498 (GB)
<Topology: 223 atoms, 13 residues, 1 mols, non-PBC>

In [8]: traj                                                                                                                                        
Out[8]: 
pytraj.TrajectoryIterator, 101 frames: 
Size: 0.000503 (GB)
<Topology: 223 atoms, 13 residues, 1 mols, non-PBC>      
hainm commented 4 years ago

At the moment I have to catch empty trajectories and just not give an index to them to word around this.

@pindakaas42 What's your usage?

pindakaas42 commented 4 years ago

Use cases are many, one example: I have a series of trajectories, that I want to filter for a certain state to be present, and then save them. Sometimes some of these need to be empty because the state is not there, this is easy to do if the interface is consistent (like the numpy behaviour) and it gives an empty trajectory if the boolean array is all False. If I then apply another filter it will cause a problem if I can not index the empty trajectory. At that point I now can not continue filtering and have to write code that takes out the empty trajectories before continuing...
(I hope this makes sense...)

pindakaas42 commented 4 years ago

sorry miss clicked

hainm commented 4 years ago

Do you have any code around so I can look?

hainm commented 4 years ago

uhm, so do you mean that you prefer to have this work?

t[[False]*t.n_frames][False]
hainm commented 4 years ago

a =np.arange(0,0) and then index it a[Fasle] we get, as expected again an empty array in return, I think this makes the resulting interface a lot more consistent. At the moment I have to catch empty trajectories and just not give an index to them to word around this.

Uhm, I am confused here. We do not want to support boolean indexing like that (if that works, it's a bug), current code only support an array of bool values.

For an array

In [17]: a =np.arange(0,0)                                                                                                                          

In [18]: a[[False]]                                                                                                                                 
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-18-58cd5c4ff956> in <module>
----> 1 a[[False]]

IndexError: boolean index did not match indexed array along dimension 0; dimension is 0 but corresponding boolean dimension is 1

By the way, I need to see the real code because I have trouble to understand your idea.

pindakaas42 commented 4 years ago

Hey, I think giving you an actual excerpt of my code is too much work for me at the moment, as I would have to cut it down to be understandable, I don't want to post 400 plus lines of bare mostly uncommented code here. Just look at the source code, you are probably using numpy arrays for the coordinates that is why you involuntarily do support boolean indexing, here are lines from pytraj/trajectory/trajectory.py:

341             elif not isinstance(index, tuple):
342                 # might return a view or a copy
343                 # based on numpy array rule
344                 # traj.xyz[index]
345                 traj.top = self.top
346                 traj._xyz = self._xyz[index]

The index is simply passed to _xyz, which is a probably a numpy array correct? This is how you inherit all this numpy stuff like boolean indexes and the ability to index empty trajectories.

In line 304 to 305 this exception messes with the normal behavior that would result from passing a boolean array to an empty numpy array.

304         if self.n_frames == 0:
305             raise IndexError("Your Trajectory is empty, how can I index it?")

just removing this would help.

I can write an actual example use code for you, but unless you want to fundamentally change how this class works you could as well just remove the above exception. I may have more time later if you need a better explanation.

pindakaas42 commented 4 years ago

uhm, so do you mean that you prefer to have this work?

t[[False]*t.n_frames][False]

Yes I would prefer to have this work, as you see from my post above, pytraj does extra work to break this well defined behavior it inherited from numpy.

hainm commented 4 years ago

In line 304 to 305 this exception messes with the normal behavior that would result from passing a boolean array to an empty numpy array.

Are you sure? Here is an example of passing a boolean ARRAY to an empty numpy array

In [3]: np.array([])[[False]]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-13a96ef05760> in <module>()
----> 1 np.array([])[[False]]

IndexError: boolean index did not match indexed array along dimension 0; dimension is 0 but corresponding boolean dimension is 1
pindakaas42 commented 4 years ago

This is not really my idea as much it is just how numpy handles this. It works if you remove one pair of brackets np.array([])[False] will not throw an error. Although the resulting behavior by numpy is also a bit unintuitive adding a dimension to the array with length 0 every time. So maybe it's not that well defined and consistent.

hainm commented 4 years ago

It works if you remove one pair of brackets np.array([])[False] will not throw an error.

Yes, I am aware of that from your original post. That's why I asked what your use case was if you want to use a single boolean value to index (given the fact that you suggested to support boolean index arraylike).

pindakaas42 commented 4 years ago

Sorry I did not express myself well then. We are talking now about the exception is raised, when passing empty arrays as an index to the Trajectory class, and my issue with that exception that I mentioned. The topic was opened about the fact that boolean index arraylike do work as indices in Trajectory but not in the TrajectoryIterator class. So two different issues basically, sorry.

hainm commented 4 years ago

ok. So I go ahead closing this case since the code is pushed. Please feel free to open another issue if you have further code example. Updating the code is easy, I just want to find out if it's worth the change.

pindakaas42 commented 4 years ago

So just that it is clear Trajectory will still take boolean arrays and TrajectoryIterator will not. Changing TrajectoryIterator to have the same interface as numpy arrays should not be hard but it would be some work. It's up to you, I just wanted to raise the issue.

hainm commented 4 years ago

Uhm, you are confusing me more. I already said the issue was fixed in master branch. TrajectoryIterator is now accepting boolean array.

hainm commented 4 years ago

Uhm, you are confusing me more. I already said the issue is fixed in master branch. TrajectoryIterator is now accepting boolean array.

It's here: https://github.com/Amber-MD/pytraj/issues/1512#issuecomment-698407587

pindakaas42 commented 4 years ago

Sorry for causing more confusion, thanks for taking my feedback into account.