OpenDrift / trajan

Trajectory analysis package for simulated and observed trajectories
https://opendrift.github.io/trajan/
GNU General Public License v2.0

Technical question: what in the code allows for making this an extension of xarray? #64

Closed jerabaul29 closed 1 year ago

jerabaul29 commented 1 year ago

Following the documentation:

_TrajAn_ is an [Xarray extension](https://docs.xarray.dev/en/stable/). On drifter (or trajectory)
datasets you can use the .traj accessor
on [xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html)s.

A technical / n00b question: what part of the code / where is the code machinery that makes this possible?

gauteh commented 1 year ago

https://github.com/OpenDrift/trajan/blob/main/trajan/trajectory_accessor.py#L20

https://docs.xarray.dev/en/stable/internals/extending-xarray.html
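
For readers new to xarray accessors, here is a minimal sketch of the registration mechanism described on the page linked above. `register_dataset_accessor` is xarray's documented decorator. The accessor name `demo`, the class `DemoAccessor` and its `n_vars` method are made up for illustration and are not part of trajan.

```python
import xarray as xr


# The decorator attaches the class to every xarray.Dataset under the
# attribute name given here ("demo" is a hypothetical name).
@xr.register_dataset_accessor("demo")
class DemoAccessor:
    def __init__(self, xarray_obj):
        # xarray passes in the Dataset the accessor was reached from.
        self._obj = xarray_obj

    def n_vars(self):
        # Any method defined here becomes available as ds.demo.n_vars().
        return len(self._obj.data_vars)


ds = xr.Dataset(
    {
        "lon": ("obs", [4.0, 5.0, 6.0]),
        "lat": ("obs", [60.0, 60.5, 61.0]),
    }
)
print(ds.demo.n_vars())  # 2
```

The accessor is created the first time `ds.demo` is accessed, so importing the module that defines it is enough to make it available on datasets.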

gauteh commented 1 year ago

We use getattr to dereference to inner instances. I'm not totally convinced this is optimal, since it is a bit hacky in Python, but so far it is the best way I have found to keep things separate and get as much help as possible from the Python type system.

jerabaul29 commented 1 year ago

Perfect, thanks very much for pointing to this. A small question: for people like me who are not very familiar with xarray extensions, could it be useful to document this a bit / add these couple of lines and explanations somewhere in the doc, as this is the main mechanism trajan is built around? :) May make onboarding easier for new people like me :) .

So you actively choose to use getattr rather than (if I understand well the xarray doc page above) "just" putting the xarray object into a self._obj that gets automatically grabbed by xarray when doing xarray like function calls, is this correct? Is this what gives you an advantage in terms of typing / separation that you mention? (just trying to wrap my head :) ).

gauteh commented 1 year ago

> Perfect, thanks very much for pointing to this. A small question: for people like me who are not very familiar with xarray extensions, could it be useful to document this a bit / add these couple of lines and explanations somewhere in the doc, as this is the main mechanism trajan is built around? :) May make onboarding easier for new people like me :) .

Yes, I think so too. I made a very brief description of where a user can expect to find methods documented at the end of this section: https://opendrift.github.io/trajan/index.html#usage.

> So you actively choose to use getattr rather than (if I understand well the xarray doc page above) "just" putting the xarray object into a self._obj that gets automatically grabbed by xarray when doing xarray like function calls, is this correct? Is this what gives you an advantage in terms of typing / separation that you mention? (just trying to wrap my head :) ).

The getattr comment was not meant to refer to the xarray extension: it is how the rest of the code is accessed from the accessor. The TrajAccessor class, which is accessible through dataset.traj, holds an inner instance of a class that implements traj.Traj. The inner instance is either a Traj1d or a Traj2d instance (both subclass traj.Traj). TrajAccessor instantiates the correct one depending on whether the dataset has a 1d or 2d time coordinate. This way we can put code that works regardless of the format in traj.Traj, and code that has to be adapted to one format in the matching subclass, without a conditional in the code. Any code in Traj1d and Traj2d also knows for sure what format the data is in.

The abstract methods in traj.Traj, e.g. gridtime or timestep, present a unified interface to the user. They are implemented differently in Traj1d and Traj2d, because the implementation details differ between the two formats even though the conceptual or physical meaning is very similar. The documentation is also forwarded from traj.Traj so that it does not need to be maintained in exactly the same form in both sub-implementations.

In order to access the methods on the inner instance, getattr is used to forward the call. Otherwise you would always have to write dataset.traj.inner.gridtime(...). With forwarding this is transparent to the user, and we still keep the benefit of the logical separation of 1d and 2d and do not risk confusing the two in the code.
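
The design described above can be sketched in plain Python, without xarray. This is a toy model and not trajan's actual code: the class names follow the thread, but the dict-based data, the nested-list check used to tell 1d from 2d, the toy `timestep` implementations, and the use of the `__getattr__` hook for forwarding are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Traj(ABC):
    """Stand-in for traj.Traj: format-independent code lives here."""

    def __init__(self, data):
        self.data = data

    @abstractmethod
    def timestep(self):
        """Return the time step between observations."""


class Traj1d(Traj):
    # One time axis shared by all trajectories.
    def timestep(self):
        t = self.data["time"]
        return t[1] - t[0]


class Traj2d(Traj):
    # One time axis per trajectory; here we average the first step.
    def timestep(self):
        steps = [row[1] - row[0] for row in self.data["time"]]
        return sum(steps) / len(steps)


class TrajAccessor:
    def __init__(self, data):
        self._obj = data
        # Pick the implementation once, so no later code needs a conditional.
        time_is_2d = isinstance(data["time"][0], list)
        self.inner = Traj2d(data) if time_is_2d else Traj1d(data)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails on the accessor.
        if name == "inner":
            # Avoid infinite recursion if inner has not been set yet.
            raise AttributeError(name)
        return getattr(self.inner, name)


shared = TrajAccessor({"time": [0, 10, 20]})
ragged = TrajAccessor({"time": [[0, 5, 10], [2, 9, 16]]})

print(type(shared.inner).__name__, shared.timestep())  # Traj1d 10
print(type(ragged.inner).__name__, ragged.timestep())  # Traj2d 6.0
```

Callers write `shared.timestep()` rather than `shared.inner.timestep()`, and each subclass can assume its own data layout.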