MDSplus / mdsplus

The MDSplus data management system
https://mdsplus.org/

Question regarding GetSegment.fun and evaluation of segmented nodes #2426

Open merlea opened 2 years ago

merlea commented 2 years ago

When using segments to store data in a node, we have found some weird behaviour when either reading the whole node or extracting a single segment using GetSegment.fun.

The documentation on segmented data makes it clear that the time dimension should be the LAST dimension. However, we get an expression equivalent to build_signal(_data,*,_time), as can be seen in the latest version of GetSegment.fun at the end of this message.

This causes inconsistencies between the shape of the data and the shape of the dimensions. For example, with _a=GetSegment(_nid,_idx), if shape(_a) is [10,20,30] we get size(dim_of(_a,0))=30 (where 10 would have been expected), size(dim_of(_a,1))=20 and size(dim_of(_a,2))=30.
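To make the mismatch concrete, here is a minimal Python sketch that compares a data shape against the sizes reported for each dimension. The helper name `dimension_mismatches` is hypothetical and not part of MDSplus; the numbers are the ones quoted in the example above.

```python
def dimension_mismatches(data_shape, dim_sizes):
    """Return (axis, data_length, dim_length) for every axis whose lengths differ."""
    return [
        (axis, n_data, n_dim)
        for axis, (n_data, n_dim) in enumerate(zip(data_shape, dim_sizes))
        if n_data != n_dim
    ]


# Values quoted in the report above: shape(_a) and size(dim_of(_a, i)).
reported_shape = [10, 20, 30]
reported_dim_sizes = [30, 20, 30]

mismatches = dimension_mismatches(reported_shape, reported_dim_sizes)
for axis, n_data, n_dim in mismatches:
    print(f"axis {axis}: data has {n_data} elements, dimension has {n_dim}")
```

Only axis 0 is flagged here, which is the axis that receives the time vector even though time runs along the last axis of the data.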

I wonder whether this is truly the intended behaviour as it wouldn't be such a big effort to form an expression with the time dimension in the correct place.

Finally the same problem appears when simply evaluating the node, which I guess calls a similar method to the one contained in GetSegment.fun.

public fun GetSegment(as_is _node, in _idx) {
  _nid=getnci(_node,"NID_NUMBER");
  _data=*;
  _dim=*;
  _status=TreeShr->TreeGetSegment(val(_nid),val(_idx),xd(_data),xd(_dim));
  if (_status & 1) {
    _scl=*;
    if(TreeShr->TreeGetSegmentScale(val(_nid),xd(_scl))&1 && KIND(_scl)!=0) {
      return(make_signal(_scl,_data,_dim));
    } else {
      return(make_signal(_data,*,_dim));
    }
  } else {
    return(*);
  }
}

merlea commented 2 years ago

Can any MDSplus developer have a look at this please? I want to make sure I haven't misunderstood anything about segments...

joshStillerman commented 2 years ago

merlea - Sorry for the delay, this slipped through the cracks. Looking at it now. -Josh

joshStillerman commented 2 years ago

I think that the concept of Signals (in its current implementation) breaks down for multi-dimensional arrays. I have run into this myself with data from some control system experiments I am doing. If you think about a video stored in a node, the contents of a frame are all adjacent in memory, hence time is the last (slowest) dimension. If you stored the video in a non-segmented node, you could in fact have:

build_signal(video, *, row-dim, col-dim, time)

This would be 'most correct', however in practice people usually want the dim_of() the video to be the times of the frames. We can look into preserving (creating ?) the multi-dimensional signal-ness of segmented records. Is this an important use case for you?
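The memory-layout argument can be sketched with NumPy. This assumes a column-major (first index fastest) layout, which is how the ordering above is described; the array sizes are arbitrary and chosen only for illustration.

```python
import numpy as np

rows, cols, n_frames = 4, 3, 5  # illustrative sizes only

# order="F" makes the first index vary fastest in memory, so the shape
# (rows, cols, n_frames) puts time on the last, slowest axis.
video = np.zeros((rows, cols, n_frames), dtype=np.int16, order="F")

row_stride, col_stride, time_stride = video.strides

# One frame is a single contiguous block of rows * cols elements.
frame = video[:, :, 2]

print(video.strides)
print(frame.flags["F_CONTIGUOUS"])
```

The stride along the time axis is the size of one whole frame, which is why appending a frame (or a segment of frames) only ever extends the array along its last dimension.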

I have verified that you can construct a signal as you describe:

$ python3
Python 3.8.10 (default, Nov 26 2021, 20:14:08) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from MDSplus import Tree
>>> from MDSplus import Tree,Signal,Range
>>> t = Tree('main', 1)
>>> n = t.MEMBER
>>> s1 = Signal(Range(0, 9, 1), None, Range(0,9,1))
>>> s2 = Signal(Range(10, 19, 1), None, Range(10,19,1))
>>> s4 = Signal([s1.data(), s2.data()], None, s1.dim_of(), s2.dim_of(), (1., 2.))
>>> s4
Build_Signal(List(,[0,1,2,3,4,5,6,7,8,9],[10,11,12,13,14,15,16,17,18,19]), *, 0 : 9 : 1, 10 : 19 : 1, List(,1D0,2D0))
>>> s4.data()
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]], dtype=object)
>>> s4.dim_of().data()
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
>>> s4.dim_of(1).data()
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19], dtype=int32)
>>> s4.dim_of(2).data()
array([1D0, 2D0], dtype=object)
>>> 

Since makeSegment and its equivalents take an array and the description of the time dimension, not a signal and a description of its time dimension, the segments only support the retrieval of 1 dim.

Help on method makeSegment in module MDSplus.tree:

makeSegment(start, end, dim, array, idx=-1, rows_filled=-1) method of MDSplus.tree.TreeNode instance
    Make a record segment
    @param start: Index of first row of data
    @type start: Data
    @param end: Index of last row of data
    @type end: Data
    @param dim: Dimension information of segment
    @type dim: Dimension
    @param array: Contents of segment
    @type array: Array
    @param idx: Target segment index, defaults to -1, i.e. append
    @type idx: int
    @param rows_filled: Rows filled, defaults to array.shape[0], i.e. full
    @type rows_filled: int
    @rtype: None
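As a rough illustration of why only one dimension comes back, here is an in-memory stand-in for a segmented record. The `SegmentedRecord` class and its methods are hypothetical and are not the MDSplus API. It assumes, following the `rows_filled` default of `array.shape[0]` in the help text above, that NumPy axis 0 counts the rows (time points) of a segment.

```python
import numpy as np


class SegmentedRecord:
    """Hypothetical stand-in for a segmented node (not the MDSplus API)."""

    def __init__(self):
        self.segments = []  # list of (times, array) pairs

    def make_segment(self, times, array):
        times = np.asarray(times)
        array = np.asarray(array)
        # Each segment carries an array plus ONE dimension: its times.
        if array.shape[0] != times.shape[0]:
            raise ValueError("one row of data is expected per time point")
        self.segments.append((times, array))

    def read(self):
        # Reading the whole record joins the segments along the row axis.
        times = np.concatenate([t for t, _ in self.segments])
        data = np.concatenate([a for _, a in self.segments], axis=0)
        return data, times


record = SegmentedRecord()
record.make_segment([0.0, 0.1], np.zeros((2, 20, 10)))
record.make_segment([0.2, 0.3, 0.4], np.ones((3, 20, 10)))
data, times = record.read()
print(data.shape, times.shape)
```

Nothing in this model stores axes for the other two dimensions of the data, so a reader can only ever recover the time vector.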

The case I have that deals with this awkwardness is the following:

>>> t = Tree('sparc', 0)
>>> n = t.BAGEL.SIGNALS.POS_1
>>> n.record
Build_Signal(DATA(.BAGEL.HARDWARE:INPUT.OUTPUTS.OUT01:VALUE)[0], *, DIM_OF(.BAGEL.HARDWARE:INPUT.OUTPUTS.OUT01:VALUE))
>>> 

Here I get the data of the segmented record, subscript it by index, and make a signal by attaching its dim. I guess you could potentially build expressions or expression nodes, that return a signal with the dims that you want given the constituent parts.

When this code was first developed, we started with a general collection of slices. We quickly got confused, and realized that the case we needed for our applications restricted slicing to the fastest dim.

Let us know (on discord - https://discord.gg/4gt6RbhHDm ) if your more complicated case is critical, and we can discuss solutions.

-Josh

merlea commented 2 years ago

Hi Josh, thanks for your answer.

But I am actually a bit confused by some of it.

First of all, I don't quite get why you say that signals break down with multi-dimensional arrays. For me the most basic use of a signal is to be able to subscript it using its dimensions (e.g. if _a is a 1D signal then _a[1.3] will return some interpolation of dim_of(_a)->data(_a) at the value 1.3). This works very well even with multiple dimensions, provided that the dimensions given in the BUILD_SIGNAL descriptor are vectors and have the correct number of elements, which is often not the case with segmented data because time is put as the first dimension when it should be last.

One example use case is the digital control system at TCV: it stores its output as segmented data, and some of it (like the flux map or plasma profiles from the equilibrium reconstruction) has multiple dimensions. When we tried to visualize these with jScope, we had to write custom BUILD_SIGNAL expressions, as the scopes were otherwise confused by the mismatch between the shape of the data and the size of each dimension.

I understand that it is not possible to provide information about the dimensions other than time in a segmented node, but I find this mismatch between size(data(NODE),0) and size(dim_of(NODE,0)) really puzzling, and it seems to break every assumption made about signals in other parts of the library.
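A minimal NumPy sketch of subscripting by dimension value, for the 1D case. It assumes linear interpolation; the exact rule applied when subscripting a signal is not stated in this thread. The helper name `subscript_by_dimension` and the sample values are made up for illustration.

```python
import numpy as np

dim = np.array([0.0, 1.0, 2.0, 3.0])      # stands in for dim_of(_a)
data = np.array([0.0, 10.0, 20.0, 30.0])  # stands in for data(_a)


def subscript_by_dimension(data, dim, coordinate):
    """Look up data at a coordinate of its dimension (linear interpolation assumed)."""
    return float(np.interp(coordinate, dim, data))


value = subscript_by_dimension(data, dim, 1.3)
print(value)
```

This kind of lookup only makes sense when the dimension has as many elements as the data has along that axis. `np.interp` raises a `ValueError` when the two lengths differ, which is the same kind of inconsistency described for segmented data.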

Antoine.

joshStillerman commented 2 years ago

Antoine -

I would be happy to discuss this with you on zoom or on discord ( https://discord.gg/4gt6RbhHDm ). I think the clearest example of what is going on is that of frames of video, where the 1st dimension is the rows of pixels, the 2nd dimension is the columns of pixels and the 3rd dimension is the times of the frames. That is, if we were consistent. However, people think of the video as subscripted by the times of the frames. The current behavior is fairly baked into the code, so the question(s) we should discuss are:

On 3/11/22 10:24 AM, Antoine Merle wrote:


-- Joshua Stillerman Research Engineer MIT Plasma Science and Fusion Center 617.253.8176 @.***

merlea commented 2 years ago

Hi Josh, Yes I think it's a good idea to discuss this. I guess a good time slot is 2pm-5pm in Lausanne, which should be 9am-12pm in Boston these days? I am usually connected on Discord so you can send me a message whenever you are available. Antoine.

WhoBrokeTheBuild commented 1 year ago

We are happy to discuss this with you, and Josh will reach out to schedule a time.