WireCell / wire-cell-toolkit

Toolkit for Liquid Argon TPC Simulation and Reconstruction
https://wirecell.github.io/
Other
7 stars 21 forks source link

Multidimensional array storage order #80

Open HaiwangYu opened 4 years ago

HaiwangYu commented 4 years ago

First, different storage order means different offset calculation mapping multidimensional array to 1D memory structure. Mix using of different storage order (with implicit offset calculation) is dangerous. https://eli.thegreenplace.net/2015/memory-layout-of-multi-dimensional-arrays

We currently have three multidimensional array realizations: ITensor (SimpleTensor), Eigen::Array and torch::tensor.

Eigen uses column-major storage order by default. Row-major is possible but seems Eigen prefers the other one. Plus, depending on the backend choice, different orders may have different performance. This needs tests to confirm for sure. https://eigen.tuxfamily.org/dox/group__TopicStorageOrders.html

torch::tensor is row-major: https://discuss.pytorch.org/t/are-tensors-column-major-or-row-major/832/4

ITensor uses basically pointer to a memory block. In principle it does not specific storage order. But it is fair to intuitively treat these pointers as C array and assuming it is using row-major. This may result unexpected memory layout when mapping Eigen to ITensor: page 9 of this talk

There are two ways of using ITensor:

  1. element-wise: needs to specify offset calculation
  2. map to another ND-array realization and use APIs from that realization

Currently option 1 is only used in tests (as far as I remember).

Examples of option 2:

In the future, we may think about unifying to one storage order. For now, I think as long as we are careful mapping between these two orders and use consistent high level APIs, we should be OK. Making this issue to track progress on this topic.

An example mapping between these two orders, which confused me quite a bit. Now I finally understand... https://github.com/WireCell/wire-cell-toolkit/blob/master/pytorch/src/DNNROIFinding.cxx#L229

brettviren commented 4 years ago

Thanks for making this.

One small correction, ITensor carries a marker of the storage order used in its data array:

https://github.com/WireCell/wire-cell-toolkit/blob/master/iface/inc/WireCellIface/ITensor.h#L35

The order is actually a std::vector holding dimension indices so could be something weird like 1,2,0. Boost mulitarray has support for "weird" orders in addition to C and FORTRAN conventions.

In principle, then, anything building or using an ITensor must take care to check this "order" vector and do the right thing when they interpret the data array.

The code in WireCellAux/Util.h looks ripe for a little updating to check for order and do the right thing.

For now, I think it's fair to throw an exception if any "weird" order is found beyond C or FORTRAN conventions.

HaiwangYu commented 4 years ago

Thanks for the correction!

Yes, I think we need to make use of this order marker if we want to build some native ITensor operations.

brettviren commented 3 years ago

This is just informational.

For a long time, I have been battling data corruption related to use of Numpy files. I have made NumpyHelper.h in util to try to (I thought) fix the problems related to Eigen vs Numpy memory ordering.

Well, I just discovered that Numpy on the Python side has a "feature" that contributes greatly to my insanity.

Apparently the memory ordering will be picked differently for different shaped arrays and will flip if a .T transpose is taken. In particular, it meant that depo "data" arrays and depo "info" arrays were being saved with different orders.

Here shows some Python-side flips:

In [6]: di.T.flags
Out[6]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

In [7]: di.shape
Out[7]: (6907, 4)

In [8]: di.flags
Out[8]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

In [9]: di.T.shape
Out[9]: (4, 6907)

In [10]: di.T.flags
Out[10]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

In [26]: numpy.array(di.T, order='C').flags
Out[26]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False