DeepLearnPhysics / larcv2

MIT License
13 stars 16 forks source link

element indexing for batchfillerimage2d does not follow C-memory convention #29

Open twongjirad opened 6 years ago

twongjirad commented 6 years ago

In batchfillerimage2d,

the array is defined with elements (caffe mode):

(batchsize,channel,nrows,ncols)

this implies that ncols is the "contiguous" dimension. For example, if I defined a (4,3) 2d array in c++:

#include <iostream>

int main(int nargs, char** argv ) {

  float foo[4][3] = { {0,1,2}, {3,4,5}, {6,7,8},{9,10,11} };
  int nrows = 4;
  int ncols  = 3;

  std::cout << "foo[2][1]=" << foo[2][1] << std::endl;
  std::cout << "foo[ncols*2 + 1]=" << *(&foo[0][0] + ncols*2+1) << std::endl;
  std::cout << "foo[nrows*1 + 2]=" << *(&foo[0][0] + nrows*1+2) << std::endl;

};

The outcome of the above is:

twongjirad@blade:~/working/nutufts/larflow/ana$ g++ -o foo foo.cxx
twongjirad@blade:~/working/nutufts/larflow/ana$ ./foo 
foo[2][1]=7
foo[ncols*2 + 1]=7
foo[nrows*1 + 2]=6

However, in batchfillerimage2d, the tensor is defined with following dims,

      if (_caffe_mode) {
        dim[0] = batch_size();
        dim[1] = _num_channels;
        dim[2] = _rows;
        dim[3] = _cols;
     }

but the way the image is indexed is

size_t caffe_idx = 0;
    for (size_t row = 0; row < _rows; ++row) {
      for (size_t col = 0; col < _cols; ++col) {
        _caffe_idx_to_img_idx.at(caffe_idx) = col * _rows + row;
        ++caffe_idx;
      }
    }

Note that in "C" memory layout, the expectation would be

row*_ncols + col 

as the right-most dimension should be the most "rapidly" changing dimension.

This might be for historical reasons for caffe, but this could cause (is causing) confusion about laying out tensors properly in training and analysis routines for pytorch and tensorflow where we are routinely reshaping/manipulating the tensor.

What to do? I would like to change this, but changing this is a BREAKING for everyone. Are we stuck with this?

drinkingkazu commented 6 years ago

My question: I'm on vacation today/tomorrow so my response is slow :( I have 1st question though: do you have to use caffe mode (BCHW)?

My opinion: Now assuming the answer is yes... I don't think this mode is used so I would be 100% pro to changing it, but it should support (BCHW) mode obviously. I have to look closely to make a useful input... so that has to be a bit later... I am not sure if I understand the "C" memory layout argument: this is 1D array filling and how the ordering should be is determined solely by the comparison of how numpy/caffe/whatever and image2d interpret 1D array as 2D array. Or I am missing something clearly.

Not useful comments: The code is migrated from larcv to larcv2 and I did test right off the migration, but since then it's not tested. We may have changed row vs. col definition since then (i.e. I remind this convention is never unique between numpy vs. opencv + standards by other 2D visualization libraries I know), and maybe the change was not propagated to this code...