DeepLearnPhysics / larcv2


ThreadFiller reshapes data tensor #12

Open twongjirad opened 6 years ago

twongjirad commented 6 years ago

If I am not mistaken, right now data is returned as (N, M), where N is the number of items in the batch and M is the unrolled vector of pixel values (for the image case). This then requires the user to reshape the data. For example, PyTorch will want a 4D tensor: (N, C, H, W), where N=batches, C=channels, H=height, W=width.

This is an easy manipulation in Python with numpy, but would it be worth the possible efficiency gain to do the reshape in ThreadProcessor in C++ rather than in the Python environment?
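For reference, the manual step a user currently has to do looks roughly like this (a minimal sketch; the shapes and variable names are illustrative, not the actual larcv API):

```python
import numpy as np
import torch

# Suppose ThreadFiller hands back a (N, M) float32 batch,
# e.g. N=16 single-channel 256x256 images flattened to M=65536.
batch = np.random.rand(16, 256 * 256).astype(np.float32)

# PyTorch wants (N, C, H, W), so the user reshapes in Python:
images = torch.from_numpy(batch).view(16, 1, 256, 256)
```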

twongjirad commented 6 years ago

hmm, why can't I assign this to myself and provide a label?

drinkingkazu commented 6 years ago

At the C++ level, the data is stored in a 1D array. It is then reshaped into (N, M) in the larcv_threadio Python API, more specifically by this line.

I am in favor of removing that line! But it has to happen in a coherent manner, since users' code and tutorials will need to change. More comments below.

Why that line?

I was primarily testing with TensorFlow, which can handle automatic reshaping when it copies data contents into a placeholder object, so a 1D array should be fine, I think. However, I saw that consistently giving me seg-faults. There is no problem as long as I specify the batch shape (the rest of the dims can remain a 1D array). So, to avoid having to write a reshape line each time in my run script, I added the reshape line here.
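The pattern I mean looks roughly like this (my sketch using the TF 1.x placeholder API; the shapes are made up):

```python
import numpy as np
import tensorflow as tf  # TF 1.x style API

# The batch shape is specified explicitly; the image itself stays flattened.
x = tf.placeholder(tf.float32, shape=[None, 256 * 256])

batch = np.random.rand(16, 256 * 256).astype(np.float32)
with tf.Session() as sess:
    # feed_dict copies the numpy contents into the placeholder,
    # so no explicit reshape is needed as long as the batch shape matches.
    out = sess.run(tf.reduce_sum(x), feed_dict={x: batch})
```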

Why am I in favor of the change?

... because a 1D array is probably a happy common denominator for many cases beyond TF. A user can then reshape the returned numpy array however they want. Note, however, that the numpy array container is reshaped back into a 1D array each time larcv_threadio fills it (so you have to reshape after every fill).

Does this matter in terms of speed?

In principle, it should not. Reshaping only changes the metadata; the actual data stays in a 1D buffer of the same size, I think.
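A quick way to convince ourselves (a minimal numpy check, not larcv code):

```python
import numpy as np

flat = np.arange(16 * 256 * 256, dtype=np.float32)  # the 1D buffer
batched = flat.reshape(16, 256 * 256)               # split off the batch axis

# Reshaping a contiguous array only creates a new view over the same buffer:
assert np.shares_memory(flat, batched)
assert batched.base is flat
```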

Alternative solution?

https://github.com/DeepLearnPhysics/larcv2/issues/13 can be an alternative solution.

coreyjadams commented 6 years ago

I am against changing this particular line :).

(Batch, Flattened Image) vs. (Flattened batch of images)

I think the concept of image-as-1D-array is just fine, since it's quite common already and we are sticking with normal conventions. But this line isn't about 1D array vs. HxWxD array; it's about a completely flat batched array (len == N*L*W*D) versus an (N, L*W*D) array. I think doing the reshape here makes a lot of sense to split off the batch size.
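To spell out the two layouts (a toy numpy illustration; the sizes are made up):

```python
import numpy as np

N, L, W, D = 8, 256, 256, 3

flat_batch = np.zeros(N * L * W * D, dtype=np.float32)  # shape (N*L*W*D,): batch boundary is implicit
per_event  = flat_batch.reshape(N, -1)                  # shape (N, L*W*D): one flattened image per row

print(flat_batch.shape, per_event.shape)
```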

Assure High Performance

In principle, there is no performance issue. Reshape returns a new view of the data whenever possible, but that is not guaranteed. According to the numpy API:

It is not always possible to change the shape of an array without copying the data. If you want an error to be raised if the data is copied, you should assign the new shape to the shape attribute of the array.

We could do this instead and raise an exception if it's copying data, which might be a good check to make sure we aren't. The thing to watch for is a transpose along some axis; that causes a copy of numpy data, I believe. So we really ought to be confident that we are delivering the data in the right convention ((N, L, W, D) vs. (N, D, W, L) vs. ...), and the flow of data would be (see the sketch after this list):

  1. ROOT file to C++ vector (copy)
  2. C++ to numpy PyArray (no copy)
  3. Reshape via numpy (no copy if done right, can raise an exception), OR reshape via TensorFlow (no copy? not sure)
  4. Feed data to GPU
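A sketch of the check suggested above for step 3: assigning to the shape attribute instead of calling reshape makes numpy raise if a copy would be required (toy data, not larcv code).

```python
import numpy as np

data = np.zeros(16 * 256 * 256, dtype=np.float32)  # contiguous 1D buffer, as delivered from C++

# Assigning to .shape never copies; it raises AttributeError if the
# reshape cannot be done as a pure view, which is exactly the guarantee we want.
data.shape = (16, 256 * 256)

# For contrast, a non-contiguous view cannot be reshaped in place:
t = data.T
try:
    t.shape = (16, 256 * 256)
except AttributeError:
    print("this reshape would have required a copy")
```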
coreyjadams commented 6 years ago

Let me amend my earlier comment for clarity. Numpy transpose doesn't cause a copy. But it does make the data non-contiguous, and reshaping non-contiguous data does cause a copy. So my point was that since we do a transpose in PyUtils::as_array, when we do the reshape we are causing a data copy. We should remove the need for the transpose to fix this; see #13.
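To make that concrete (a numpy sketch, not the actual PyUtils code):

```python
import numpy as np

a = np.arange(12, dtype=np.float32).reshape(3, 4)

t = a.T                               # transpose: still a view, no copy...
assert np.shares_memory(a, t)
assert not t.flags['C_CONTIGUOUS']    # ...but the view is no longer C-contiguous

r = t.reshape(-1)                     # reshaping the non-contiguous view forces a copy
assert not np.shares_memory(a, r)
```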

drinkingkazu commented 6 years ago

As a minor update, I have an important clarification: larcv_threadio is not using the as_ndarray function, which is the one that uses PyArray_Transpose. So there is no transpose done, and in principle we shouldn't have to worry about the reshape issue. Note that copying of values is still done, and its cost increases linearly with the array size for Image2D.

I think we lack solid evidence to say "reshaping is causing a delay". So adding that unit test as a myth-buster is the first step toward resolving this issue!
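Something like the following could serve as that myth-buster (a rough timing sketch; the fill is faked with a plain value copy, since that is the comparable amount of data movement):

```python
import numpy as np
import timeit

n, m = 64, 512 * 512
src = np.random.rand(n * m).astype(np.float32)
buf = np.empty(n * m, dtype=np.float32)

copy_t    = timeit.timeit(lambda: np.copyto(buf, src), number=100)  # stand-in for the value copy
reshape_t = timeit.timeit(lambda: buf.reshape(n, m), number=100)    # the reshape under suspicion

print(f"copy: {copy_t:.4f}s   reshape: {reshape_t:.4f}s  (per 100 iterations)")
```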

I also note that using SparseTensor2D or SparseTensor3D would also help speed things up (in fact, dramatically for the real IO part), though that is not related to the core of this issue. There should probably be a new ticket to address how to exploit the sparsity to speed up the numpy array value copy (i.e., no need to loop over all values).
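As a starting point for that ticket, the copy could look something like this (a hypothetical sketch; the (index, value) arrays are illustrative, not the actual SparseTensor2D API):

```python
import numpy as np

# Hypothetical sparse event: parallel arrays of flat pixel indices and values.
indices = np.array([10, 500, 70000], dtype=np.int64)
values  = np.array([1.5, 0.3, 2.0], dtype=np.float32)

# Scatter only the nonzero pixels into a preallocated dense buffer,
# instead of looping over (or copying) all 256*256 entries.
dense = np.zeros(256 * 256, dtype=np.float32)
dense[indices] = values
```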