DeepLearnPhysics / larcv2

MIT License
13 stars 16 forks source link

Unifying 2D and 3D data types for SparseTensor3D and ClusterVoxel3D #26

Open drinkingkazu opened 6 years ago

drinkingkazu commented 6 years ago

In particular would love to hear from @twongjirad @coreyjadams ...but anyone's input is welcome.

I would like to suggest somewhat a drastic change, and that is to remove a distinction of 2D and 3D data types for tensors and clusters.

Namely they are all either VoxelSet or VoxelSetArray with ImageMeta or Voxel3DMeta. But because they are different types, we have to write modules/algorithms for each type despite the fact that operation is typically on the basic common types. Unifying these types will allow writing generic algorithms much easier where we don't care about the dimensions. Why we have 2D vs. 3D to begin with? This is somewhat historical that we used to have 2D clusters in an old larcv. In principle there's nothing needed to keep 2D and 3D, and I think it only gave us headache so far (sorry)...

So I would like to propose making this change including modification of corresponding codes under larcv/app directory, and I am happy to volunteer. I think it's 1~2 day job for me, and I might do it during MicroBooNE analysis retreat next week Monday/Tuesday (June 23rd/24th). Let me know what you think.

twongjirad commented 6 years ago

I am actually all for this!!!!! For images, it would be nice to have a channel axis.

Of course, I didn't see this until now, so i am passed the response deadline.

twongjirad commented 6 years ago

actually, basically, i wish I we could inherit a c++ version of numpy arrays that also inherited an imagemeta object. Maybe eigen? Can we serialize that into a ROOT tree?

larlight commented 6 years ago

OK. That would be a bit different than what I implemented (and haven't pushed since not yet made compilable to the rest of larcv), but can be re-done. VoxelSet is for sparse data, we need to store 2 values per voxel in the storage: value and location in original matrix. Currently VoxelSet is std::vector of Voxel where Voxel stores those 2 information. To make it more "numpy array" like, VoxelSet should store std::array of float and std::array of uint32 (i.e. value and location separately) and the container class to ensure the integrity of two arrays. Do I understand your request ?

twongjirad commented 6 years ago

I think having both a sparse and dense object is fine. let the user pick which is appropriate. we can convert from one to the other as well (though maybe not efficiently).

if for some reason the use-case involves info that is dense, use of sparse matrix becomes an issue right?

I guess I initially read the comment of why do we have a 2d tensor object, when it can be subsumed by a 3d tensor object, not why do we have a dense and sparse object.