cositools / cosipy

The COSI high-level data analysis tools
Apache License 2.0

histpy "sparse" and "dense" terminology #183

Closed avalluvan closed 1 month ago

avalluvan commented 1 month ago

This is a question about terminology and method naming in histpy. Currently, histpy.to_sparse() returns a "dense" matrix, condensing the matrix into an array of its non-zero elements, while histpy.to_dense() returns a "sparse" matrix with the original array shape, including a substantial number of zero elements. Is this the intended functionality, or have the terms been swapped? I couldn't find a GitHub page for histpy, so I'm posting it here.

israelmcmc commented 1 month ago

@avalluvan I get your point, but the names refer to something different. to_sparse means setting the internal storage format to one appropriate for a sparse matrix, using the sparse library. to_dense returns a format appropriate for dense matrices, using a numpy array. Although that array may look sparse, for computational purposes it is treated as a dense matrix, i.e. the algorithm doesn't know about all the 0's.
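To make the distinction concrete, here is a toy sketch in plain Python. It does not use histpy or the sparse library; the helper names `to_coo` and `to_dense` are made up for illustration, and a dictionary of coordinates stands in for a real sparse format. The point is only that "sparse" and "dense" describe how the values are stored, not how many zeros the data contains.

```python
# Toy illustration only: plain-Python stand-ins, not the histpy or sparse API.

def to_coo(dense):
    """Sparse (coordinate) storage: keep only non-zero entries, keyed by index."""
    return {
        (i, j): value
        for i, row in enumerate(dense)
        for j, value in enumerate(row)
        if value != 0
    }


def to_dense(coo, shape):
    """Dense storage: one slot per bin, zeros included."""
    rows, cols = shape
    dense = [[0] * cols for _ in range(rows)]
    for (i, j), value in coo.items():
        dense[i][j] = value
    return dense


# A mostly empty 3x4 histogram held in dense storage.
dense = [
    [0, 0, 5, 0],
    [0, 0, 0, 0],
    [2, 0, 0, 0],
]

coo = to_coo(dense)

# Same data, different storage: 12 stored values versus 2.
stored_dense = sum(len(row) for row in dense)
stored_sparse = len(coo)
```

Both objects describe the same histogram. The sparse form looks "condensed" because it only stores the non-zero bins, while the dense form stores every bin and an algorithm working on it visits the zeros like any other value.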

For reference, these are the histpy docs and repo: https://histpy.readthedocs.io/en/latest/ and https://gitlab.com/burstcube/histpy

I'm closing this issue since this is indeed the intended functionality. Feel free to keep commenting on it if you have more questions, and reopen it if you think I missed something.

avalluvan commented 1 month ago

Perfect! That clears it up. The documentation also explains the various "compression" and "decompression" steps when converting data structures and that arithmetic, projection, and/or slicing operations must be performed with dense data structures for optimal performance.

Nevertheless, it would be helpful if the user-facing terminology could be updated, perhaps through some wrapper functions, to state explicitly that it is the data structure that is being changed rather than the underlying vector representation.
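A rough sketch of what such wrappers might look like, assuming only that a histogram object exposes the to_sparse() and to_dense() methods discussed above. The wrapper names `with_sparse_storage` and `with_dense_storage` are hypothetical, and `FakeHistogram` is a stand-in so the example runs without histpy installed. Whether the real methods return a new object or convert in place is not stated in this thread, so the wrappers simply pass through whatever the method returns.

```python
# Hypothetical wrapper names; only to_sparse() and to_dense() come from the thread.

class FakeHistogram:
    """Stand-in for a histogram object; NOT the histpy class."""

    def __init__(self, storage="dense"):
        self.storage = storage

    def to_sparse(self):
        # Assumption: returns a histogram backed by sparse storage.
        return FakeHistogram("sparse")

    def to_dense(self):
        # Assumption: returns a histogram backed by dense storage.
        return FakeHistogram("dense")


def with_sparse_storage(hist):
    """Switch the storage format to sparse; the bin contents are unchanged."""
    return hist.to_sparse()


def with_dense_storage(hist):
    """Switch the storage format to dense; the bin contents are unchanged."""
    return hist.to_dense()


h = with_sparse_storage(FakeHistogram())
```

Names like these put "storage" in the call itself, which is the clarification being asked for, without changing the behavior of the existing methods.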