Sparse matrices - Githubissues

Jake-Moss commented 7 months ago

I've started looking into a the sparse matrix options and thought it best to begin grouping some notes and work related to it.

Format wise we have a couple options

CSR, compressed sparse row Stores 3 vectors, one of values, col index, and row index. Fast row access, not col access. Fast matrix multiplication. Not great for incremental construction. Block encoding is possible
CSC, compressed sparse column Same as CSR but for fast column access
COO, list of coordinates Solid for incremental construction, single element per entry. Random access is fastest when sorted by row then col index. Horrible for just about anything else
LIL, list of lists Good for incremental construction, builds whole rows at a time. Couple of feasible implementations, could be:
- vector of pointers to vectors Gives expandable and variably sized rows, rows are nullable but slicing column (or row) wise is horrible for spacial locality. Row (or col) slices though are very good. Requires that all rows (or col) are allocated, not great for huge matrices
- offset encoded Instead of storing multiple all rows, we can store and array of offsets as to where the rows pointer is. Much sparser with fast row slicing, but lookups for rows would require bisection
DOK, dictionary of keys Good for incremental construction, builds single elements at a time Typical implementation would be with hash maps of (row, col) -> value, great for very sparse matrices, or matrices with little groupings of element
- Block encoding It's possible to encode small, dense blocks of the matrix. Say there's a 1k * 1k sparse float32 matrix, so that's 4 bytes per entry, which a cacheline size of 64 bytes, you could store a 4x4 dense matrix blocks in a single cache line. This is really useful for storing small dense clusters of values.

All these formats are implementable ourselves, I don't believe we need to pull in anything special to construct or manipulate these formats. Provided our formatting is standard and correct, conversion to a scipy.sparse matrix is trivial. Storage using omx should also be fine.

Our choice (or rather choices) then really comes to the manner in wise we wish to construct the select link results. @pedrocamargo do you think the existing select link results would be similar to the ones we'll be producing with the planned system? I want to take a look at generally the order in which we construct the matrix, and what the end result looks like.

jamiecook commented 7 months ago

@Jake-Moss - would it not be possible to just use scipy.sparse format directly and avoid defining our own format/implementation?

Jake-Moss commented 7 months ago

@Jake-Moss - would it not be possible to just use scipy.sparse format directly and avoid defining our own format/implementation?

The scipy sparse module is really just a small wrapper around 3 numpy arrays, the classes and such just provide convenient access to these formats. I don't plan on using our own format but rather implementing a similar wrapper that we can access without the gil, while encapsulating all the logic for disk reading/writing/indexing, and allowing us to later extend the AequilibraE matrix class with sparse support. I want do it in a way that is completely compatible with the scipy matrices as well, they even provided methods to construct a sparse object given the 3 arrays.

All the formats I mentioned above are implemented by scipy as well

Jake-Moss commented 7 months ago

Whoops I didn't think renaming a branch would close the PR. We can live with the typo for now

Jake-Moss commented 7 months ago

Something that didn't occur to be original is that we require thread-safe construction. I think this can be resolved by simply treating the matrix as sparse rows, potentially empty, during construction, assuming we parallelise over origins. I don't believe this to be a big issue but just makes it harder with how much I can develop without seeing concrete usage.

Jake-Moss commented 5 months ago

Superseded by https://github.com/AequilibraE/aequilibrae/pull/515. Work was copied over

AequilibraE / aequilibrae

Sparse matrices #511