Closed Jake-Moss closed 5 months ago
@Jake-Moss - would it not be possible to just use scipy.sparse
format directly and avoid defining our own format/implementation?
@Jake-Moss - would it not be possible to just use
scipy.sparse
format directly and avoid defining our own format/implementation?
The scipy sparse module is really just a small wrapper around 3 numpy arrays, the classes and such just provide convenient access to these formats. I don't plan on using our own format but rather implementing a similar wrapper that we can access without the gil, while encapsulating all the logic for disk reading/writing/indexing, and allowing us to later extend the AequilibraE matrix class with sparse support. I want do it in a way that is completely compatible with the scipy matrices as well, they even provided methods to construct a sparse object given the 3 arrays.
All the formats I mentioned above are implemented by scipy as well
Whoops I didn't think renaming a branch would close the PR. We can live with the typo for now
Something that didn't occur to be original is that we require thread-safe construction. I think this can be resolved by simply treating the matrix as sparse rows, potentially empty, during construction, assuming we parallelise over origins. I don't believe this to be a big issue but just makes it harder with how much I can develop without seeing concrete usage.
Superseded by https://github.com/AequilibraE/aequilibrae/pull/515. Work was copied over
I've started looking into a the sparse matrix options and thought it best to begin grouping some notes and work related to it.
Format wise we have a couple options
CSR, compressed sparse row Stores 3 vectors, one of values, col index, and row index. Fast row access, not col access. Fast matrix multiplication. Not great for incremental construction. Block encoding is possible
CSC, compressed sparse column Same as CSR but for fast column access
COO, list of coordinates Solid for incremental construction, single element per entry. Random access is fastest when sorted by row then col index. Horrible for just about anything else
LIL, list of lists Good for incremental construction, builds whole rows at a time. Couple of feasible implementations, could be:
DOK, dictionary of keys Good for incremental construction, builds single elements at a time Typical implementation would be with hash maps of (row, col) -> value, great for very sparse matrices, or matrices with little groupings of element
All these formats are implementable ourselves, I don't believe we need to pull in anything special to construct or manipulate these formats. Provided our formatting is standard and correct, conversion to a scipy.sparse matrix is trivial. Storage using omx should also be fine.
Our choice (or rather choices) then really comes to the manner in wise we wish to construct the select link results. @pedrocamargo do you think the existing select link results would be similar to the ones we'll be producing with the planned system? I want to take a look at generally the order in which we construct the matrix, and what the end result looks like.