Persistent Homology Algorithm Toolkit (PHAT)

shnizzedy commented 6 years ago

The boundary_matrix class takes a "Representation" class as a parameter. This representation defines how columns of the matrix are represented and how low-level operations (e.g., column additions) are performed. The right choice of the representation class can be as important for the performance of the program as choosing the algorithm. We provide the following choices of representation classes:

vector_vector: Each column is represented as a sorted std::vector of integers, containing the indices of the non-zero entries of the column. The matrix itself is a std::vector of such columns.

vector_heap: Each column is represented as a heapified std::vector of integers, containing the indices of the non-zero entries of the column. The matrix itself is a std::vector of such columns.

vector_set: Each column is a std::set of integers, with the same meaning as above. The matrix is stored as a std::vector of such columns.

vector_list: Each column is a sorted std::list of integers, with the same meaning as above. The matrix is stored as a std::vector of such columns.

sparse_pivot_column: The matrix is stored as in the vector_vector representation. However, when a column is manipulated, it is first converted into a std::set, using an extra data field called the "pivot column". When another column is manipulated later, the pivot column is converted back to the std::vector representation. This can lead to significant speed improvements when many columns are added to a given pivot column consecutively. In a multicore setup, there is one pivot column per thread.

heap_pivot_column: The same idea as in the sparse version. Instead of a std::set, the pivot column is represented by a std::priority_queue.

full_pivot_column: The same idea as in the sparse version. However, instead of a std::set, the pivot column is expanded into a bit vector of size n (the dimension of the matrix). To avoid costly initializations, the class remembers which entries have been manipulated for a pivot column and updates only those entries when another column becomes the pivot.

bit_tree_pivot_column (default representation): Similar to the full_pivot_column but the implementation is more efficient. Internally it is a bit-set with fast iteration over nonzero elements, and fast access to the maximal element.

— Documentation on PyPI

There is also a Python API reference which makes the Python wrapper look like it includes access to all of the C++ functionality. The manipulation methods available appear to be:

compute_persistence_pairs(reduction=<reductions.twist_reduction: 1>) Computes persistence pairs (birth, death) for the given boundary matrix.

compute_persistence_pairs_dualized(reduction=<reductions.twist_reduction: 1>) Computes persistence pairs (birth, death) from the dualized form of the given boundary matrix.

convert(representation) Copy this matrix to another with a different representation

I'm unsure of either end of this strategy: what "representation" to apply and how to interpret the persistence pairs.

shnizzedy commented 6 years ago

on GitHub

binarybottle commented 6 years ago

Great! I just read the pypy and api pages and this looks very promising. I believe we are fine to use the default representation, and the [birth, death] pairs for each row is all we need to conduct further analysis ourselves or with Steve Ellis's help.

ChildMindInstitute / questionnaire-diagnosis

Persistent Homology Algorithm Toolkit (PHAT) #5