appliedtopology / javaplex

Persistent Homology and Topological Data Analysis Library
BSD 3-Clause "New" or "Revised" License

How to restrict the total number of simplices built from distance matrix data? #24

Closed: peter308 closed this issue 5 years ago

peter308 commented 5 years ago

Dear Admin, I have a set of distance matrix data and have been stuck on one problem recently. The distance matrices for my system are quite large (550 x 550), and I usually get more than 1 x 10^8 simplices, which causes "Java heap space" or "GC overhead limit exceeded" errors. I have already run the job on an HPC system with as much as 128 GB of RAM. I also tried a witness stream with a landmark selector, but the number of simplices is still large enough to trigger the GC overhead limit error. Is there an upper limit on the number of simplices that Javaplex can handle? Alternatively, could you give me some advice, e.g., how to change the options in my script file, so that I can get the barcodes and Betti numbers for my case? Even rough results would be good enough. Any help would be sincerely appreciated.

Best Regards, Peter
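For a sense of scale (a rough worked count for a full Vietoris-Rips complex, not a measurement of this particular data or of a witness stream): once the maximum filtration parameter reaches the largest entry of the distance matrix, the Vietoris-Rips complex on n = 550 points contains every possible simplex, so it has C(n, k+1) simplices of dimension k. Because computing H_k requires the (k+1)-simplices, the counts that matter are:

```latex
\begin{aligned}
\binom{550}{2} &= 150{,}975 && \text{edges (needed for } H_0\text{)}\\
\binom{550}{3} &= 27{,}578{,}100 && \text{triangles (needed for } H_1\text{)}\\
\binom{550}{4} &= 3{,}771{,}305{,}175 && \text{tetrahedra (needed for } H_2\text{)}
\end{aligned}
```

So a computation that goes up to H_2 can pass 10^8 simplices well before the filtration reaches the full diameter, which is why limiting the maximum filtration parameter (or the maximum homology dimension) usually matters more than adding RAM.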

henryadams commented 5 years ago

Hi Peter, I'm sure you have already seen Section 7.1 of the Javaplex tutorial (https://www.math.colostate.edu/~adams/research/javaplex_tutorial.pdf) on increasing the Java heap size. Witness complexes are a good way to get approximate answers with less computational effort. For both witness complexes and Vietoris-Rips complexes, keeping the maximum filtration parameter small is necessary for computations to finish on large datasets, especially at first, until you see how the size of the computation grows as the maximum filtration parameter increases.
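One way to see how the computation grows before launching a full Javaplex run is to count the simplices directly for a few candidate filtration values. The following is a minimal, self-contained Java sketch; the class `RipsSizeProbe` and its method `countSimplices` are hypothetical names, not part of the Javaplex API. It assumes a symmetric distance matrix with zeros on the diagonal and uses the Vietoris-Rips rule that a set of points spans a simplex when all of its pairwise distances are at most the filtration value. It does not model witness complexes, whose sizes depend on the landmark choice.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Javaplex): counts the simplices of the
 * Vietoris-Rips complex of a distance matrix at a fixed filtration value.
 */
public class RipsSizeProbe {

    /**
     * Returns counts[k] = number of k-simplices whose pairwise distances are
     * all <= maxFiltrationValue, for k = 0..maxDimension.
     */
    public static long[] countSimplices(double[][] dist, double maxFiltrationValue, int maxDimension) {
        long[] counts = new long[maxDimension + 1];
        int[] simplex = new int[maxDimension + 1];
        for (int v = 0; v < dist.length; v++) {
            simplex[0] = v;
            extend(dist, maxFiltrationValue, maxDimension, simplex, 1, counts);
        }
        return counts;
    }

    // simplex[0..size-1] holds increasing vertex indices forming a valid simplex.
    // Extending only with higher-indexed vertices counts each simplex exactly once.
    private static void extend(double[][] dist, double eps, int maxDimension,
                               int[] simplex, int size, long[] counts) {
        counts[size - 1]++;                      // record this (size-1)-simplex
        if (size - 1 == maxDimension) {
            return;                              // stop at the requested dimension
        }
        for (int w = simplex[size - 1] + 1; w < dist.length; w++) {
            boolean joinsAll = true;             // w must be within eps of every vertex
            for (int i = 0; i < size; i++) {
                if (dist[simplex[i]][w] > eps) {
                    joinsAll = false;
                    break;
                }
            }
            if (joinsAll) {
                simplex[size] = w;
                extend(dist, eps, maxDimension, simplex, size + 1, counts);
            }
        }
    }

    public static void main(String[] args) {
        // Toy data (hypothetical): four points on a line at 0, 1, 2, 3.
        double[] x = {0, 1, 2, 3};
        double[][] dist = new double[x.length][x.length];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x.length; j++) {
                dist[i][j] = Math.abs(x[i] - x[j]);
            }
        }
        // Sweep the maximum filtration value and watch the counts grow.
        // Prints:
        // 0.5 -> [4, 0, 0, 0]
        // 1.0 -> [4, 3, 0, 0]
        // 2.0 -> [4, 5, 2, 0]
        // 3.0 -> [4, 6, 4, 1]
        for (double eps : new double[] {0.5, 1.0, 2.0, 3.0}) {
            System.out.println(eps + " -> " + Arrays.toString(countSimplices(dist, eps, 3)));
        }
    }
}
```

The enumeration does work proportional to the number of simplices it counts, so on a 550 x 550 matrix it is best to start with small filtration values and raise them gradually; with `maxDimension = 1` it only counts edges, which costs O(n^2) and is a cheap first check.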

Beyond that, my main advice is to try more modern software for computing persistent homology, such as Ripser or GUDHI; more recent software is much faster than Javaplex. A list of such software packages is available about halfway down the following webpage: https://www.math.colostate.edu/~adams/advising/ That list also includes some packages that compute approximate Vietoris-Rips barcodes by methods other than witness complexes.

Best, Henry