The current "Advanced usage" section in the README makes it sound like there's almost no use case for passing in a similarity function: only when the matrix doesn't fit on the hard drive? But people have many-terabyte drives now.
I would favor using a similarity function over a .csv file for a large dataset. It would be much faster (and we could be talking about weeks or months of compute time) because:
- Computing similarity from a function uses ray, which parallelizes the job of calculating all N x N entries (a huge win if there are 10 cores).
- Using a .csv file requires both more I/O time (disk being much slower than RAM) and conversion from string format.
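
To illustrate the first point, here is a minimal sketch of filling an N x N similarity matrix row by row across workers. It is not the package's actual implementation: it uses `concurrent.futures` from the standard library as a stand-in for ray, and `overlap_similarity`, `similarity_matrix`, and the `species` list are hypothetical names made up for the example.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from functools import partial
from typing import Callable, List, Sequence


def overlap_similarity(a: str, b: str) -> float:
    """Hypothetical similarity: Jaccard overlap of the two character sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def _similarity_row(
    items: Sequence[str], similarity: Callable[[str, str], float], i: int
) -> List[float]:
    """Compute row i of the matrix: item i compared against every item."""
    return [similarity(items[i], other) for other in items]


def similarity_matrix(
    items: Sequence[str],
    similarity: Callable[[str, str], float],
    executor: Executor,
) -> List[List[float]]:
    """Build the full N x N matrix, handing one row at a time to the executor."""
    row = partial(_similarity_row, items, similarity)
    # executor.map preserves input order, so rows come back as 0..N-1.
    return list(executor.map(row, range(len(items))))


species = ["owl", "bowl", "cat"]  # hypothetical data
# A thread pool keeps the sketch portable; for a CPU-bound similarity
# function, a ProcessPoolExecutor (or ray) is what actually uses all cores.
with ThreadPoolExecutor(max_workers=4) as pool:
    matrix = similarity_matrix(species, overlap_similarity, pool)
```

Nothing here is read from disk or parsed from strings; every entry is computed in memory directly from the items, which is the contrast with the .csv route described in the second point.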