with > 2K samples, the html output becomes nearly unusable. but in large cohorts, nearly all pairs of samples will be unrelated, so we can sub-sample the pairs that are expected to be unrelated and also appear unrelated by phenotype.
this will reduce memory usage and make the somalier html output useful for huge cohorts.
it will require a substantial change in the html, since that currently expects the full all-vs-all matrix. it will need a sparse representation instead.
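
a rough sketch of the idea in python, not somalier's actual code: keep every pair that is expected or observed to be related, keep only a random fraction of the rest, and emit the result as coordinate lists instead of a full matrix. the names (`Pair`, `sparsify`, `to_coo`) and the cutoff and fraction values are made up for illustration.

```python
import random
from typing import Dict, List, NamedTuple


class Pair(NamedTuple):
    i: int              # index of the first sample
    j: int              # index of the second sample (i < j)
    relatedness: float  # observed relatedness
    expected: float     # expected relatedness, e.g. from the pedigree


def sparsify(pairs: List[Pair], keep_fraction: float = 0.01,
             unrelated_cutoff: float = 0.05, seed: int = 0) -> List[Pair]:
    """Keep all interesting pairs and a random subset of the boring ones.

    A pair is "boring" when it is expected to be unrelated and also
    appears unrelated. The cutoff and fraction are assumed values.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    kept = []
    for p in pairs:
        boring = p.expected == 0.0 and p.relatedness < unrelated_cutoff
        if not boring or rng.random() < keep_fraction:
            kept.append(p)
    return kept


def to_coo(pairs: List[Pair]) -> Dict[str, list]:
    """Coordinate-list (sparse) form that the html could plot directly."""
    return {
        "i": [p.i for p in pairs],
        "j": [p.j for p in pairs],
        "relatedness": [p.relatedness for p in pairs],
    }
```

with `keep_fraction=0.01`, a cohort where almost every pair is unrelated would send roughly 1% of the all-vs-all pairs to the page, plus every pair that is related or expected to be.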
from #31