Open mshukla1 opened 7 years ago
First we need to profile the current clustering and see where the time goes.
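As a starting point for that profiling, here is a minimal sketch that times each stage of one clustering request separately. The stage names, `profile_request`, and the JSON round trip standing in for network transfer are all assumptions made for illustration; they are not taken from the actual data API code.

```python
import json
import time
from contextlib import contextmanager


@contextmanager
def stage(name, timings):
    """Accumulate the wall-clock time spent inside the with-block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)


def profile_request(matrix, cluster_fn):
    """Time a hypothetical request: encode input, run clustering, encode output.

    `cluster_fn` is a placeholder for whatever clustering routine is in use.
    """
    timings = {}
    with stage("encode_input", timings):
        payload = json.dumps(matrix)
    with stage("compute", timings):
        result = cluster_fn(json.loads(payload))
    with stage("encode_output", timings):
        json.dumps(result)
    return timings


if __name__ == "__main__":
    # Dummy data and a dummy "clustering" function (sorts rows by their sum).
    data = [[i * j % 7 for j in range(50)] for i in range(200)]
    report = profile_request(data, lambda m: sorted(m, key=sum))
    for name, seconds in report.items():
        print(f"{name}: {seconds:.6f} s")
```

Comparing the per-stage numbers should show whether transfer or computation dominates before we decide what to optimize.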
Here is a list of issues related to this problem
The dataset (genome-occurrence counts) is sent to the data API => the total time depends on your network speed, and the request can time out (the time includes data transfer to the server, computation, and transfer back).
The data is stored in a file and the clustering runs on it (single-CPU version) => the computational cost varies depending on the options chosen => parallelizing the algorithm may help reduce compute time => with multiple concurrent requests, the CPU could be saturated. However, we recently moved the data API to chestnut, which has more CPUs.
The primary problem is that we need to send the data back to the server to run the clustering. If we can run it in the browser (for example with WebAssembly or a BLAS library), that would solve the problem.
If client-side computation is not feasible, we can consider
...takes too long to complete and fails.
Put a limit on the input for clustering and show a clear message to the user.
See if we can deploy clustering on a better machine.
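The input limit suggested above could be sketched roughly as follows. The threshold `MAX_CELLS`, the exception name, and the message wording are hypothetical; the real limit would have to come from profiling results.

```python
# Hypothetical upper bound on matrix cells (rows x columns) accepted for clustering.
MAX_CELLS = 500_000


class ClusterInputTooLarge(ValueError):
    """Raised when the requested clustering input exceeds the configured limit."""


def check_cluster_input(n_rows, n_cols, max_cells=MAX_CELLS):
    """Return the cell count, or raise with a user-facing message if too large."""
    cells = n_rows * n_cols
    if cells > max_cells:
        raise ClusterInputTooLarge(
            f"Clustering is limited to {max_cells:,} cells, but the selection has "
            f"{cells:,} ({n_rows:,} rows x {n_cols:,} columns). "
            "Please reduce the selection and try again."
        )
    return cells


if __name__ == "__main__":
    try:
        check_cluster_input(20_000, 1_000)
    except ClusterInputTooLarge as err:
        print(err)
```

Running the check before any data is sent lets the user see the message immediately instead of waiting for a timeout.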