Subaru-PFS / spt_target_uploader

A web application to validate and upload target lists with pointing simulations for PFS observation
https://pfs-etc.naoj.hawaii.edu/uploader/
MIT License

Memory efficient clustering algorithm #256

monodera closed 2 months ago

monodera commented 3 months ago

Is your feature request related to a problem? Please describe.
DBSCAN uses a lot of RAM for a dense distribution. If there is a memory-efficient alternative, I'd like to switch to it. For example, ~45k objects in the COSMOS field use all of the server's memory, including swap, so it crashes with a memory error.

Describe the solution you'd like
Maybe HDBSCAN? I'm not sure.

Describe alternatives you've considered
Ask for more memory on the virtual machine (partly done).

monodera commented 3 months ago

@wanqqq31 Have you ever tested alternative clustering algorithms instead of DBSCAN? It seems to generate a distance matrix, so the memory requirement would be up to O(n^2). If there is one that gives similar-enough results with less memory, such as O(n) or O(n log n), I'd like to consider it. I'll also search for other possibilities.
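
For scale, here is a rough back-of-the-envelope estimate of what O(n^2) means for the COSMOS case. It assumes that a full dense matrix of 64-bit float distances really is allocated, which the comment above only suspects; the function name is made up for illustration.

```python
# Rough size of a dense n x n matrix of float64 pairwise distances.
# Assumption: every pair is stored once per ordering, 8 bytes per entry.
BYTES_PER_FLOAT64 = 8


def dense_distance_matrix_bytes(n_objects: int) -> int:
    """Return the bytes needed for an n_objects x n_objects float64 matrix."""
    return n_objects * n_objects * BYTES_PER_FLOAT64


n = 45_000  # approximate number of objects quoted for the COSMOS field
size = dense_distance_matrix_bytes(n)
print(f"{n} objects -> {size / 1e9:.1f} GB")  # prints "45000 objects -> 16.2 GB"
```

Under that assumption, a single matrix would already need about 16 GB before any other allocation, which is consistent with the server running out of memory and swap.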

monodera commented 3 months ago

I'm working on the issue/256-memory-efficient-clustering-algorithm branch to use HDBSCAN.

wanqqq31 commented 3 months ago

I see. I had not noticed this problem before. I will also look for a better algorithm.

monodera commented 3 months ago

I think the current version on the branch above works fine. I've deployed it to the dev URL.