Closed lykos98 closed 3 weeks ago
Hi @lykos98. In the paper, the application (HACC) was responsible for the distributed memory. The way it is organized is that it construct local domains with some halos that guarantees that the clusters will be fully contained in the local data. That is what ArborX' algorithms were run on. We do not have a distributed DBSCAN implementation at the moment.
Hi thank you for the quick answer! I understand implementing it in a true distributed fashion is quite difficult, nevertheless the performance reported in the paper is impressive. Congratulations!
@lykos98 I'm not sure it's that difficult. In the back of my mind, I always thought that if we needed to implement it, we could combine our local algorithm with a distributed part from Patwary et al paper.
I'm sorry for not getting back to you sooner. I missed the notification. Thank you for the reference! I had already encountered that paper a while ago, I did not mean actually in the previous comment "difficult" in the algorithmic sense but in the implementation side which needs a non-negligible amount of effort to make it work. I'll keep an eye on this repository, thank you again!
Hi! I found this repository by reading the newly published paper on the advancements of the library. The paper mentions that you were able to run dbscan on 2*10^12 points, but in the documentation it is not clear how and if the actual implementation of dbscan works in distributed memory.
Can you provide an example of the algorithm applied also using mpi when the data is scattered between different processes? Thank you! F