fact-project / photon_stream

Explore the novel photon stream, based on the single photon extractor

Accelerate DBSCAN #48

Open relleums opened 6 years ago

relleums commented 6 years ago

DBSCAN is currently the bottleneck when analysing the photon-stream.

I reach about 50 events/s with single-threaded DBSCAN clustering using the plain sklearn implementation. For comparison, reading from jsonl runs at about 300 events/s, and reading from binary phs at about 15k events/s with the current Python reader.
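For anyone who wants to reproduce a rate like this, here is a minimal timing sketch. It is not the benchmark used above: the events are synthetic uniform point clouds, and the helper name `events_per_second`, the event size, `eps` and `min_samples` are all placeholders chosen for illustration.

```python
import time

import numpy as np
from sklearn.cluster import DBSCAN


def events_per_second(events, eps, min_samples):
    """Cluster every event once and return the single-thread rate."""
    start = time.perf_counter()
    for xyt in events:
        DBSCAN(eps=eps, min_samples=min_samples).fit(xyt)
    elapsed = time.perf_counter() - start
    return len(events) / elapsed


# Synthetic stand-in for photon-stream events: 20 events with 500 photons
# each, every photon an (x, y, t) point. Real events will behave differently.
rng = np.random.default_rng(0)
events = [rng.uniform(0.0, 10.0, size=(500, 3)) for _ in range(20)]

rate = events_per_second(events, eps=0.5, min_samples=10)
print(f"{rate:.1f} events/s")
```

The number it prints depends on the machine and on the photon count per event, so it is only comparable to the 50 events/s above when run on real events.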

DBSCAN builds a spatial tree structure (in sklearn, a k-d tree or ball tree) every time it is called, to reduce the computational expense of calculating the distances between all points (the photons in the stream). DBSCAN tries to populate a matrix of distances between points. Points which are obviously too far apart are not considered. Based on this matrix, the clustering is done. However, in FACT we know the spatial relations of the photons in advance, so we might be able to accelerate the creation of the distance matrix.
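A rough sketch of what this could look like, assuming each photon is given as a pixel index plus an arrival time already scaled to the same units as the pixel coordinates. The pixel neighborhood is computed once for the whole camera, and each event only fills a sparse distance matrix for photons in neighboring pixels, which is then passed to sklearn's DBSCAN with `metric="precomputed"`. The function names, the toy 20x20 grid camera and the parameter values are made up for illustration and are not part of photon_stream.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import cKDTree
from sklearn.cluster import DBSCAN


def build_pixel_neighbors(pixel_xy, eps):
    """One-time step: for every pixel, list the pixels within eps (itself included)."""
    tree = cKDTree(pixel_xy)
    return tree.query_ball_point(pixel_xy, r=eps)


def sparse_distances(pixel_ids, times, pixel_xy, neighbors, eps):
    """Per-event step: distances only between photons in neighboring pixels."""
    n = len(pixel_ids)
    xyt = np.column_stack([pixel_xy[pixel_ids], times])
    # Group photon indices by the pixel they arrived in.
    photons_in_pixel = {}
    for i, p in enumerate(pixel_ids):
        photons_in_pixel.setdefault(int(p), []).append(i)
    rows, cols, vals = [], [], []
    for p, members in photons_in_pixel.items():
        # Two photons can only be within eps in (x, y, t) if their pixels
        # are within eps in (x, y), so all other pixels are skipped.
        cand = np.array(
            [j for q in neighbors[p] for j in photons_in_pixel.get(q, [])]
        )
        for i in members:
            d = np.linalg.norm(xyt[cand] - xyt[i], axis=1)
            keep = d <= eps
            rows.extend([i] * int(keep.sum()))
            cols.extend(cand[keep].tolist())
            vals.extend(d[keep].tolist())
    return csr_matrix((vals, (rows, cols)), shape=(n, n))


def cluster_event(pixel_ids, times, pixel_xy, neighbors, eps, min_samples):
    """Run DBSCAN on the precomputed sparse distances of one event."""
    graph = sparse_distances(pixel_ids, times, pixel_xy, neighbors, eps)
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
    return model.fit(graph).labels_


# Toy camera: a 20x20 square grid with unit pixel spacing (not the FACT geometry).
gx, gy = np.meshgrid(np.arange(20.0), np.arange(20.0))
pixel_xy = np.column_stack([gx.ravel(), gy.ravel()])
eps, min_samples = 1.5, 5
neighbors = build_pixel_neighbors(pixel_xy, eps)  # done once, reused per event

# One synthetic event: 300 photons in random pixels at random times.
rng = np.random.default_rng(0)
pixel_ids = rng.integers(0, len(pixel_xy), size=300)
times = rng.uniform(0.0, 5.0, size=300)
labels = cluster_event(pixel_ids, times, pixel_xy, neighbors, eps, min_samples)
```

The per-event loop here is plain Python, so this sketch is probably not faster than sklearn's tree as written. It only shows that the pruning can come from the fixed pixel geometry; whether a vectorized or compiled version beats the current 50 events/s would need to be measured.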