aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0
210 stars, 33 forks

small node store #262

Closed: sudiptoguha closed this 3 years ago

sudiptoguha commented 3 years ago

Description of changes: saves heap space by using a smaller node store when the sample size and the number of dimensions are low and float_32 precision is used.
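To make the idea concrete, here is a minimal sketch of a node store that switches from 32-bit to 16-bit index arrays when everything fits. It is not the RCF library's node store; the class `NodeStoreSketch`, its fields, and the layout are hypothetical. It assumes a tree over `sample_size` points has `sample_size - 1` internal nodes, that each internal node stores a left index, a right index, a cut dimension, and a cut value, and that child indices range over `[0, 2 * sample_size)`. The float32 condition mirrors the rule described in this PR.

```python
from array import array

SHORT_MAX = 2**15 - 1  # largest value a signed 16-bit ('h') array entry can hold


class NodeStoreSketch:
    """Hypothetical node store: parallel arrays, one slot per internal node.

    Assumptions (not taken from the RCF code base): a tree over `sample_size`
    points has `sample_size - 1` internal nodes, and a child index may refer to
    an internal node or a leaf, so indices range over [0, 2 * sample_size).
    """

    def __init__(self, sample_size, dimensions, cut_typecode="f"):
        self.capacity = sample_size - 1
        largest_index = 2 * sample_size - 1
        # Use shorts only for low sample size and dimensions, and only when
        # cut values are stored as float32 ('f'), as in this PR.
        self.small = (
            largest_index <= SHORT_MAX
            and dimensions - 1 <= SHORT_MAX
            and cut_typecode == "f"
        )
        index_type = "h" if self.small else "i"
        zeros = [0] * self.capacity
        self.left = array(index_type, zeros)
        self.right = array(index_type, zeros)
        self.cut_dimension = array(index_type, zeros)
        self.cut_value = array(cut_typecode, [0.0] * self.capacity)

    def set_node(self, node, left, right, dim, cut):
        # array raises OverflowError if a value does not fit the chosen type.
        self.left[node] = left
        self.right[node] = right
        self.cut_dimension[node] = dim
        self.cut_value[node] = cut

    def payload_bytes(self):
        # Bytes held by the four arrays' elements (object headers excluded).
        arrays = (self.left, self.right, self.cut_dimension, self.cut_value)
        return sum(a.itemsize * len(a) for a in arrays)


small = NodeStoreSketch(sample_size=256, dimensions=10)
wide = NodeStoreSketch(sample_size=256, dimensions=10, cut_typecode="d")
small.set_node(0, left=1, right=300, dim=3, cut=0.5)
print(small.left.typecode, small.payload_bytes())
print(wide.left.typecode, wide.payload_bytes())
```

Under these assumptions each internal node carries three index fields, so with float32 cuts the per-node payload drops from 16 bytes (three 4-byte ints plus a 4-byte float) to 10 bytes.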

sudiptoguha commented 3 years ago

Can you explain why the small node store is bound to float32 precision?

Sure. With float64 precision, one is already spending a lot of space in the PointStore, and the cuts are stored as float64 as well. The savings from int -> short would therefore have a small relative impact. That saving could be extracted with multiple files; we can do that in the future if there is a need. At the present moment, is there a need for float64 x short?
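The "small impact" argument can be checked with back-of-the-envelope arithmetic. The sketch below uses a hypothetical memory model, not RCF's actual accounting: each of `num_trees` trees has `sample_size - 1` internal nodes with three index fields and one cut value, and a point store holds `point_capacity` points of `dimensions` values. The parameter values are illustrative, not measurements.

```python
def short_index_saving(sample_size, dimensions, num_trees, point_capacity, float_bytes):
    """Fraction of (node store + point store) bytes saved by int -> short indices.

    Hypothetical model: three index fields per internal node (4 bytes as int,
    2 bytes as short) plus one cut value of float_bytes; the point store holds
    point_capacity points of `dimensions` values, each float_bytes wide.
    """
    internal_nodes = num_trees * (sample_size - 1)
    node_bytes_int = internal_nodes * (3 * 4 + float_bytes)
    node_bytes_short = internal_nodes * (3 * 2 + float_bytes)
    point_bytes = point_capacity * dimensions * float_bytes
    return (node_bytes_int - node_bytes_short) / (node_bytes_int + point_bytes)


# Illustrative parameters only.
params = dict(sample_size=256, dimensions=10, num_trees=50, point_capacity=50 * 256)
print(f"float32: {short_index_saving(float_bytes=4, **params):.3f}")
print(f"float64: {short_index_saving(float_bytes=8, **params):.3f}")
```

The absolute number of bytes saved is identical in both cases, but with float64 the point store and cut values grow, so the relative saving shrinks (from roughly 11% to roughly 6% of the total for these made-up parameters).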

kaituo commented 3 years ago

Got it. Thanks for the explanation.