Open jurrutiag opened 2 years ago
I don't understand the purpose of a per-row radius. Do you mean per vector component? Could you give the mathematical definition of the operation?
The purpose is to have a radius per sample, i will provide a mathematical definition with an example:
In the Kraskov mutual information estimator [1] we have N samples X_i, each with d features:
Therefore we have N vectors of dimension d. We want to estimate the mutual information of these samples with respect to a vector of size N. This estimator consists of two parts, first we calculate a radius for each sample and then we calculate how many samples lie within this radius for each sample. So for the second part we define a number nx_i containing the number of samples that lie within radius r_i around sample X_i. This means that each sample has assigned its own radius which most likely will be different to other radiuses.
[1] A. Kraskov, H. Stögbauer, and P. Grassberger, “Estimating mutual information,” Physical Review E, vol. 69, Jun 2004.
It caught my attention that range_search doesn't support a vector of radius values as an input, such that each value correspond to the radius to query each row. Would this be faster than query_radius from sklearn's KDTree? If not, is this the reason this is not implemented?
This could be a very good feature since it is the last thing needed for a faster (and accurate) calculation of Kraskov's mutual information.