elixir-nx / scholar

Traditional machine learning on top of Nx
Apache License 2.0
418 stars 43 forks source link

Batch k-NN #253

Closed krstopro closed 5 months ago

krstopro commented 6 months ago

I am honestly not sure where should this be implemented. Right now I added linear_search (a better name might be needed, e.g. brute_force_search) inside Scholar.Neighbors.Utils. Like this it can be used inside other modules such as Scholar.Neighbors.KNearestNeighbors or dimensionality reduction algorithms (t-SNE, Trimap, PacMAP).

The function itself should be documented and tested. More distances are needed. I am just submitting a draft to get some feedback.

Closes #239.

josevalim commented 6 months ago

It looks good to me. For now it is a private module, so we don't have to sweat too much about the name. Does this function have any relationship with find_neighbour? If so, maybe we call one linear_search and the other batch_linear_search?

krstopro commented 6 months ago

I would rename find_neighbors to linear_search_with_candidates as that is what it really is: it performs linear search, but only on indices specified with candidates tensor. https://github.com/elixir-nx/scholar/blob/0d1bcc106b8a1486d08eb67faba2f4326c3d5a7e/lib/scholar/neighbors/utils.ex#L13

msluszniak commented 6 months ago

There is a function for handling multiple distances that are in t-SNE, NNDescent, and Trimap modules. I guess it might be worth to move it to Scholar.Shared module.

josevalim commented 5 months ago

linear_search and linear_candidate_search (or linear_neighbour_search) are fine to me! As I said, it is your call, the whole API is private, so focus on what makes the code cleaner :)

krstopro commented 5 months ago

Closing this in favor of #257.