[Nearest Neighbor] Code and basic test for nearest neighbor

kushalpokharel commented 11 months ago

I have added two functions: mindist and minmaxdistance, which will be helpful in pruning the tree while searching for the nearest neighbor. The code doesn't work yet. I have some doubts about the implementation:

Should we add the distance functions to the Value module as I have added or do we make it specific to module Point in n dimension?
If I make another module called Point with floatarray representing points in n dimensions. How do I make sure that the second parameter is of type Envelope?

kushalpokharel commented 11 months ago

Hi @patricoferris, please review the latest changes. I have listed my approach below which is based on the paper you provided.

Within the envelope, we have two functions mindist and minmaxdist which we need to provide to calculate the distance from a point to that envelope (bounding box).
The nearest neighbor function is available inside rtree which takes a rtree and a point (float list) and traverses each node and sees if the distance from the envelope of the node to the point is greater than the minmaxdist. If the distance is greater then it doesn't traverse, else it traverses that node.
I have added tests for two distance functions and one nearest_neighbor.

There might be some mistakes as I have only tested a basic case for nearest neighbor. Please guide me on how to create a deterministic large R tree to correctly test the nearest neighbor function.

kushalpokharel commented 11 months ago

@patricoferris, today is the last day for some contribution, if you could review this and hopefully we could resolve this, it would be an addition for the outreachy application

patricoferris commented 11 months ago

@harshey1103 please do still reference unmerged PRs in your application :))

kushalpokharel commented 11 months ago

This is a great step in the right direction @kushalpokharel. When I was thinking about the interface for this feature, I thought the distance functions would end up in the Rtree_intf.Value module? Users would then provide their own distance functions from whatever item they're storing to whatever envelope they are using? This would allow us to have a nearest neighbour function that works for all Rtree's and not just points, what do you think? Am I missing something?

Exactly my thought and my question in the first comment. I wanted to do the same. However, calculating distances from a value to an envelope was confusing to me. I think the users can give the distance function between two envelopes in the R-tree because we could turn the value into an envelope. Is this approach correct?

kushalpokharel commented 11 months ago

Hi @patricoferris, I made some changes to the code especially the location of distance functions inside V module. Can you review this?

Regarding other comments:

Naming is still the same, will change after the structure is set
about using x0 instead of gete, what I am actually doing is setting up the function for one axis. For example: if I was to implement distance for 3D I would just have to add gete 2. If this doesn't make sense to you, we can discuss.

kushalpokharel commented 11 months ago

Hi @patricoferris, I made some changes to the code especially the location of distance functions inside V module. Can you review this?

Regarding other comments:

Naming is still the same, will change after the structure is set

about using x0 instead of gete, what I am actually doing is setting up the function for one axis. For example: if I was to implement distance for 3D I would just have to add gete 2. If this doesn't make sense to you, we can discuss.

Hi @patricoferris, can you provide some feedback on this?

patricoferris commented 11 months ago

about using x0 instead of gete, what I am actually doing is setting up the function for one axis. For example: if I was to implement distance for 3D I would just have to add gete 2. If this doesn't make sense to you, we can discuss.

I don't quite understand, don't you already use gete 2 in this 2D case ?

let gete ind = match ind with 0 -> x0 | 1 -> y0 | 2 -> x1 | _ -> y1 in

kushalpokharel commented 10 months ago

about using x0 instead of gete, what I am actually doing is setting up the function for one axis. For example: if I was to implement distance for 3D I would just have to add gete 2. If this doesn't make sense to you, we can discuss.

I don't quite understand, don't you already use gete 2 in this 2D case ?
let gete ind = match ind with 0 -> x0 | 1 -> y0 | 2 -> x1 | _ -> y1 in

Sorry for the late reply, I missed the notification for this. I am using gete inside of the functions like dist (in mindist). mindist function is calculated by identifying the nearest line of the bounding box parallel to x-axis and similarly the same process is done for the lines parallel to y-axis and if we work in higher dimension, similar is done to other axes. Hence, I want to get the coordinates according to the axis that I am working on. For example: For minimum distance from point to rectangle: I do mindist 0 +. mindist 1. Now mindist is parametrized by i (the axis we are working on), the gete will give us the points relating to that axis. Does this make sense to you?

kushalpokharel commented 10 months ago

@patricoferris, I have made some variable name changes and deleted the file. Please review these changes.

geocaml / ocaml-rtree

[Nearest Neighbor] Code and basic test for nearest neighbor #28