Open jgrizou opened 8 years ago
One effect of the grid system is that you do not always have an observation in each cell, so when you ask for 100 test_cases, you often end up with fewer.
Another method could be to use KMeans, with k = number of test_cases, to find cell centers, and then pick the closest observation to each cluster center. This ensures you get 100 test_cases if you ask for 100.
However, this is not really uniform, although it is a good approximation for k << n_samples. And the code already generates 100 times more samples than test cases, via observations = uniform_motor_testcases(robot, 100*n).
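To make the proposal concrete, here is a minimal sketch of the k-means selection in plain Python. It is not the explauto code: `kmeans` and `select_testcases` are hypothetical names, the observations are random 2D points standing in for real sensory data, and removing each picked observation from the pool (so that two centers can never select the same one) is my own assumption.

```python
import math
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise on k observations
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
    return centers


def select_testcases(observations, n):
    """Return n observations, each the closest remaining one to a k-means center."""
    centers = kmeans(observations, n)
    remaining = list(observations)
    selected = []
    for center in centers:
        best = min(remaining, key=lambda p: math.dist(p, center))
        selected.append(best)
        remaining.remove(best)  # assumption: never pick the same observation twice
    return selected


# Hypothetical data: 1000 random 2D observations, 20 test cases requested.
rng = random.Random(1)
observations = [(rng.random(), rng.random()) for _ in range(1000)]
testcases = select_testcases(observations, 20)
print(len(testcases))  # 20
```

Whatever the shape of the data, this returns exactly n test cases as long as there are at least n observations, which is the property the grid does not have.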
Below is a small example: data in blue (1000 points), k-means centers in red (20 points), selected observations in green (20 points).
Here is a comparison between the two methods:
Dataset 1000 points.
Grid: ask for 20 points, got 18. Selected in magenta (18 points)
Kmeans: ask for 20 points, got 20. Kmean in red (20 points), selected in green (20 points)
There is a cluster of points in the bottom-left corner corresponding to failed experiments, so it is normal that a sample is selected there.
The resolution was computed automatically with the formula in post 1, which gave 5 here. So I guess this is a 5x5 grid, i.e. 25 cells, of which only 18 were populated. K-means does look less uniform.
I think I will stick with k-means because it guarantees n points, but it is not optimal.
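For reference, the arithmetic can be checked with a small sketch. `grid_resolution` is a hypothetical wrapper around the formula from post 1 (with `len(robot.s_feats)` replaced by a plain `n_dims` argument), and the data is made up: points restricted to the triangle y <= x, to mimic a sensory space that does not fill the unit square.

```python
import random


def grid_resolution(n, n_dims):
    # Formula from post 1, with len(robot.s_feats) replaced by n_dims.
    return max(2, int((1.3 * n) ** (1.0 / n_dims)))


def populated_cells(observations, resolution):
    """Set of grid cells (index tuples) holding at least one observation in [0, 1]^d."""
    cells = set()
    for obs in observations:
        # Clamp so that a coordinate equal to 1.0 falls in the last cell.
        cells.add(tuple(min(int(x * resolution), resolution - 1) for x in obs))
    return cells


resolution = grid_resolution(20, 2)
print(resolution, resolution ** 2)  # 5 25

# Hypothetical data: 1000 points in the triangle y <= x of the unit square.
rng = random.Random(0)
observations = []
for _ in range(1000):
    x = rng.random()
    observations.append((x, x * rng.random()))

cells = populated_cells(observations, resolution)
print(len(cells), "populated cells out of", resolution ** 2)
```

Since the grid returns at most one observation per populated cell, the number of test cases is bounded by the number of populated cells, not by the number requested.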
What we really want here is a kind of SOM with the constraint that the vectors should be of similar length (after scaling the data between 0 and 1 in each dimension beforehand).
I am now looking into test case generation. I find the method for generating uniform test cases over the sensory space very appealing. The code is here: https://github.com/flowersteam/explauto/blob/master/explauto/environment/testcase.py
My understanding is that a grid of a given resolution is projected on the sensory space, and that each cell is associated with only one observation from within that cell. My questions concern the resolution parameter:
resolution = max(2, int((1.3*n)**(1.0/len(robot.s_feats))))
I also noticed this:
# TODO : change obs only if nearer from center of coo.
From what I understand, in each cell the corresponding observation will be the last observation encountered in the _populate process. Is the TODO to replace that by keeping the observation closest to the center of the cell?
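If that reading of the TODO is right, the change could look roughly like the sketch below. This is not the explauto `_populate` code: `populate_grid` is a hypothetical stand-alone function, and it assumes observations already scaled to [0, 1] in each dimension.

```python
import math
import random


def populate_grid(observations, resolution):
    """Map each populated cell of a grid on [0, 1]^d to the observation closest to its center."""
    best = {}  # cell index tuple -> (distance to cell center, observation)
    for obs in observations:
        # Clamp so that a coordinate equal to 1.0 falls in the last cell.
        cell = tuple(min(int(x * resolution), resolution - 1) for x in obs)
        center = tuple((i + 0.5) / resolution for i in cell)
        d = math.dist(obs, center)
        # Replace the stored observation only if this one is nearer to the center.
        if cell not in best or d < best[cell][0]:
            best[cell] = (d, obs)
    return {cell: obs for cell, (d, obs) in best.items()}


# Hypothetical data: 1000 random 2D observations on a 5x5 grid.
rng = random.Random(0)
observations = [(rng.random(), rng.random()) for _ in range(1000)]
grid = populate_grid(observations, 5)
print(len(grid), "cells populated")
```

Compared with keeping the last observation seen, the result no longer depends on the order in which the observations are visited (ties aside), and the selected points sit closer to a regular lattice.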