flowersteam / explauto

An autonomous exploration library
http://flowersteam.github.io/explauto
GNU General Public License v3.0

What is the logic behind competence_dist? #72

Open jgrizou opened 8 years ago

jgrizou commented 8 years ago

Just looked into the competence_dist function: https://github.com/flowersteam/explauto/blob/master/explauto/interest_model/competences.py#L4

def competence_dist(target, reached, dist_min=0., dist_max=1.):
    return (min(- dist_min, - np.linalg.norm(target - reached))) / dist_max

I don't get this division by dist_max; what is it meant for? Also, why is the division not cast to float? It results in very weird behavior like:

competence_dist(0, 0, 2, 10.0) -> -0.2 (expected -2?)
competence_dist(0, 0, 2, 10)   -> -1   (expected -2? Python 2's integer division floors -2 / 10 to -1)
competence_dist(0, 100, 0, 1)  -> -100 (expected -1?)

I would have assumed that we instead want to bound the error between dist_min and dist_max, no? But I might be missing something deeper here.

The patch would be:

def competence_dist(target, reached, dist_min=0., dist_max=1.):
    return max(min(-dist_min, -np.linalg.norm(target - reached)), -dist_max)
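
For reference, a quick sanity check of the patch on the examples above (this just re-runs the patched function with numpy imported; no new behavior is assumed):

import numpy as np

def competence_dist(target, reached, dist_min=0., dist_max=1.):
    # clip the error to [dist_min, dist_max], then negate it
    return max(min(-dist_min, -np.linalg.norm(target - reached)), -dist_max)

print(competence_dist(0, 0, 2, 10.0))  # -2 (error 0 is clipped up to dist_min=2)
print(competence_dist(0, 0, 2, 10))    # -2 (no division left, so no int/float issue)
print(competence_dist(0, 100, 0, 1))   # -1 (error 100 is clipped down to dist_max=1)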

I also assume the move to negative domain is because competence should improve when error reduces.

jgrizou commented 8 years ago

Just seeing this really recent fix: https://github.com/flowersteam/explauto/commit/d5f145c17559a0bcbf1a430acb1c1b2e398a0b1a#diff-f4e1728b655e961deda40852dc02ad27

@sebastien-forestier, do I get this right? This seems to be quite a problem at the core of all experiments.

clement-moulin-frier commented 8 years ago

Hi @jgrizou

Seems a bit weird indeed. I guess the logic was to be able to bound the value with a min and a max, as well as to normalize it (hence the division, although it is not doing the job properly, as your examples show).

Your patch makes sense to me, feel free to commit it. It would be nice to test whether the notebooks using that function still return consistent results. I guess they do, since it is usually called with the default values for dist_min and dist_max.

A useful feature could also be a boolean argument normalize that, when set to True, ensures the returned value is between 0 and 1 (sketched below). But I don't think we need it yet, so we can add it later.
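
A minimal sketch of what that could look like; the normalize argument and the exact linear mapping onto [0, 1] are assumptions here, not an agreed design (it also assumes dist_min < dist_max and float arithmetic):

import numpy as np

def competence_dist(target, reached, dist_min=0., dist_max=1., normalize=False):
    # bounded negative competence, as in the patch above
    c = max(min(-dist_min, -np.linalg.norm(target - reached)), -dist_max)
    if normalize:
        # map [-dist_max, -dist_min] linearly onto [0, 1]
        # (1 = error at dist_min, the best case; 0 = error at dist_max, the worst)
        return (c + dist_max) / (dist_max - dist_min)
    return c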

The move to the negative domain is indeed because competence should improve when error reduces.

jgrizou commented 8 years ago

Hi @clement-moulin-frier

Thanks, this makes sense to me. The recent fix from @sebastien-forestier (https://github.com/flowersteam/explauto/commit/d5f145c17559a0bcbf1a430acb1c1b2e398a0b1a#diff-f4e1728b655e961deda40852dc02ad27) implements the bounds but not the normalization.

I guess normalization makes it more robust to different domains. I plan to use the interest tree; @sebastien-forestier, given your implementation of it, would normalizing or not impact the dynamic behavior/splitting of the tree?

sebastien-forestier commented 8 years ago

Hi @jgrizou @clement-moulin-frier,

Indeed, the formula was a bit strange, so I kept only the bounds; the normalization is done in the discrete_progress class.

Now that I see it, the default dist_max is 1. in the formula, but I guess it would be better to have dist_max = np.inf instead, as typical examples in the tutorials might have distances larger than 1.
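
Concretely, that would just change the default (a sketch of the bounded version with the new default; nothing else is assumed):

import numpy as np

def competence_dist(target, reached, dist_min=0., dist_max=np.inf):
    # with dist_max = inf the lower bound is effectively disabled,
    # unless the caller passes a finite dist_max
    return max(min(-dist_min, -np.linalg.norm(target - reached)), -dist_max)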

In the case of the Tree interest model (which is still experimental!), the measure used is currently competence_exp, which already normalizes.

I recently spent some time trying to make the discrete progress interest model work better than random goal babbling in a standard 2D goal space. I ran into the problem of cells that are unreachable but close to reachable ones: no real progress is made in those cells, yet the goal-to-reached distance randomly fluctuates between 0 and cell_size, inducing a high interest. Since such cells can be the majority, this is not acceptable behavior. I finally came up with this: give 0 competence if the reached point is not in the same cell as the goal, and add a novelty bias to explore newly reached cells (on top of a small probability of exploring a random cell); see the sketch below.
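
For concreteness, a minimal sketch of that competence rule; the cell_of helper and the cell_size parameter are hypothetical stand-ins for the actual grid code, and it assumes a normalized scale where 0 is the worst competence and 1 the best. The novelty bias itself would live in the interest model and is omitted here:

import numpy as np

def cell_competence(goal, reached, cell_of, cell_size):
    # a different cell was reached: no credit, so unreachable-but-close
    # cells stop registering random fluctuations as progress
    if cell_of(goal) != cell_of(reached):
        return 0.
    # inside the goal cell: competence grows as the error shrinks
    dist = np.linalg.norm(np.asarray(goal) - np.asarray(reached))
    return 1. - min(dist, cell_size) / cell_size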

I am currently working on a notebook about different algorithms, where you can find the example of Active Goal Babbling.

Regarding the utility of normalization, I guess it makes sense for comparing spaces of the same cardinality (dimensionality) but different bounds. Across different cardinalities, the maximal distance scales with the square root of the cardinality, but the volume of the learnable space might scale much faster, so I think there is no good general normalization in that case.
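
For instance, in a unit hypercube goal space the maximal L2 distance between two points grows as the square root of the dimension:

import numpy as np

for d in (1, 2, 10, 100):
    # distance between opposite corners of the unit hypercube [0, 1]^d
    print(d, np.linalg.norm(np.ones(d)))  # sqrt(d): 1.0, 1.41..., 3.16..., 10.0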

In the same notebook, you can also find an example of Active Model Babbling, where the different spaces have the same bounds and cardinality (and where a self-organized curriculum appears: the agent learns movements of the hand, then of a tool, and then of a learnable object, while the interest in random objects stays low).

jgrizou commented 8 years ago

Mail from @oudeyer:

Hi,

These variants are all interesting, and probably no single one is better than the others in general (though some form of normalization is needed when the goal space is much larger than the reachable space). However, what is crucial in the library is that what is implemented is 1) well documented in the Explauto doc with mathematical formulas, and 2) mapped to actual algorithms used in cited papers. There could be several alternative measures for computing competence and/or progress, but the default one should be one studied in a paper, even if it is not the best. The other alternatives you suggest could then be the topic of a section in a paper, or of a full paper.

Actually, as intrinsic motivation/curiosity papers are now multiplying, especially in ML/reinforcement learning conferences, many of them focus on introducing and studying a specific new measure of information gain/competence progress. When papers were rare, such detailed focus was not so crucial, but now people look at this, and papers at NIPS/ICML base their whole argumentation on things like comparing a measure of information gain with or without normalization, or with the L1 or L2 norm. So when these people come to know Explauto (and I guess we would like them to become users of it), they will want to know precisely which measures are implemented and what their properties are.

Best, Pierre-Yves

jgrizou commented 8 years ago

Also, the default value for dist_max should now probably be np.inf.