pierre-rouanet opened this issue 10 years ago
I was about to open the same issue :) I spent a couple of hours trying to figure out how it is done in the code and I am still not sure I understand. Here are a couple (a lot) of questions:
Just pinging everyone here :) @sebastien-forestier @pierre-rouanet @clement-moulin-frier
The 'delta' functionality was added to allow learning actions that depend on a context, i.e. learning the forward model (m, s, dm) -> ds and the inverse model (m, s, ds_goal) -> dm. I've written a notebook about that here. If we use delta actions, the evaluation is not the same (hence the 'delta' evaluation mode).
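To make that mapping concrete, here is a purely illustrative sketch in plain numpy (a toy nearest-neighbour model, not explauto's actual implementation) of what a delta forward/inverse model has to compute:

```python
import numpy as np

class DeltaNNModel:
    """Toy nearest-neighbour model over stored (m, s, dm) -> ds tuples."""
    def __init__(self):
        self.X, self.Y = [], []   # stored inputs (m, s, dm) and outputs ds

    def update(self, m, s, dm, ds):
        self.X.append(np.hstack([m, s, dm]))
        self.Y.append(np.asarray(ds, dtype=float))

    def forward(self, m, s, dm):
        # forward model (m, s, dm) -> ds: predict the sensory change caused
        # by the motor change dm in the context (m, s)
        q = np.hstack([m, s, dm])
        i = int(np.argmin([np.linalg.norm(q - x) for x in self.X]))
        return self.Y[i]

    def inverse(self, m, s, ds_goal):
        # inverse model (m, s, ds_goal) -> dm: return the dm of the stored
        # tuple whose context and sensory change best match (m, s, ds_goal)
        k = len(np.atleast_1d(m)) + len(np.atleast_1d(s))
        q = np.hstack([m, s, ds_goal])
        keys = [np.hstack([x[:k], y]) for x, y in zip(self.X, self.Y)]
        i = int(np.argmin([np.linalg.norm(q - key) for key in keys]))
        return self.X[i][k:]
```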
So I think passing the mode as an argument to 'evaluate_at' is a good idea, with the current behavior as the default.
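Something along these lines, purely as a hypothetical sketch (the function and method names and signatures are assumptions, not the current explauto API):

```python
import numpy as np

def evaluate_at(environment, model, testcases, mode='inverse'):
    """Return one error per test point, for the inverse or the forward model."""
    errors = []
    for point in testcases:
        point = np.asarray(point, dtype=float)
        if mode == 'inverse':
            # point is a sensory goal: ask the model for a motor command,
            # execute it, and measure how far we land from the goal
            m = model.inverse_prediction(point)
            s_reached = np.asarray(environment.update(m), dtype=float)
            errors.append(np.linalg.norm(point - s_reached))
        elif mode == 'forward':
            # point is interpreted as a motor command (an assumption here):
            # compare the predicted outcome with what the environment does
            s_pred = np.asarray(model.forward_prediction(point), dtype=float)
            s_true = np.asarray(environment.update(point), dtype=float)
            errors.append(np.linalg.norm(s_pred - s_true))
        else:
            raise ValueError("unknown evaluation mode: %r" % mode)
    return errors
```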
Hi @jgrizou ,
Sorry for the late reply. Trying to answer your questions (in the same order you used):
1) Yes
2) Yes
3) See @sebastien-forestier's answer.
4) It returns the errors.
5, 6, 7) At that time we only evaluated our models for inverse prediction; we were not concerned with forward ones, so we indeed did not test the forward evaluation thoroughly. That is probably why there is no mode argument, and it is certainly a good idea to add one.
8) I would rather code a method in your specific Environment class that generates the test set, then pass this test set to Evaluation as usual (see also my last paragraph below, and the sketch after this list).
9) Hard to remember, and this is weird indeed :/ I would rather use the one in Environment, or test both to see whether they are equivalent.
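Regarding 8), here is a minimal, self-contained sketch of the idea. The tiny planar arm and the generate_testcases name are only stand-ins; in practice you would add such a method to your own Environment subclass:

```python
import numpy as np

class MyArmEnvironment(object):
    """Toy stand-in for a specific Environment subclass (planar arm of unit length)."""
    def __init__(self, n_joints=7):
        self.n_joints = n_joints

    def update(self, m):
        # forward kinematics: return the (x, y) position of the arm tip
        angles = np.cumsum(m)
        lengths = np.full(self.n_joints, 1.0 / self.n_joints)
        return np.array([np.sum(lengths * np.cos(angles)),
                         np.sum(lengths * np.sin(angles))])

    def generate_testcases(self, n=100):
        # sample random motor commands and keep the sensory outcomes,
        # so every test point is reachable by construction
        motors = np.random.uniform(-np.pi / 3, np.pi / 3, (n, self.n_joints))
        return np.array([self.update(m) for m in motors])

testcases = MyArmEnvironment().generate_testcases(200)
# evaluation = Evaluation(agent, environment, testcases)  # then passed as usual
```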
In summary, we have never been very happy with how the evaluation is coded, but we never came up with a better solution either. Don't hesitate to recode this if you feel inspired, ideally keeping backward compatibility.
The fact is that the way one wants to generate test sets can differ a lot from one situation to another: one might want to use a predefined test set, another to generate it by calling the Environment, and yet another to use a geometrical computation (as I remember, this was the case for a very high-dimensional arm, where the best guess was that reachable points lie within the unit circle). This is why we thought the best input to Evaluation is simply an array of test points, leaving the responsibility of generating it to the user.
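Concretely, all three options end up producing the same kind of array handed to Evaluation (the file path and helper names below are only illustrative):

```python
import numpy as np

n = 200

# 1) a predefined test set loaded from disk
# testcases = np.load('my_testset.npy')              # hypothetical file

# 2) generated by calling the Environment (see the sketch above)
# testcases = environment.generate_testcases(n)

# 3) geometrical reasoning, e.g. a high-dimensional arm whose reachable
#    points are assumed to lie within the unit circle
radius = np.sqrt(np.random.uniform(0, 1, n))          # uniform over the disc
angle = np.random.uniform(0, 2 * np.pi, n)
testcases = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])

# evaluation = Evaluation(agent, environment, testcases)
```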
Also, warn the user (with a ValueError?) when they try to use the same test cases with a different environment.
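One hedged sketch of such a check, assuming the environment exposes its sensory bounds (the function and parameter names are illustrative, not existing explauto API):

```python
import numpy as np

def check_testcases(testcases, s_mins, s_maxs):
    """Raise ValueError if the test cases do not fit the environment's sensory space."""
    testcases = np.asarray(testcases, dtype=float)
    if testcases.shape[1] != len(s_mins):
        raise ValueError("test cases have %d dimensions, environment expects %d"
                         % (testcases.shape[1], len(s_mins)))
    if np.any(testcases < s_mins) or np.any(testcases > s_maxs):
        raise ValueError("some test cases lie outside the environment's sensory bounds")
```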