IRT-SystemX / ml4physim_startingkit


Codabench: Evaluation of a custom model #6

Closed MaximeLee closed 10 months ago

MaximeLee commented 10 months ago

When evaluating a custom model with Codabench, I am assuming that you will use the AirfRANSEvaluation object to do the evaluation. Here is an example of its use in the fourth notebook:

metrics = evaluator.evaluate(observations=observations,
                             predictions=predictions,
                             observation_metadata=observation_metadata)

How will the observations and predictions be computed?

Other subquestions are:

Mleyliabadi commented 10 months ago

Hi,

The (predictions, observations) pair is returned by the predict function:

predictions, observations = predict(model, benchmark._test_dataset, device=device)

Here, benchmark._test_dataset is the test dataset. The observations are the ground-truth values from the test set, and the predictions are the outputs of your model at inference time.
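To make the roles of the two return values concrete, here is a minimal, framework-free sketch of what such a predict function does. It is not the starting kit's implementation: `predict_sketch`, `toy_model`, and `toy_batches` are hypothetical names, and the real function also takes a `device` argument and works on the benchmark's dataset object rather than plain lists.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical batch layout for this sketch: (inputs, targets).
Batch = Tuple[Sequence[float], Sequence[float]]


def predict_sketch(
    model: Callable[[Sequence[float]], Sequence[float]],
    test_batches: Iterable[Batch],
) -> Tuple[List[float], List[float]]:
    """Run the model on every test batch and collect outputs and ground truth."""
    predictions: List[float] = []
    observations: List[float] = []
    for inputs, targets in test_batches:
        predictions.extend(model(inputs))  # model outputs at inference time
        observations.extend(targets)       # ground truth taken from the test set
    return predictions, observations


# Toy stand-ins (assumptions for illustration only).
toy_model = lambda xs: [2.0 * x for x in xs]
toy_batches = [([1.0, 2.0], [2.0, 4.1]), ([3.0], [5.9])]

predictions, observations = predict_sketch(toy_model, toy_batches)
```

The return order mirrors the call above: predictions first, observations second.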

Concerning your sub-questions:

MaximeLee commented 10 months ago

So even if a scaler is defined, it will not be involved in the evaluation?

Also, which function will be used for the evaluation in Codabench (for custom models)?

Note: I forgot to specify that I want to evaluate a TensorFlow model.

daviddanan commented 10 months ago

I think there is a slight confusion between the responsibilities of the evaluation module and the evaluate_simulator method in the benchmark class.

Therefore, just in case, I would like to clarify:

As for how the scaler is handled, I think this line in the predict method may answer your question. You can find an implementation of the corresponding method in section II of the fourth notebook for PyTorch.
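As a rough illustration of how a scaler can sit inside the predict step rather than inside the evaluation, here is a minimal sketch. It assumes the model predicts in normalized space and that predict maps the outputs back to physical units before returning them; that behavior is an assumption for this example, not a quote of the notebook's code. `ToyScaler` and `predict_with_scaler` are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence, Tuple


class ToyScaler:
    """Hypothetical standard scaler with a fixed mean and standard deviation."""

    def __init__(self, mean: float, std: float) -> None:
        self.mean = mean
        self.std = std

    def inverse_transform(self, values: Sequence[float]) -> List[float]:
        # Map normalized values back to physical units.
        return [v * self.std + self.mean for v in values]


def predict_with_scaler(
    model: Callable[[Sequence[float]], Sequence[float]],
    inputs: Sequence[float],
    targets: Sequence[float],
    scaler: Optional[ToyScaler] = None,
) -> Tuple[List[float], List[float]]:
    """Return (predictions, observations), both in physical units."""
    outputs = list(model(inputs))
    if scaler is not None:
        # Assumption: the model was trained on normalized targets,
        # so its outputs are un-scaled before being handed to evaluation.
        outputs = scaler.inverse_transform(outputs)
    return outputs, list(targets)


# Toy model that ignores its inputs and emits normalized values.
normalized_model = lambda xs: [0.0, 1.0, -1.0]
scaled_predictions, scaled_observations = predict_with_scaler(
    normalized_model, [0.1, 0.2, 0.3], [10.0, 12.5, 7.5], ToyScaler(10.0, 2.0)
)
```

Under this assumption the evaluation step never sees the scaler; it only receives predictions and observations expressed in the same units.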

Now, regarding TensorFlow, another issue was raised on this matter. Although we have not provided any notebook for TensorFlow so far, the evaluation of a model's output does not depend on TensorFlow. We are currently investigating the feasibility of TensorFlow support for the competition. Closing the issue until then.