Codabench: Evaluation of a custom model

MaximeLee commented 10 months ago

When evaluating a custom model with Codabench, I am assuming that you will use the AirfRANSEvaluation object to do the evaluation. Here is an example of its use in the fourth notebook:

metrics = evaluator.evaluate(observations=observations,
                             predictions=predictions,
                             observation_metadata=observation_metadata)

How will the observations and predictions be computed?

Other subquestions are:

What is fed to, and predicted by, a custom model during the evaluation?
Should the model have a scaler attribute and implement methods to rescale the inputs/outputs?

Mleyliabadi commented 10 months ago

Hi,

The (predictions, observations) pair is returned by the predict function as:

predictions, observations = predict(model, benchmark._test_dataset, device=device)

Here, benchmark._test_dataset is the test dataset. The observations are the ground-truth data taken from the test set, and the predictions are the outputs of your model at inference time.
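To illustrate the shape of that pair, here is a minimal sketch. It assumes both values are dictionaries mapping each output name to a NumPy array. `toy_predict`, `toy_dataset`, and `zero_model` are hypothetical stand-ins, not the starting kit's predict function or the real test set.

```python
import numpy as np

OUTPUTS = ["x-velocity", "y-velocity", "pressure", "turbulent_viscosity"]

def toy_predict(model, dataset):
    """Hypothetical stand-in for the starting kit's predict function."""
    # Observations are the ground-truth targets already stored in the dataset.
    observations = {name: dataset[name] for name in OUTPUTS}
    # Predictions come from the model: any callable returning an array
    # of shape (n_points, len(OUTPUTS)).
    raw = model(dataset)
    predictions = {name: raw[:, i] for i, name in enumerate(OUTPUTS)}
    return predictions, observations

# Hypothetical dataset with 5 mesh points and random target values.
rng = np.random.default_rng(0)
toy_dataset = {name: rng.normal(size=5) for name in OUTPUTS}

# A trivial "model" that predicts zero for every output.
def zero_model(data):
    return np.zeros((5, len(OUTPUTS)))

predictions, observations = toy_predict(zero_model, toy_dataset)
```

Note the return order in this sketch mirrors the call above: predictions first, observations second.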

Concerning your sub-questions:

The input features fed to a custom model during the evaluation are ("x-position","y-position","x-inlet_velocity","y-inlet_velocity","distance_function","x-normals","y-normals"). You can access the corresponding data using benchmark._test_dataset.data. The model is expected to predict the required outputs ("x-velocity","y-velocity","pressure","turbulent_viscosity"). These features are specified in a configuration file here. In this configuration file, attr_x lists the inputs of the model and attr_y lists its outputs.
Implementing a scaler is optional. An example of an implemented scaler is provided in the LIPS repository here. An example of its usage is provided in the third notebook of the starting_kit.
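As a rough sketch of the two points above, the snippet below stacks the named inputs and outputs into matrices and applies an optional scaler. The feature names are copied from this thread. `ToyStandardScaler`, `stack`, and `toy_data` are hypothetical and are not the scaler or data structures shipped in LIPS.

```python
import numpy as np

# Names copied from the thread; in the starting kit they come from the
# configuration file (attr_x for inputs, attr_y for outputs).
ATTR_X = ["x-position", "y-position", "x-inlet_velocity", "y-inlet_velocity",
          "distance_function", "x-normals", "y-normals"]
ATTR_Y = ["x-velocity", "y-velocity", "pressure", "turbulent_viscosity"]

class ToyStandardScaler:
    """Hypothetical scaler (zero mean, unit variance), not the LIPS one."""

    def fit(self, values):
        self.mean = values.mean(axis=0)
        self.std = values.std(axis=0) + 1e-12  # avoid division by zero
        return self

    def transform(self, values):
        return (values - self.mean) / self.std

    def inverse_transform(self, values):
        return values * self.std + self.mean

def stack(data, names):
    """Stack named 1-D arrays into an (n_points, n_features) matrix."""
    return np.column_stack([data[name] for name in names])

# Hypothetical data: 8 mesh points with random values for every field.
rng = np.random.default_rng(0)
toy_data = {name: rng.normal(size=8) for name in ATTR_X + ATTR_Y}

inputs = stack(toy_data, ATTR_X)    # 7 input columns
targets = stack(toy_data, ATTR_Y)   # 4 output columns

# Scaling is optional; if used, keep the scaler so it can be inverted later.
input_scaler = ToyStandardScaler().fit(inputs)
scaled_inputs = input_scaler.transform(inputs)
restored_inputs = input_scaler.inverse_transform(scaled_inputs)
```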

MaximeLee commented 10 months ago

So even if the scaler is defined, it will not be involved in the evaluation?

Also, which function will be used for the evaluation in Codabench (for custom models)?

Note: I forgot to mention that I want to evaluate a TensorFlow model.

daviddanan commented 10 months ago

I think there is a slight confusion between the responsibilities of the evaluation module and the evaluate_simulator method in the benchmark class.

Therefore, just in case, I would like to clarify:

The first one evaluates the performance of a given model output with respect to the ground-truth solution. In other words, this module is model-agnostic (and thus scaler-agnostic): it does not know anything about the underlying model that was used.
The second evaluates the performance of your model. There are two steps: compute the prediction using your model, then evaluate the performance of that model output (see previous point). If you have a scaler, it is used only in the first step, during the prediction.
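The split described above can be sketched as follows. This is a minimal illustration, not the LIPS implementation: `evaluate_outputs`, `evaluate_model`, and `make_predict_fn` are hypothetical names, the mean squared error is an assumed metric chosen for simplicity, and the scalar `scale` stands in for a real scaler.

```python
import numpy as np

def evaluate_outputs(predictions, observations):
    """Model-agnostic step: compares arrays, knows nothing about the model."""
    return {name: float(np.mean((predictions[name] - observations[name]) ** 2))
            for name in observations}

def evaluate_model(predict_fn, inputs, observations):
    """Two steps: run the model, then score its outputs."""
    predictions = predict_fn(inputs)                     # step 1: prediction
    return evaluate_outputs(predictions, observations)   # step 2: evaluation

def make_predict_fn(model, scale=1.0):
    """Hypothetical wrapper: any scaling lives here, inside prediction."""
    def predict_fn(inputs):
        scaled_output = model(inputs / scale)        # model sees scaled units
        return {"pressure": scaled_output * scale}   # back to physical units
    return predict_fn

# Toy setup: the "model" doubles its input, and the ground truth is 2 * input.
def doubling_model(x):
    return 2.0 * x

inputs = np.array([1.0, 2.0, 3.0])
observations = {"pressure": np.array([2.0, 4.0, 6.0])}

predict_fn = make_predict_fn(doubling_model, scale=10.0)
metrics = evaluate_model(predict_fn, inputs, observations)
```

Because the scaling is undone before `predict_fn` returns, `evaluate_outputs` only ever sees values in physical units, which is why it never needs access to the scaler.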

As to how the scaler is handled, I think this line in the predict method may answer your question. You can find an implementation of the corresponding method in section II of the fourth notebook for PyTorch.

Now, regarding TensorFlow, another issue was raised on this matter. Although we have not provided any notebook for TensorFlow use as of now, the evaluation of a model output is not tied to TensorFlow. We are currently investigating the feasibility of TensorFlow support for the competition. Closing the issue until then.

IRT-SystemX / ml4physim_startingkit

Codabench: Evaluation of a custom model #6