autogluon / tabrepo

Apache License 2.0

Add a simple method to fit models on datasets #69

Status: Open. Innixma opened this issue 3 months ago.

Innixma commented 3 months ago

Related: #55

We should add an interface for users to run a specific model on a specific dataset locally. This will help drive adoption of TabRepo for method papers that are introducing a new model and want to compare against other baselines, similar to how TabZilla is currently being used. The hope is that this feature will do a great deal to resolve the reproducibility / baseline consistency crisis for tabular method papers.

A major benefit of having this logic is that we can incorporate any strong and trusted result of a method into TabRepo's main EvaluationRepository. If someone runs a stronger configuration of a known method, we can either add it alongside that method's weaker results or replace the weaker results with the stronger ones, depending on what makes more sense. This way we can work to ensure each method in TabRepo is represented by its strongest configuration/search space/preprocessing/etc., greatly reducing the chance that methods are misrepresented in terms of their peak capabilities.

Proposal

The fit logic should feature two modes: Basic mode and Simulator mode.

Basic mode doesn't require the user to generate out-of-fold predictions. Therefore the model will not be compatible with TabRepo simulation, but will still be able to be compared to TabRepo results via the test scores. It is important to have a basic mode so that users can avoid doing k-fold bagging if they don't want to. Basic mode should be very similar to what is done in AutoMLBenchmark.
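As a rough illustration, a basic-mode run could be as small as the sketch below, assuming a scikit-learn-style estimator with `fit`/`predict_proba` and a binary task. The function name `run_basic_mode` and the result fields (`metric_error`, `time_train_s`, `time_infer_s`) are hypothetical, not TabRepo's API. The model is fit once on all training rows (no k-fold bagging) and only the test score and timings are kept.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def run_basic_mode(model, X_train, y_train, X_test, y_test):
    """Hypothetical basic-mode runner: one fit on all training rows, test score only."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    time_train_s = time.perf_counter() - start

    start = time.perf_counter()
    # Probability of the positive class (binary classification assumed)
    proba_test = model.predict_proba(X_test)[:, 1]
    time_infer_s = time.perf_counter() - start

    # Report an error where lower is better, here 1 - ROC AUC
    metric_error = 1.0 - roc_auc_score(y_test, proba_test)
    return {
        "metric_error": metric_error,
        "time_train_s": time_train_s,
        "time_infer_s": time_infer_s,
    }


X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
result = run_basic_mode(LogisticRegression(max_iter=1000), X_train, y_train, X_test, y_test)
print(result)
```

No out-of-fold predictions are produced, so a result like this can be compared by test score but cannot take part in ensemble simulation.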

Simulator mode will require the user to additionally produce out-of-fold predictions & probabilities for every row of the training data. We can provide templates to make this easy to do, such as relying on AutoGluon's k-fold bagging implementation or a generic scikit-learn k-fold split. Simulator mode results will be fully compatible with TabRepo, and will allow for simulating ensembles of the user's method with prior TabRepo artifacts.
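A generic k-fold template for the out-of-fold requirement could look like this sketch. `fit_with_oof` is a hypothetical helper that uses scikit-learn's `StratifiedKFold` as a stand-in; it is not AutoGluon's bagging implementation. It assumes integer class labels and that every class appears in each training split, so the `predict_proba` columns of all fold models line up.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold


def fit_with_oof(model, X_train, y_train, X_test, n_splits=5, seed=0):
    """Hypothetical helper: k-fold bagging returning out-of-fold and test probabilities."""
    n_classes = len(np.unique(y_train))
    oof_proba = np.zeros((len(X_train), n_classes))
    test_proba = np.zeros((len(X_test), n_classes))
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in skf.split(X_train, y_train):
        fold_model = clone(model).fit(X_train[train_idx], y_train[train_idx])
        # Each training row is predicted exactly once, by a model that never saw it
        oof_proba[val_idx] = fold_model.predict_proba(X_train[val_idx])
        # Test probabilities are averaged over the k fold models (bagging)
        test_proba += fold_model.predict_proba(X_test) / n_splits
    return oof_proba, test_proba


X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train = X[:300], X[300:], y[:300]
oof, test = fit_with_oof(
    RandomForestClassifier(n_estimators=50, random_state=0), X_train, y_train, X_test
)
print(oof.shape, test.shape)  # (300, 2) (100, 2)
```

Averaging the fold models' test predictions is one common choice; refitting a single model on all training rows for the test predictions would be another, and the template would need to state which one it uses.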

Requirements:

Model Code

Inputs

Run Artifacts

The resulting artifact should be either an instance of EvaluationRepository or something very similar to it.
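To make the requirement concrete, here is a hypothetical, stripped-down stand-in for such an artifact; it does not reproduce EvaluationRepository's actual interface. It keeps one record per (dataset, fold, config) so basic-mode runs can be compared by test score, plus optional prediction fields that only simulator-mode runs would fill in. All class and field names, and the numbers in the usage lines, are illustrative placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunResult:
    """One (dataset, fold, config) run; field names are illustrative assumptions."""
    dataset: str
    fold: int
    config: str
    metric_error: float
    time_train_s: float
    time_infer_s: float
    # Only populated in simulator mode
    pred_val: Optional[list] = None
    pred_test: Optional[list] = None


@dataclass
class RunArtifact:
    """Hypothetical container collecting runs, loosely mirroring an EvaluationRepository."""
    results: list = field(default_factory=list)

    def add(self, result: RunResult) -> None:
        self.results.append(result)

    def metrics(self, dataset: str, fold: int) -> dict:
        # Map config -> test error for one task, comparable across methods
        return {r.config: r.metric_error for r in self.results
                if r.dataset == dataset and r.fold == fold}

    def simulation_ready(self) -> bool:
        # Ensemble simulation needs both prediction sets for every run
        return all(r.pred_val is not None and r.pred_test is not None for r in self.results)


artifact = RunArtifact()
artifact.add(RunResult("toy_dataset", 0, "MyModel_c1", 0.25, 12.0, 0.4))
print(artifact.metrics("toy_dataset", 0))  # {'MyModel_c1': 0.25}
print(artifact.simulation_ready())  # False: a basic-mode run has no predictions
```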

General

Simulator Mode

Result Aggregation

Parallelization / Distribution (Stretch)

Ensuring reproducibility (Stretch)

Evaluation

Open Questions

geoalgo commented 2 months ago

Great that you are pushing for this!

A major benefit of having this logic is that we can incorporate any strong and trusted result

True, another big use case (at least for me) is being able to quickly see how a method performs on a wide range of datasets, even if the predictions are not included.

Basic mode/Simulator mode

I agree it makes sense to have the option of producing only metrics, for ease of use. The names seem a bit disconnected from what the modes actually do, though; why not call the first mode "metric-only" and make clear that ensemble simulations are only supported when model predictions are provided?

Users will need to define their model running code similar to how it is done in AutoMLBenchmark in the exec.py files for frameworks

This could be quite complicated for users. TabZilla and FTTransformer provide examples of how to run a simple scikit-learn-like class; would it be possible to support something like this? I think it would make things much easier for users.

For instance, something like this (just to give the high-level idea):

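from catboost import CatBoostClassifier

repo = ...  # hypothetical: a TabRepo EvaluationRepository
# get_Xy and evaluate are proposed methods, not existing TabRepo APIs
X_train, y_train, X_test = repo.get_Xy(dataset="Airlines", fold=0)
y_pred = CatBoostClassifier().fit(X_train, y_train).predict(X_test)
# output metrics that are comparable with repo.metrics(datasets=["Airlines"], configs=["CatBoost_r22_BAG_L1"], fold=0)
print(repo.evaluate(y_pred))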
GDGauravDutta commented 2 months ago

When can we expect a simpler API, like AutoGluon's, so that we can better understand this new library?

Innixma commented 2 months ago

@GDGauravDutta We are actively working on this, and a simpler API should be available within the next month.