LabeliaLabs / distributed-learning-contributivity

Simulate collaborative ML scenarios, experiment with multi-partner learning approaches, and measure the respective contributions of different datasets to model performance.
https://www.labelia.org
Apache License 2.0

model.compile in multi_partner_learning.py: datasets/dataset_*.py can't create model with binary_crossentropy #181

Closed GauthierGar closed 3 years ago

GauthierGar commented 4 years ago

In datasets/dataset_cifar10.py and datasets/dataset_mnist.py, generate_new_model_for_dataset() ends up defining how the model is compiled, through Keras' compile(optimizer='rmsprop', loss=None, metrics=None, loss_weights=None, sample_weight_mode=None, weighted_metrics=None, **kwargs). In multi_partner_learning.py, build_model_from_weights() then calls generate_new_model_for_dataset() and recompiles the model with hard-coded settings: new_model.compile(loss=keras.losses.categorical_crossentropy, optimizer="adam", metrics=["accuracy"]).

This makes it impossible to create different types of models. While trying to add the imdb dataset, I found that its models often need to be compiled with a binary_crossentropy loss function.
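To make the conflict concrete, here is a minimal, hypothetical sketch (not the repo's actual code) of what happens when a dataset module compiles a binary-classification model and build_model_from_weights() then recompiles it with the hard-coded loss:

```python
from tensorflow import keras

# Hypothetical dataset module for a binary task: compiles with binary_crossentropy
def generate_new_model_for_dataset():
    model = keras.Sequential([
        keras.layers.Dense(16, activation="relu", input_shape=(100,)),
        keras.layers.Dense(1, activation="sigmoid"),  # single sigmoid output for a binary task
    ])
    model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# What build_model_from_weights() effectively does: recompile with a hard-coded loss,
# silently overriding the dataset-specific choice (and mismatching the sigmoid output)
new_model = generate_new_model_for_dataset()
new_model.compile(loss=keras.losses.categorical_crossentropy, optimizer="adam", metrics=["accuracy"])
```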

GauthierGar commented 4 years ago

build_model_from_weights(): https://github.com/SubstraFoundation/distributed-learning-contributivity/blob/0884d52f8df2d7e85a1028e15acb3f3cc8bef8bb/multi_partner_learning.py#L439-L450

generate_new_model() is a method from the dataset.py module: https://github.com/SubstraFoundation/distributed-learning-contributivity/blob/0884d52f8df2d7e85a1028e15acb3f3cc8bef8bb/dataset.py#L50-L52

generate_new_model_for_dataset() comes from the user-provided dataset module, one of the datasets/dataset_*.py modules: https://github.com/SubstraFoundation/distributed-learning-contributivity/blob/0884d52f8df2d7e85a1028e15acb3f3cc8bef8bb/datasets/dataset_cifar10.py#L45-L78
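For readers following the links, the call chain can be summarised roughly as follows (a paraphrased sketch; the actual code in dataset.py and multi_partner_learning.py may differ in details):

```python
# multi_partner_learning.build_model_from_weights()
#   -> dataset.generate_new_model()            # thin wrapper defined in dataset.py
#     -> generate_new_model_for_dataset()      # user-supplied in datasets/dataset_*.py

class Dataset:
    def __init__(self, generate_new_model_for_dataset):
        self.generate_new_model_for_dataset = generate_new_model_for_dataset

    def generate_new_model(self):
        # Delegates model creation (and its compile call) to the dataset-specific function
        return self.generate_new_model_for_dataset()
```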

It seems that, even though the model is compiled a first time in generate_new_model_for_dataset() with an RMSprop optimizer (opt = keras.optimizers.RMSprop(learning_rate=0.0001, decay=1e-6)), build_model_from_weights() then recompiles it with the adam optimizer.
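A quick way to see the override is to compile a toy model twice and inspect its optimizer (a standalone sketch, not the repo's code; the decay argument is left out here):

```python
from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(10, activation="softmax", input_shape=(32,))])

# First compile, as in generate_new_model_for_dataset() for CIFAR-10 (RMSprop)
opt = keras.optimizers.RMSprop(learning_rate=0.0001)
model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=["accuracy"])
print(type(model.optimizer).__name__)  # RMSprop

# Second compile, as done by build_model_from_weights(): the optimizer is replaced
model.compile(loss=keras.losses.categorical_crossentropy, optimizer="adam", metrics=["accuracy"])
print(type(model.optimizer).__name__)  # Adam
```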

GauthierGar commented 4 years ago

I linked this issue to the project "Make it agnostic to datasets and models", as it seems to be a consequence of the repo having first been developed around the mnist dataset.

GauthierGar commented 4 years ago

I have created a Google Colab notebook to check that both binary cross-entropy and categorical cross-entropy can be used on the imdb dataset; we just need to pre-process the inputs and labels differently. https://colab.research.google.com/drive/1P0iMWJ0QwsFAHJYwRdVhx9O25xrUVQka?usp=sharing
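For reference, here is a shortened sketch of the two pairings the notebook compares (my own condensed version, not the notebook's exact code): scalar 0/1 labels with a single sigmoid output for binary cross-entropy, versus one-hot labels with a two-unit softmax for categorical cross-entropy.

```python
import numpy as np
from tensorflow import keras

(x_train, y_train), _ = keras.datasets.imdb.load_data(num_words=10000)

def vectorize(sequences, dim=10000):
    # Multi-hot encode the word-index sequences
    out = np.zeros((len(sequences), dim), dtype="float32")
    for i, seq in enumerate(sequences):
        out[i, seq] = 1.0
    return out

x = vectorize(x_train)

# Option 1: binary cross-entropy -> scalar 0/1 labels, single sigmoid output
y_bin = np.asarray(y_train, dtype="float32")
model_bin = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(10000,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model_bin.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])

# Option 2: categorical cross-entropy -> one-hot labels, two softmax outputs
y_cat = keras.utils.to_categorical(y_train, num_classes=2)
model_cat = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(10000,)),
    keras.layers.Dense(2, activation="softmax"),
])
model_cat.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
```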

RomainGoussault commented 4 years ago

The imdb dataset is a binary classification problem, right? If so, you can stick with binary cross-entropy.

But in any case, we will eventually need to be able to parametrize the loss function (and maybe the optimizer) for each dataset in its dataset module.
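One possible shape for that, purely as a hypothetical sketch (none of these attributes exist in the repo today): each datasets/dataset_*.py module could expose its loss (and optionally its optimizer), and build_model_from_weights() could look them up instead of hard-coding its own.

```python
# Hypothetical: in datasets/dataset_imdb.py
MODEL_LOSS = "binary_crossentropy"
MODEL_OPTIMIZER = "adam"

# Hypothetical: in multi_partner_learning.py
def build_model_from_weights(new_weights, dataset_module):
    new_model = dataset_module.generate_new_model_for_dataset()
    new_model.set_weights(new_weights)
    new_model.compile(
        # Fall back to the current hard-coded defaults if the module declares nothing
        loss=getattr(dataset_module, "MODEL_LOSS", "categorical_crossentropy"),
        optimizer=getattr(dataset_module, "MODEL_OPTIMIZER", "adam"),
        metrics=["accuracy"],
    )
    return new_model
```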

And good catch spotting the two different optimizers (Adam and RMSprop)! 👍

bowni commented 4 years ago

@arthurPignet for reference, for when you tackle the rest of the IMDB integration 😃

bowni commented 3 years ago

Done