cc-ai / climategan

Code and pre-trained model for the algorithm generating visualisations of three climate change-related events: floods, wildfires and smog.
https://thisclimatedoesnotexist.com
GNU General Public License v3.0

Evaluate simclr performance #95

Closed · alexrey88 closed this 4 years ago

alexrey88 commented 4 years ago

So here's the issue: we need a good way to figure out whether pretraining the encoder with SimCLR is better than using the pretrained Deeplabv2. Several options come to mind:

  1. I would begin by just comparing the masker's results with the current BaseEncoder pretrained with SimCLR and with the Deeplabv2 encoder pretrained on Cityscapes. If the results are better with SimCLR, then it's clear we should use it, and we would get a lighter encoder. If the results are worse, then we could:

  2. Give both encoders the same number of parameters (BaseEncoder pretrained with SimCLR and Deeplabv2 pretrained on Cityscapes), which would imply adding layers to BaseEncoder, then train the masker with both and compare the results (a quick parameter-count check is sketched after this list). If the results are similar or better with SimCLR, then we should use it with the BaseEncoder. If the results are worse, then we could:

  3. Pretrain Deeplabv2 using SimCLR and our data (do we pretrain it from scratch or from the one already pretrained on Cityscapes?), and then compare the masker's results with those of the Deeplabv2 pretrained on Cityscapes only. If the results are better with SimCLR, then we keep Deeplabv2 but add the SimCLR pretraining.
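
To make option 2 concrete, here is a minimal sketch for comparing encoder sizes, assuming both encoders are plain `torch.nn.Module`s (the constructor names in the usage comment are placeholders, not the actual repo API):

```python
import torch.nn as nn

def count_params(model: nn.Module, trainable_only: bool = False) -> int:
    """Count the (optionally only trainable) parameters of a module."""
    params = model.parameters()
    if trainable_only:
        params = (p for p in params if p.requires_grad)
    return sum(p.numel() for p in params)

# Hypothetical usage -- replace with however we actually build the encoders:
# base_encoder = BaseEncoder(opts)         # the one pretrained with SimCLR
# deeplab_encoder = DeeplabV2Encoder(opts) # the one pretrained on Cityscapes
# print(count_params(base_encoder), count_params(deeplab_encoder))
```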

And to compare the masker's results, what metrics should we use? And do we train the masker while freezing the encoder or fine-tuning it?
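On the freezing vs. fine-tuning question, here is a minimal sketch of both options, assuming the masker exposes its encoder and decoder as separate modules (function and argument names are illustrative, not the repo's actual API):

```python
import torch

def build_masker_optimizer(encoder, decoder, freeze_encoder=True,
                           lr=1e-4, encoder_lr=1e-5):
    """Either freeze the pretrained encoder entirely, or fine-tune it with a
    smaller learning rate than the (randomly initialized) masker decoder."""
    if freeze_encoder:
        for p in encoder.parameters():
            p.requires_grad = False  # encoder stays fixed, only the decoder trains
        return torch.optim.Adam(decoder.parameters(), lr=lr)
    # Fine-tuning: two parameter groups with different learning rates
    return torch.optim.Adam(
        [
            {"params": encoder.parameters(), "lr": encoder_lr},
            {"params": decoder.parameters(), "lr": lr},
        ]
    )
```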

So, what are your thoughts? :)

melisandeteng commented 4 years ago

Big question here... For evaluating the masker's results, a first step would be to compute IoU and pixel accuracy on our hand-labeled images, I guess. Maybe we can also use some boundary evaluation metrics in a next step.
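
A minimal sketch of those two metrics for binary masks, assuming predictions and hand-labeled ground truth are {0, 1} tensors of the same shape (the `mask_metrics` helper is hypothetical, not existing repo code; thresholding of soft masks would happen beforehand):

```python
import torch

def mask_metrics(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-8):
    """Binary IoU and pixel accuracy for a predicted mask vs. a hand-labeled one."""
    pred, target = pred.bool(), target.bool()
    intersection = (pred & target).float().sum()
    union = (pred | target).float().sum()
    iou = (intersection / (union + eps)).item()
    pixel_acc = (pred == target).float().mean().item()
    return iou, pixel_acc
```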

theo2021 commented 3 years ago

Did you do the testing? Was SimCLR better in the end?

vict0rsch commented 3 years ago

Hello @theo2021, no it was not. But that's not so surprising given our very specific setup, and it's difficult to evaluate anyway, so I would say this conclusion is not transferable to anything else.