About the fMoW dataset and difference with SatMAE

bourcierj commented 9 months ago

Hello,

I was trying to reproduce your results on fMoW, and noticed some differences with prior work and SatMAE in particular. I haven't found details about these differences in the paper or the repo.

You are using the "raw" images from the fMoW-RGB product, i.e., you do not preprocess the images with the official baseline code, as SatMAE does.
Moreover, your split files are very different: the train and val splits contain far fewer images. Your train split has 84.9k images, while the SatMAE train split has 363.4k.
Also, you are mixing the "RGB" and "MSRGB" images in your splits, which is not done in SatMAE.

I am hoping you could clarify why you made these different choices and how they affect the comparison with SatMAE's results.

Thank you.

RitwikGupta commented 8 months ago

Hey @bourcierj!

Apologies for the delayed response; I have been traveling internationally. Good questions, and here are the answers.

To start with, please ignore the fMoW split files in the splits directory. We do not use them, and I will be removing them from the repo.
We use the entire fMoW train and test splits. In our code, we pass in the full folder path and therefore use the ImageFolder implementation, not ImageList. The spurious split files are what caused the confusion. https://github.com/bair-climate-initiative/scale-mae/blob/main/mae/config/fmow.yaml#L5
We are not mixing the RGB and MSRGB images: we use all of the RGB images, and only those. Again, the confusion comes from the spurious split files in the repo. https://github.com/bair-climate-initiative/scale-mae/blob/main/mae/dataloaders/fmow.py#L13
The only preprocessing done by the fMoW baseline is chipping the images and converting the metadata to feature vectors. Our code is equivalent to theirs in that we do not change the pixel values of the loaded images.
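A minimal sketch of the ImageFolder-vs-ImageList difference and the RGB-only filter described above, in Python using only the standard library. It assumes the fMoW-rgb layout `<split>/<category>/<category>_<id>/<category>_<id>_<n>_rgb.jpg` with `_msrgb.jpg` files alongside. The function names and the one-path-per-line split-file format are hypothetical, and this is not the scale-mae loader itself (the actual filtering lives in `mae/dataloaders/fmow.py` and may differ).

```python
from pathlib import Path


def list_rgb_images(root):
    """ImageFolder-style enumeration (hypothetical helper): walk every category
    subdirectory under root and keep every RGB JPEG, skipping MSRGB ones."""
    root = Path(root)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {cls: i for i, cls in enumerate(classes)}
    samples = []
    for cls in classes:
        # rglob descends into the per-instance folders (<category>_<id>/).
        for path in sorted((root / cls).rglob("*.jpg")):
            # "x_msrgb.jpg" does not end with "_rgb.jpg", so this suffix test
            # keeps the RGB images and drops the MSRGB ones.
            if path.name.endswith("_rgb.jpg"):
                samples.append((path, class_to_idx[cls]))
    return samples


def list_from_split_file(root, split_file):
    """ImageList-style (hypothetical format): only the relative paths listed
    in the split file, one per line, are used; everything else is ignored."""
    root = Path(root)
    lines = Path(split_file).read_text().splitlines()
    return [root / line.strip() for line in lines if line.strip()]
```

With the folder-based version, every RGB image under the split directory is used, so the split size is set by the dataset itself rather than by a list file. torchvision's `ImageFolder` accepts an `is_valid_file` callable, so the same suffix check could be passed there instead of walking the tree by hand.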

Our comparison to SatMAE uses an identical setup.

Thanks! Ritwik

RitwikGupta commented 8 months ago

https://github.com/bair-climate-initiative/scale-mae/commit/89280d830037ff27c20459cdab03e01e633e29bb

bourcierj commented 8 months ago

Thanks for the reply, @RitwikGupta! This explains why my run yielded results that were not comparable: using the spurious split files for train and val, I got much worse linear probe performance than both yours and SatMAE's. I will retry with the correct splits.

bair-climate-initiative / scale-mae

About the fMoW dataset and difference with SatMAE #8