dgcnz / dl2

Code for "Effect of equivariance on training dynamics"

Initial reproduction of table 1 from [Wang 2022] #35

Closed dgcnz closed 1 month ago

dgcnz commented 1 month ago

Description

In Figure 4, [Wang 2022] tests the effect of varying the data's equivariance error on accuracy for multiple models (equivariant, non-equivariant, and relaxed equivariant) on the Smoke Plume dataset. This code is available here. We can use it as a starting point to test the performance of the equivariant and relaxed equivariant models on fully equivariant data (for the Smoke Plume dataset).


Tasks

Expected outcomes

MeneerTS commented 1 month ago

I made a notebook called figure_4_wang_2022_repro.ipynb that does this. Currently running it in colab.

MeneerTS commented 1 month ago

Results

| Model | MSE |
| --- | --- |
| ConvNet | 0.11068 |
| E2CNN | 0.11894 |
| RSteer | 0.27274 |
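For reference, the metric in the table is the standard mean squared error (the notebook presumably uses a framework implementation; this is just a minimal pure-Python sketch of the definition):

```python
def mse(pred, target):
    """Mean squared error over two flattened sequences of values."""
    assert len(pred) == len(target), "prediction and target must have the same length"
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```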

These are all higher than in the figure. The checkpoints can be found under notebooks/figure_4_checkpoints. They include checkpoints made at 25%, 50%, 75%, and 100% of the total number of epochs (which varies per model, since early stopping is used).

Early stopping epochs:

| Model | Epochs |
| --- | --- |
| Steerable | 114 |
| E2CNN | 66 |
| CNN | 53 |
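The 25/50/75/100% checkpoint epochs can be derived from each model's early-stopping epoch. A minimal sketch (the exact rounding used in the notebook is an assumption; here fractions are simply truncated):

```python
def checkpoint_epochs(total_epochs):
    """Epochs at which the 25%, 50%, 75%, and 100% checkpoints are saved,
    truncating fractional epochs toward zero."""
    return [max(1, int(total_epochs * f)) for f in (0.25, 0.5, 0.75, 1.0)]
```

For the steerable model this would place checkpoints at epochs 28, 57, 85, and 114.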

MeneerTS commented 1 month ago

Parameter counts

| Model | Parameters |
| --- | --- |
| ConvNet | 113,666 |
| E2CNN | 297,216 |
| RSteer | 671,234 |

The counts are within the same order of magnitude, but still differ substantially.
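The gap is easier to interpret with a per-layer breakdown. A back-of-envelope helper for a standard Conv2d layer (the actual layer shapes of these models are not listed here, so the example layer is an assumption):

```python
def conv2d_params(in_ch, out_ch, k, bias=True):
    """Parameter count of a standard 2D convolution:
    a weight tensor of shape (out_ch, in_ch, k, k), plus one bias per output channel."""
    n = out_ch * in_ch * k * k
    if bias:
        n += out_ch
    return n
```

Weight sharing in steerable/equivariant layers changes this arithmetic, which is one reason the three models end up with different counts at comparable widths.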

MeneerTS commented 1 month ago

Dataset Size

The dataset for a given equivariance level consists of 4 different simulations of 40 timesteps each. Each simulation is then split into 30 training and 10 validation timesteps. These are not enough datapoints to be representative.
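Assuming the 30/10 split is a simple temporal split within each simulation (an assumption; the notebook may split differently), a minimal sketch:

```python
def split_timesteps(n_steps=40, train_frac=0.75):
    """Split a simulation's timestep indices into a leading training segment
    and a trailing validation segment."""
    n_train = int(n_steps * train_frac)
    train_idx = list(range(n_train))
    val_idx = list(range(n_train, n_steps))
    return train_idx, val_idx
```

With 4 simulations this yields 120 training and 40 validation frames in total, which supports the point that the sample is small.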