daehyeon-han / qpn-simvp

Precipitation nowcasting using ground radar data and simpler yet better video prediction deep learning

Would it be useful to pretrain on a larger amount of data and fine-tune on the small private dataset? #1

Open Crestina2001 opened 1 month ago

Crestina2001 commented 1 month ago

Thanks for your great work!

May I ask about the pretraining-finetuning paradigm? Would it be useful to pretrain on similar datasets, such as SEVIR or ERA5, with different spatial and temporal resolutions, and then fine-tune on the small dataset? In that case, which model would be the best bet? I guess ViT-based models have better transfer-learning ability, but they require larger-scale pretraining (on web-scale data), and I don't have the compute for mass pretraining.

daehyeon-han commented 1 month ago

Hi, thank you for the comment!

Technically, the SimVP structure can also be used for transfer learning with pretext learning and fine-tuning. Note that SimVP can accept different data sizes under one condition: the spatial size must be a multiple of 2^n, where n is the number of upconv/deconv layers. The original SimVP structure also accepts only a fixed number of temporal frames for input/output, which cannot be changed once the model is defined. In short, as long as the data dimensions match, you can fine-tune on data with a different spatial/temporal resolution.
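
As an illustration (not code from this repo), here is a minimal sketch that pads a sequence so its grid satisfies the 2^n constraint; `n_down` stands in for your model's actual number of stride-2 stages:

```python
import torch
import torch.nn.functional as F

def pad_to_multiple(x, n_down):
    """Zero-pad H and W up to the next multiple of 2**n_down.

    x: tensor of shape (B, T, C, H, W); n_down is the number of
    stride-2 down/upsampling stages in your SimVP variant.
    """
    m = 2 ** n_down
    h, w = x.shape[-2], x.shape[-1]
    pad_h = (m - h % m) % m
    pad_w = (m - w % m) % m
    # F.pad takes (left, right, top, bottom) for the last two dims
    return F.pad(x, (0, pad_w, 0, pad_h))

x = torch.randn(4, 10, 1, 100, 100)        # 100 is not divisible by 2**4
print(pad_to_multiple(x, n_down=4).shape)  # torch.Size([4, 10, 1, 112, 112])
```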

As I have not tested its transferability, I cannot guarantee that it will work well. However, given that SimVP generally performs better than U-Net in spatiotemporal prediction, I expect it to perform similarly to or better than U-Net in transfer learning for forecasting. Therefore, please first check the performance of U-Net, which is commonly used as a baseline. You can then consider adopting SimVP for the pretext and fine-tuning stages. I have not used ViT yet, so I cannot comment on its fine-tuning performance.
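
If you go that route, the recipe could look roughly like the sketch below. The model class, checkpoint path, and the `enc` attribute are placeholders rather than the actual API of this repo, so adapt them to your implementation:

```python
import torch
from simvp_model import SimVP_Model  # placeholder import; use your SimVP code

# 1) Build the model with the fine-tuning data shape (T, C, H, W)
model = SimVP_Model(in_shape=(10, 1, 128, 128))

# 2) Load weights from the pretext (pretraining) stage
state = torch.load("simvp_pretext.pth", map_location="cpu")
model.load_state_dict(state, strict=False)  # tolerate head mismatches

# 3) Optionally freeze the spatial encoder and fine-tune the rest
for p in model.enc.parameters():  # 'enc' is a placeholder attribute name
    p.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```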

I am also still learning deep learning, and I am happy to exchange ideas on GeoAI. Thanks.

Crestina2001 commented 1 month ago

Thanks for your reply.

Size is not a big issue, because we could crop an area of a fixed size from the center.
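
For reference, this is the kind of center crop I mean (the shapes here are just an example):

```python
import numpy as np

def center_crop(x, size):
    """Crop a size x size window from the center of the last two dims."""
    h, w = x.shape[-2], x.shape[-1]
    top, left = (h - size) // 2, (w - size) // 2
    return x[..., top:top + size, left:left + size]

frames = np.random.rand(10, 1, 512, 512)  # stand-in for a radar sequence
patch = center_crop(frames, 256)          # shape: (10, 1, 256, 256)
```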

But SimVP does not work on my custom dataset. It works on Moving MNIST, but fails on the reflectivity dataset.

Here is the config (for Moving MNIST; the height and width would be different for the reflectivity dataset):

{ "alpha": 0.1, "hid_S": 64, "hid_T": 512, "N_S": 4, "N_T": 8, "mlp_ratio": 8.0, "drop": 0.0, "drop_path": 0.0, "spatio_kernel_enc": 3, "spatio_kernel_dec": 3, "width": 64, "height": 64, "channel": 1, "num_objects": [ 2 ], "scaling_factor": 1.0, "n_frames_input": 10, "n_frames_output": 10, "lr": 0.001, "warmup_lr": 1e-05, "warmup_epoch": 0, "epochs": 200, "patience": 5, "micro_batch_size": 16, "macro_batch_size": 16, "weight_decay": 0, "min_lr": 1e-05, "dataset_name": "mmnist", "algorithm_name": "simvp" }

I have also tried tuning the lr (1e-4 and 1e-5), but it does not help. The model only learns to copy the input to the output, like this:

Input: Frame 1, Frame 2, ..., Frame 10
Output: Frame 1, Frame 2, ..., Frame 10

It looks like the algorithm gets stuck in a bad local optimum and cannot escape it.
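
A quick way I check for this collapse is to compare the prediction error against the inputs versus the targets (a rough sketch, assuming the input and output sequences have the same shape, as in my config):

```python
import torch

@torch.no_grad()
def check_identity_collapse(model, loader, device="cuda"):
    """If MSE(pred, input) is near zero while MSE(pred, target) stays
    high, the model has collapsed to copying its input frames."""
    mse_in = mse_tgt = n = 0.0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        preds = model(inputs)
        mse_in += torch.mean((preds - inputs) ** 2).item()
        mse_tgt += torch.mean((preds - targets) ** 2).item()
        n += 1
    print(f"MSE(pred, input):  {mse_in / n:.5f}")
    print(f"MSE(pred, target): {mse_tgt / n:.5f}")
```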

I have also tried TAU, and TAU can overcome this issue (it does learn some dynamics, even though the results are still far from satisfactory).

Could you give any suggestions for solving this issue?

Many thanks!