gaozhihan / PreDiff

[NeurIPS 2023] Official implementation of "PreDiff: Precipitation Nowcasting with Latent Diffusion Models"
Apache License 2.0

Model performance is greatly affected by a long-tailed dataset #31

Open earthpimp opened 6 months ago

earthpimp commented 6 months ago

Sorry to bother you. I trained PreDiff on my own dataset and found the results quite poor. I suspect the cause is data imbalance, which is commonly observed in precipitation nowcasting. I am currently considering resampling, but I am worried that it might hurt generalizability. I noticed that in your previous paper on TrajGRU, pixelwise loss weighting is applied to the radar sequence. How could I implement a similar approach in PreDiff? I would appreciate any suggestions you could offer.
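For context, a minimal sketch of the kind of pixelwise weighted loss referred to here (in the spirit of the TrajGRU / HKO-7 benchmark, but with illustrative thresholds and weights rather than the exact benchmark values) might look like this in PyTorch:

```python
import torch

def rainfall_pixel_weights(target, thresholds=(2.0, 5.0, 10.0, 30.0),
                           weights=(2.0, 5.0, 10.0, 30.0)):
    """Assign each pixel a weight based on its rainfall intensity.

    Pixels below the first threshold keep weight 1; heavier-rain pixels get
    progressively larger weights. The thresholds/weights here are illustrative
    placeholders, not the exact HKO-7 benchmark values.
    """
    w = torch.ones_like(target)
    for thr, wt in zip(thresholds, weights):
        w = torch.where(target >= thr, torch.full_like(target, wt), w)
    return w

def weighted_mse(pred, target):
    """Pixelwise weighted MSE: heavy-rain pixels contribute more to the loss."""
    w = rainfall_pixel_weights(target)
    return (w * (pred - target) ** 2).mean()
```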

gaozhihan commented 6 months ago

Thank you for your interest in our work and for your question. Yes, we encountered a similar challenge when training PreDiff on highly imbalanced data such as HKO-7. It is not straightforward to directly plug in a loss function like the one in TrajGRU that explicitly rebalances training towards the long tail of the distribution (PreDiff's diffusion loss is computed on latent noise rather than on radar pixels). However, we found that adjusting the data sampling itself helps alleviate the problem: we sample rare data more frequently and common data less frequently.
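In case it helps, here is a minimal sketch of this kind of sampling adjustment, assuming a PyTorch dataset where each training sequence has a precomputed rainfall score (e.g. its mean or max intensity). The `make_balanced_loader` helper, its bin edges, and the inverse-frequency weighting are assumptions for illustration, not the exact scheme used in the repo:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def make_balanced_loader(dataset, rain_scores, batch_size=8,
                         bin_edges=(1.0, 5.0, 20.0)):
    """Build a DataLoader that oversamples rare heavy-rain sequences.

    `rain_scores` holds one number per training sequence (e.g. its mean or
    max rainfall intensity), precomputed over `dataset`. Bin edges and the
    inverse-frequency weighting are illustrative; tune them for your data.
    """
    rain_scores = np.asarray(rain_scores)
    bins = np.digitize(rain_scores, bins=list(bin_edges))     # intensity bin per sequence
    bin_counts = np.bincount(bins, minlength=len(bin_edges) + 1)
    bin_counts = np.maximum(bin_counts, 1)                    # guard against empty bins
    sample_weights = 1.0 / bin_counts[bins]                   # rare bins get larger weights

    sampler = WeightedRandomSampler(
        weights=torch.as_tensor(sample_weights, dtype=torch.double),
        num_samples=len(dataset),
        replacement=True,
    )
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```

Used in place of a plain shuffled DataLoader, this keeps the epoch length the same while drawing heavy-rain sequences more often, without touching the loss function itself.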