Open liluo2 opened 2 months ago
Thanks for your interest!! For context, PDEArena used a (very) old version of the shallow-water model (v0.2.1) that was still quite experimental, and I do not recommend using that data as it does not represent physically sensible initial conditions see #377 - we were not consulted in the creation of this data set. See examples in our documentation for better defined problems. SpeedyWeather's shallow water model can be used to simulate a variety of problems. Quasi-linear gravity waves, planetary waves like Kelvin/Rossby, but also 2D turbulence meaning chaos, mixing and a strong dependence on intitial conditions, or forced problems converging to some climatology superimposed by internal variability. Depending on what you want to simulate I'd carefully choose a model setup. Let me know what you want to do and I'm happy to figure out with you a model setup that simulates what you want to use for training.
div
is the divergence of the velocity field, i.e. $\nabla \cdot \mathbf{u}$, see here https://speedyweather.github.io/SpeedyWeather.jl/dev/shallowwater/#Shallow-water-equations, vorticity and divergence are the prognostic variables in SpeedyWeather, not velocities $u, v$ which are derived from vorticity divergence, see also https://speedyweather.github.io/SpeedyWeather.jl/dev/barotropic/#Barotropic-vorticity-equation
You'd need to dig deep in this repository to find out what we used in v0.2.1, as I said that was a much more experimental version. Nowadays we just use NaN for the _FillValue
, i.e. for Float32 output (default) it's 0x7fc00000
. I don't know where the number that you mention comes from. In general there are no masked values in SpeedyWeather output as our fields are defined everywhere on the globe, not like ocean models need to deal with continents. So generally you should be able to just ignore the fill/missing values and treat the data as is.
The issue you provided is a good starting point, but also to point you to the mass conservation in the documentation: https://speedyweather.github.io/SpeedyWeather.jl/dev/analysis/#Mass-conservation and #323, #400. It depends a little bit on what you mean by "verification", in #372 you see how one could recalculate both sides of the continuity equation from simluation output. In general you get a higher accuracy of that verification if you calculate things as close as possible to how it's done internally, which is described here https://speedyweather.github.io/SpeedyWeather.jl/dev/shallowwater/#Algorithm. In short, we calculate $\mathbf{u}h$ in grid-point space, transform to spectral space, take the divergence, use this as tendency for the prognostic variable $\eta$ which is solved using a leapfrog time integration with RAW filter (see https://speedyweather.github.io/SpeedyWeather.jl/dev/barotropic/#leapfrog)
Wow! Thanks to your timely reply! Acturally I don't know very much about your model and the way you generate ShallowWater dataset. I really want to take a deep dive into your brillian work, but there's a ddl here and I might do it later. Nevertheless, I'll try my best to understand this lol.
div
are almost zero everywhere and everytime (with a very small value on time behind). Of curiousity, I want to know if there's a coincidence or a reason behind it? >>> nf.variables['u']
<class 'netCDF4._netCDF4.Variable'>
float32 u(time, lev, lat, lon)
units: m/s
missing_value: -999999.0
long_name: zonal wind
unlimited dimensions: time
current shape = (88, 1, 96, 192)
filling on, default _FillValue of 9.969209968386869e+36 used
dict_keys(['time', 'u', 'v', 'vor', 'div', 'pres', 'lon', 'lev', 'lat'])
, which doesn't have H
and orography
here. I observe lev
as vertical model levels
but the unit is 1
. So I don't know how can I get $\eta$ here? (Sadly, I don't find the v0.2.1 documentation since the documentation is at least v0.5.0). Again, very much thanks to your work and your powerful support! I really appreciate it!
I haven't noticed that PDEArena used a very old version of ShallowWater. Generating a new dataset matters for both theoretical and practical meaning. Are there any shallow water dataset that available for download? Or I need to generate it by myself?
You would need to generate it yourself, this repository only provides the model, we don't host datasets.
Got it. I also noticed here https://speedyweather.github.io/SpeedyWeather.jl/dev/shallowwater/#Shallow-water-equations. It is just that, as I checked its value, I found that
div
are almost zero everywhere and everytime (with a very small value on time behind). Of curiousity, I want to know if there's a coincidence or a reason behind it?
I don't know what you mean by "almost zero". Yes, both atmospheric and oceanic motion is often very close to geostrophy which makes it "almost" divergence free, meaning that the flow is parallel to isobars. There's a section in the documentation talking about analysing geostrophy in a shallow water simulation:
https://speedyweather.github.io/SpeedyWeather.jl/dev/speedytransforms/#Example:-Geostrophy
It seems that as I switch to another model, the training is much more stable. It's strange and maybe I need to take a look at the weights at the middle because the same issue happened before. The number comes from here:
>>> nf.variables['u'] <class 'netCDF4._netCDF4.Variable'> float32 u(time, lev, lat, lon) units: m/s missing_value: -999999.0 long_name: zonal wind unlimited dimensions: time current shape = (88, 1, 96, 192) filling on, default _FillValue of 9.969209968386869e+36 used
It seems that in that dataset the missing value is set to -999999. I don't know why the python netCDF4 library wants to use 9.9...e+36 instead as indicated in the last line. In any case there's no missing data, you can just throw away the mask.
The dataset I'm using now has the following keys
dict_keys(['time', 'u', 'v', 'vor', 'div', 'pres', 'lon', 'lev', 'lat'])
, which doesn't haveH
andorography
here. I observelev
asvertical model levels
but the unit is1
. So I don't know how can I get η here? (Sadly, I don't find the v0.2.1 documentation since the documentation is at least v0.5.0).
pres
is $\eta$, we use pres
as the universal name, because our primitive equation models also have a pressure that's an actual pressure and not in units of meters as is $\eta$ but conceptually they are the same. I believe H=8500m but you would need to check what the PDEArena people did, that's a parameter you can choose. We calculate $h$ as outlined e.g. here https://speedyweather.github.io/SpeedyWeather.jl/dev/analysis/#Energy
This might be a numerical calculation problem.
Yes it is.
What's the difference between spectral method and finite difference method on space? (Here I just calculate the spatial derivatives by Mi+1−Mi−1/2Δx). My picture for ∇⋅(uh) using finite difference method is very similar with the picture you provided. I don't know if you have some ideas for the comparison of these two errors. Still I also need to dig more into the issues you mentioned. If I got some new ideas I'll comment here.
Centred finite differences as you outline are 2nd order accurate. But it gets trickier here because of the spherical geometry, so you have to take into account the curvature of the grid. SpeedyWeather is a spectral model meanig it does finite differencing only in time and in the vertical (not relevant in the 2D shallow water equations), other gradient operations in space are not done in grid-point space but after a transform to spectral space, see https://speedyweather.github.io/SpeedyWeather.jl/dev/spectral_transform/#Derivatives-in-spherical-coordinates
This way they are effectively much more accurate (the transform means you take all grid points into account not just the local neighbours like in finite difference, volume or element methods) and the dominating error is the transform error not the discretization error. This means that ideally you should verify the divergence as we do it in SpeedyWeather too, if you use finite differences you will inevitably have a larger discretization error. The #372 issue is a good starting point.
@liluo2 Also feel free to ask any dataset related questions on PDEArena itself. Apologies @milankl, I really should add an upgraded datagen script there that works with the latest improvements in SpeedyWeather. Just too many other things have taken up the time lately.
Awesome, yes! There's more and physically motivated examples we can produce data for to be part of PDEArena, maybe we should just have a chat one day on what data you'd like to have, what would be good for an ML challenge! Let me know
@rejuvyesh It can't be better than that! Thank you Jayesh and Milan! I'm still working on it. If there are more problems related to ShallowWater itself I'll comment here ;-)
Hi! It's a really brilliant work!
I'm working on embedding physical constraints into deep learning. I found your work widely used in deep learning community ((https://arxiv.org/abs/2302.06594), etc.), so I'm trying to playing with ShallowWater dataset this paper uses as well (https://huggingface.co/datasets/pdearena/ShallowWater-2D) . But I met some troubles :-( So I come here to seek for some help. For the dataset on PDEArena, I'd like to know:
dict_keys(['time', 'u', 'v', 'vor', 'div', 'pres', 'lon', 'lev', 'lat'])
.div
means divergence, but I don't know which field is this divergence applied to? What's more, is there a document for illustrating these output?_FillValue
is used, which is9.969209968386869e+36
, a numerous number. I found that our training always fails with loss suddenly become a tremendous number like3e+22
. I try to search the max number on this dataset but it could cost a very long time... I don't know if this is the reason that acturally have a negative influence on my training.I don't know if you got some ideas. Hope we can have a further conversation. Looking forward to your reply!