Open jpolz opened 4 days ago
Hi, can't you take the max, looping over all len(cf.fields[i][1]), instead? In principle we designed the architecture to be flexible with the number of levels per field, but it's something that needs to be checked and validated again. Do you mind doing a quick check and, if it fails, reporting the stack trace?
Thanks!
The required setup would be train_multi with 2 or more fields with different numbers of levels, right? I also believe that I'd need to set it to max([ len(f[2]) for f in cf.fields])+1, based on my testing, although I don't understand why the +1 is necessary.
I have now used max([ len(f[2]) for f in cf.fields])+1, and running train_multi.py with this config works (run on wandb):
cf.fields = [
[ 'velocity_u', [ 1, 1024, ['velocity_v', 'temperature'], 0 ],
[ 96, 105, 114, 123, 137 ],
[12, 3, 6], [3, 18, 18], [0.5, 0.9, 0.2, 0.05] ],
[ 'velocity_v', [ 1, 1024, ['velocity_u', 'temperature'], 1 ],
[ 96, ],
[12, 3, 6], [3, 18, 18], [0.5, 0.9, 0.2, 0.05] ],
[ 'specific_humidity', [ 1, 1024, ['velocity_u', 'velocity_v', 'temperature'], 2 ],
[ 96, 105, 114, ],
[12, 3, 6], [3, 18, 18], [0.5, 0.9, 0.2, 0.05] ],
[ 'velocity_z', [ 1, 1024, ['velocity_u', 'velocity_v', 'temperature'], 3 ],
[ 96, 105, 114, 123 ],
[12, 3, 6], [3, 18, 18], [0.5, 0.9, 0.2, 0.05] ],
[ 'temperature', [ 1, 1024, ['velocity_u', 'velocity_v', 'specific_humidity'], 3 ],
[ 96, 105, ],
[12, 3, 6], [3, 18, 18], [0.5, 0.9, 0.2, 0.05], 'local' ],
]
Running train.py with this config also works (https://wandb.ai/atmorep/stratorep/runs/ugkgfp2f/overview):
cf.fields = [ [ 'temperature', [ 1, 1024, [ ], 0 ],
[ 23, 29, 41, 53, 60, 96, 105, 114, 123, 137
],
[12, 2, 4], [3, 27, 27], [0.5, 0.9, 0.2, 0.05], 'local' ] ]
cf.fields_prediction = [ [cf.fields[0][0], 1.] ]
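For readers following along, here is a minimal sketch of the computation discussed above, applied to the multi-field config from this thread. It assumes the level list sits at index 2 of each field entry, as in the configs shown. The `SimpleNamespace` stand-in for `cf` and the helper names `levels_per_field` and `max_levels` are hypothetical and not taken from trainer.py. The `+1` is copied from the comment above as reported; the thread does not explain why it is needed.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the config object; only the field name (index 0)
# and the level list (index 2) are used, trailing entries are omitted.
cf = SimpleNamespace()
cf.fields = [
    ["velocity_u", [1, 1024, ["velocity_v", "temperature"], 0], [96, 105, 114, 123, 137]],
    ["velocity_v", [1, 1024, ["velocity_u", "temperature"], 1], [96]],
    ["specific_humidity", [1, 1024, ["velocity_u", "velocity_v", "temperature"], 2], [96, 105, 114]],
    ["velocity_z", [1, 1024, ["velocity_u", "velocity_v", "temperature"], 3], [96, 105, 114, 123]],
    ["temperature", [1, 1024, ["velocity_u", "velocity_v", "specific_humidity"], 3], [96, 105]],
]


def levels_per_field(fields):
    """Map each field name to its number of vertical levels."""
    return {f[0]: len(f[2]) for f in fields}


def max_levels(fields, extra=1):
    """Largest level count over all fields, plus the empirical +1 from the thread."""
    return max(len(f[2]) for f in fields) + extra


counts = levels_per_field(cf.fields)
limit = max_levels(cf.fields)

print(counts)
print(limit)
```

Looking only at the first field, as in len(cf.fields[0][...]), happens to give the largest count for this config, but it would undercount as soon as a later field had more levels than the first one.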
Great! Can you open a PR for the fix? Thanks!
Super! Thanks a lot! Can you check the evaluation step as well (e.g. doing global forecasting for just a single date), so that we can also include any fixes needed there in the MR?
Yes, I can do that too. In that case the PR should stay open. Thanks for the support.
Describe the bug A clear and concise description of what the bug is.
To Reproduce Steps to reproduce the behavior:
/p/scratch/hclimrep/polz1/data/era5_1deg/months/era5_y2014_2016_res025_chunk8.zarr/
Expected behavior Training proceeds as usual.
Additional context I created a new dataset containing model levels 23, 29, 41, 53, 60, 96, 105, 114, 123, 137 and used all of them to train a singleformer. A maximum number of levels is hardcoded in trainer.py and can be increased to solve this issue. It should be discussed whether a flexible version, e.g. using len(cf.fields[0][1]), is what we want. If at some point in the future one wanted different numbers of levels for different fields, this would potentially raise another error, since it only looks at the first field. In principle, taking the maximum number of levels over all fields would work.
From trainer.py: