EcoExtreML / Emulator

parallel computing experiment #3

Open QianqianHan96 opened 1 year ago

QianqianHan96 commented 1 year ago

I did the three runs below with different numbers of timesteps. The runs with 100 timesteps and with 1 month succeeded, but the 1-year run still fails with an error.

1) 1 spatial unit (5 degrees × 5 degrees), 7 variables, 100 timesteps, on 1 core. The parameter for parallel execution is Number of years = 3.

image

2) 1 compute block (1 spatial unit, 7 variables, 1 month) on 1 core. The parameter for parallel execution is Number of years = 3.

image

3) 1 compute block (1 spatial unit, 7 variables, 1 year) on 1 core. The parameter for parallel execution is Number of years = 3.

However, I did not manage to get a result for this run; it throws an error. Do you know the possible reason? The log file is at /projects/0/einf2480/global_data_Qianqian/slurm-2933096.out

image image

QianqianHan96 commented 1 year ago

I tried to run November and December, and both failed with this recursion error. The only difference between January and November/December is that the January result has values, while the November and December results are all NaN, because two input variables are all NaN except in January.

QianqianHan96 commented 1 year ago

I also tried February, which failed as well: two of the February inputs are all NaN, so the predicted results are all NaN too. After I replaced the two inputs with non-NaN values, the February run succeeded. The error happens when I call result_LE.values after the prediction loop finishes. Later I tried to track this error down in 2read10kminput-halfhourly-0608py.ipynb; see the README.md in 1 computationBlockTest.

image
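Since the failing months are the ones where some inputs are entirely NaN, a check before the prediction loop could flag them early. This is only a minimal sketch in plain Python, not the notebook's actual code: the helper `all_nan_variables`, the variable `Ta`, and all numbers are hypothetical; only the names Rin and Rli come from this thread.

```python
import math


def all_nan_variables(inputs):
    """Return the names of variables whose values are all NaN.

    `inputs` maps a variable name to a flat sequence of floats.
    Empty sequences are not reported, since there is nothing to check.
    """
    return [
        name
        for name, values in inputs.items()
        if len(values) > 0 and all(math.isnan(v) for v in values)
    ]


# Hypothetical toy data for one month; the values are made up.
nan = float("nan")
february = {
    "Rin": [nan, nan, nan],
    "Rli": [nan, nan, nan],
    "Ta": [271.3, nan, 272.8],  # partly NaN, so not reported
}

missing = all_nan_variables(february)
if missing:
    print("All-NaN inputs, prediction would be all NaN:", missing)
```

With xarray inputs the same idea would presumably be expressed per variable with `isnull().all()`, but that depends on how the inputs are loaded in the notebook.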

QianqianHan96 commented 1 year ago

So the 32 hours for 1 computation block was measured when only January had data for Rin and Rli; in all other months Rin and Rli are all NaN. Line 151 also only ran over range(745), which covers January, whereas 1 year should be 17520 timesteps. After I changed it to 17520, the run takes longer (about 20 seconds per predicted timestep now, versus 4 seconds with range(745)). So now I am trying to run 6 months to make sure the job does not exceed the time limit (Job 2957047).

image
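To avoid hard-coding the loop length, the number of timesteps could be derived from the period itself. The sketch below assumes half-hourly data (as the notebook name suggests) and a non-leap year; the year 2015 and the helper `timesteps_in_period` are hypothetical. The wall-time figure simply takes the 20 seconds per timestep quoted above at face value.

```python
from datetime import datetime, timedelta


def timesteps_in_period(start, end, step=timedelta(minutes=30)):
    """Count steps of length `step` in the half-open interval [start, end)."""
    return (end - start) // step


year = 2015  # hypothetical non-leap year, chosen only for illustration

n_year = timesteps_in_period(datetime(year, 1, 1), datetime(year + 1, 1, 1))
n_half = timesteps_in_period(datetime(year, 1, 1), datetime(year, 7, 1))

seconds_per_step = 20  # figure quoted in this thread for the full-length loop
est_hours = n_half * seconds_per_step / 3600

print(n_year)  # 17520 half-hourly timesteps in a 365-day year
print(n_half)  # 8688 half-hourly timesteps from January to June
print(f"rough estimate for 6 months: {est_hours:.1f} hours")
```

The loop could then use `range(n_year)` or `range(n_half)` instead of a literal, and the estimate gives a rough figure to compare against the job's time limit.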