clessig / atmorep

AtmoRep model code
MIT License

performance degradation when running with local normalizations #31

Open iluise opened 3 weeks ago

iluise commented 3 weeks ago

Throughput (samples per second) for temperature (local normalisation, embedding dimension 512) is unstable, swinging between roughly 13 and 235 s/sec across consecutive batches:

0: epoch: 19 [219/512 (43%)]    Loss: 0.26280 : 0.03152 :: 0.13843 (224.48 s/sec)
0: epoch: 19 [220/512 (43%)]    Loss: 0.28077 : 0.03847 :: 0.14124 (231.14 s/sec)
0: epoch: 19 [221/512 (43%)]    Loss: 0.24217 : 0.02282 :: 0.12537 (61.81 s/sec)
0: epoch: 19 [222/512 (43%)]    Loss: 0.27576 : 0.03416 :: 0.14417 (234.55 s/sec)
0: epoch: 19 [223/512 (44%)]    Loss: 0.27557 : 0.03425 :: 0.14246 (12.98 s/sec)
0: epoch: 19 [224/512 (44%)]    Loss: 0.27298 : 0.03561 :: 0.13977 (164.26 s/sec)
0: epoch: 19 [225/512 (44%)]    Loss: 0.26297 : 0.03092 :: 0.13477 (227.73 s/sec)
0: epoch: 19 [226/512 (44%)]    Loss: 0.27038 : 0.03316 :: 0.13501 (14.48 s/sec)
0: epoch: 19 [227/512 (44%)]    Loss: 0.26069 : 0.02907 :: 0.13635 (45.37 s/sec)
0: epoch: 19 [228/512 (45%)]    Loss: 0.26004 : 0.03117 :: 0.13708 (176.89 s/sec)
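
To put numbers on the fluctuation, the log excerpt above can be parsed with a short script. This is only a sketch and not part of the AtmoRep code: the names `parse_throughput` and `summarize`, the regular expression, and the "slower than half the median" threshold are assumptions chosen for illustration.

```python
import re
import statistics

# The ten log lines quoted above, copied verbatim.
LOG = """\
0: epoch: 19 [219/512 (43%)]    Loss: 0.26280 : 0.03152 :: 0.13843 (224.48 s/sec)
0: epoch: 19 [220/512 (43%)]    Loss: 0.28077 : 0.03847 :: 0.14124 (231.14 s/sec)
0: epoch: 19 [221/512 (43%)]    Loss: 0.24217 : 0.02282 :: 0.12537 (61.81 s/sec)
0: epoch: 19 [222/512 (43%)]    Loss: 0.27576 : 0.03416 :: 0.14417 (234.55 s/sec)
0: epoch: 19 [223/512 (44%)]    Loss: 0.27557 : 0.03425 :: 0.14246 (12.98 s/sec)
0: epoch: 19 [224/512 (44%)]    Loss: 0.27298 : 0.03561 :: 0.13977 (164.26 s/sec)
0: epoch: 19 [225/512 (44%)]    Loss: 0.26297 : 0.03092 :: 0.13477 (227.73 s/sec)
0: epoch: 19 [226/512 (44%)]    Loss: 0.27038 : 0.03316 :: 0.13501 (14.48 s/sec)
0: epoch: 19 [227/512 (44%)]    Loss: 0.26069 : 0.02907 :: 0.13635 (45.37 s/sec)
0: epoch: 19 [228/512 (45%)]    Loss: 0.26004 : 0.03117 :: 0.13708 (176.89 s/sec)
"""

# Assumed line layout: "[<batch>/<total> (<pct>%)] ... (<rate> s/sec)".
PATTERN = re.compile(r"\[(\d+)/\d+ \(\d+%\)\].*\(([\d.]+) s/sec\)")


def parse_throughput(log_text):
    """Return (batch_index, samples_per_sec) pairs found in the log text."""
    return [(int(m.group(1)), float(m.group(2)))
            for m in PATTERN.finditer(log_text)]


def summarize(rates, slow_fraction=0.5):
    """Summarize throughput and list batches slower than slow_fraction * median."""
    values = [rate for _, rate in rates]
    median = statistics.median(values)
    slow = [batch for batch, rate in rates if rate < slow_fraction * median]
    return {"min": min(values), "max": max(values),
            "median": median, "slow_batches": slow}


if __name__ == "__main__":
    print(summarize(parse_throughput(LOG)))
```

With the threshold assumed here, four of the ten batches (221, 223, 226 and 227) are flagged as slow, so the drops are frequent rather than isolated.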

clessig commented 3 weeks ago

That's on MareNostrum?

iluise commented 3 weeks ago

That's on the Juelich machine, actually. I can check on MareNostrum (MN) and post the results here.

clessig commented 3 weeks ago

If the machine in Juelich is under heavy load, it might just be fluctuations in file system latency.
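
One way to test this hypothesis would be to time how long the training loop blocks while waiting for each batch. If the slow steps coincide with long waits, I/O is the likely cause; if not, the time is being lost elsewhere. The following is a minimal sketch under stated assumptions: `timed_batches` is a hypothetical helper, not an AtmoRep function, and `fake_loader` is a stand-in that simulates file system stalls with `time.sleep`. In practice the wrapper would go around whatever iterable the trainer actually uses.

```python
import time


def timed_batches(loader):
    """Yield (batch, wait_seconds): time spent blocked fetching each batch."""
    it = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            return
        yield batch, time.perf_counter() - t0


def fake_loader(n_batches, slow_every=3, delay_s=0.05):
    """Hypothetical stand-in for a data loader; every slow_every-th batch stalls."""
    for i in range(n_batches):
        if i % slow_every == 0:
            time.sleep(delay_s)  # simulated file system stall
        yield i


if __name__ == "__main__":
    for batch, wait in timed_batches(fake_loader(6)):
        print(f"batch {batch}: waited {wait * 1000:.1f} ms for data")
```

Logging the wait time next to the existing s/sec figure would show directly whether the two move together. Note that this measures only the time the main process is blocked, so prefetching by background workers can hide short stalls.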