Summary
When I increase the training period in "ML_synthetic_run" mode to more than roughly 10,000 timesteps, training stops after epoch #1 instead of continuing through the configured epochs (the log shows 1/1000). It looks like a memory issue?
Steps to reproduce the error
xxxx
xxxx
xxxx
Error messages in the terminal
Synthetic test for classic, 1990-1992, debug mode: finishes at Epoch 1 after train_one_epoch()
[2023-07-24 13:37:28,013][agents.DifferentiableCFE][INFO] - Epoch #: 1/1000
Processing data: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17544/17544 [00:19<00:00, 897.46it/s]
[2023-07-24 13:37:47,569][agents.DifferentiableCFE][INFO] - trained KGE: -0.1372
calculate loss
Loss backward starts
Loss backward ends
Start optimizer tensor([2.8429], grad_fn=<AddBackward0>) tensor([0.0003], grad_fn=<AddBackward0>)
End optimizer tensor([2.8429], grad_fn=<AddBackward0>) tensor([0.0003], grad_fn=<AddBackward0>)
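The failure threshold seems to lie between two training periods: start_time '1991-10-01 00:00:00' with end_time '1992-10-30 23:00:00', and the same start_time with end_time '1992-12-30 23:00:00'.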
workbench.desktop.main.js:sourcemap:710 Uncaught Error: Model is disposed!
at Un.cb (workbench.desktop.main.js:sourcemap:710:262)
at Un.findMatches (workbench.desktop.main.js:sourcemap:713:3497)
at e.$NRb.G (workbench.desktop.main.js:sourcemap:1632:52908)
at e.$NRb.F (workbench.desktop.main.js:sourcemap:1632:52115)
at workbench.desktop.main.js:sourcemap:1632:52092
cb @ workbench.desktop.main.js:sourcemap:710
findMatches @ workbench.desktop.main.js:sourcemap:713
G @ workbench.desktop.main.js:sourcemap:1632
F @ workbench.desktop.main.js:sourcemap:1632
(anonymous) @ workbench.desktop.main.js:sourcemap:1632
workbench.desktop.main.js:sourcemap:762 CodeExpectedError: Server[pid=19332] disconnected unexpectedly
at e.$NOb.P (workbench.desktop.main.js:sourcemap:906:38841)
at workbench.desktop.main.js:sourcemap:906:38380
at async e.$NOb.next (workbench.desktop.main.js:sourcemap:906:31647)
at async e.$OOb.next (workbench.desktop.main.js:sourcemap:1855:14174)
at async N (workbench.desktop.main.js:sourcemap:1857:33918)
at async handler (workbench.desktop.main.js:sourcemap:1857:38564)
at async I.k (workbench.desktop.main.js:sourcemap:129:9410)
at async I.run (workbench.desktop.main.js:sourcemap:129:9334)
at async e.$rZ.onClick (workbench.desktop.main.js:sourcemap:758:19371)
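The "disconnected unexpectedly" message would be consistent with the process being killed for using too much memory, but that is not confirmed. One way to test the memory hypothesis is to log peak process memory while the run length grows. The sketch below uses only the standard library; `peak_rss_mb` and `log_peak` are hypothetical helper names, not part of this repository, and the 50 MB buffer is just a stand-in for state accumulated over timesteps.

```python
import sys

try:
    import resource  # Unix only; not available on Windows
except ImportError:
    resource = None


def peak_rss_mb():
    """Return this process's peak resident memory in MB, or None if unavailable."""
    if resource is None:
        return None
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_peak(label):
    """Print the peak memory so far, tagged with a label such as a timestep count."""
    peak = peak_rss_mb()
    text = "n/a" if peak is None else f"{peak:.1f} MB"
    print(f"{label}: peak RSS = {text}")
    return peak


before = log_peak("before")
buffer = bytearray(50 * 1024 * 1024)  # stand-in for state accumulated over timesteps
after = log_peak("after allocating 50 MB")
```

Calling `log_peak` every few thousand timesteps inside the training loop, and again around the backward pass, would show whether memory climbs with the length of the training period.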
I tried to recreate the environment on the CUAHSI-SI JupyterHub and on my lab computer, but the "solving environment" step never finished on either. I might need to export the exact environment from my laptop.
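As a fallback if the environment solver keeps hanging, the exact package versions on the laptop could be captured from inside the working interpreter. This is a minimal sketch using the standard library's `importlib.metadata`; `pinned_requirements` is a hypothetical helper name, and it only sees Python packages visible to the running interpreter, not non-Python dependencies.

```python
from importlib.metadata import distributions


def pinned_requirements():
    """Return sorted 'name==version' pins for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Redirect this output to a file to get a requirements-style list
    for pin in pinned_requirements():
        print(pin)
```

Installing from exact pins avoids most of the version search, which may be what makes the solve step so slow, though that is an assumption about the cause.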
Config settings for the two training periods that bracket the failure threshold:
start_time: '1991-10-01 00:00:00' end_time: '1992-10-30 23:00:00' #2019
start_time: '1991-10-01 00:00:00' end_time: '1992-12-30 23:00:00' #2019
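To relate these two periods to the "around 10,000 timesteps" estimate, the timestep counts can be computed directly. The sketch below assumes hourly timesteps and an inclusive end time, which matches the 17544 steps shown in the progress bar for a two-year run; `n_timesteps` is a hypothetical helper, not a function from this repository.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def n_timesteps(start_time, end_time, step=timedelta(hours=1)):
    """Count timesteps from start_time to end_time, both endpoints included."""
    start = datetime.strptime(start_time, FMT)
    end = datetime.strptime(end_time, FMT)
    return (end - start) // step + 1


shorter = n_timesteps("1991-10-01 00:00:00", "1992-10-30 23:00:00")
longer = n_timesteps("1991-10-01 00:00:00", "1992-12-30 23:00:00")
print(shorter)  # 9504
print(longer)   # 10968
```

Under those assumptions the threshold falls between 9,504 and 10,968 timesteps.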