NWC-CUAHSI-Summer-Institute / dCFE

A differentiable Python version of NOAA-OWP / cfe using PyTorch, for research and development
MIT License

Training finishes at epoch #1 #9

RY4GIT closed 1 year ago

RY4GIT commented 1 year ago

Summary

When I increase the training period in "ML_synthetic_run" mode to more than roughly 10,000 timesteps, training stops after epoch #1. It looks like a memory issue?

Steps to reproduce the error

Error messages in the terminal

The failure threshold seems to lie between

start_time: '1991-10-01 00:00:00'
end_time: '1992-10-30 23:00:00' #2019

and

start_time: '1991-10-01 00:00:00'
end_time: '1992-12-30 23:00:00' #2019
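As a rough check on the "around 10,000 timesteps" figure, the sketch below counts the timesteps in each of the two periods. It assumes an hourly timestep, which is only inferred from the hour-resolution timestamps, and `count_timesteps` is a made-up helper for illustration, not part of dCFE.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def count_timesteps(start_time, end_time, step=timedelta(hours=1)):
    """Count timesteps from start_time to end_time, both ends included.

    Hypothetical helper; the hourly step is an assumption.
    """
    start = datetime.strptime(start_time, FMT)
    end = datetime.strptime(end_time, FMT)
    return (end - start) // step + 1


shorter = count_timesteps("1991-10-01 00:00:00", "1992-10-30 23:00:00")
longer = count_timesteps("1991-10-01 00:00:00", "1992-12-30 23:00:00")
print(shorter, longer)  # 9504 10968
```

Under that assumption the two periods bracket 10,000 timesteps, which is consistent with the threshold described in the summary.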

Error messages in the developer tool

workbench.desktop.main.js:sourcemap:710 Uncaught Error: Model is disposed!
    at Un.cb (workbench.desktop.main.js:sourcemap:710:262)
    at Un.findMatches (workbench.desktop.main.js:sourcemap:713:3497)
    at e.$NRb.G (workbench.desktop.main.js:sourcemap:1632:52908)
    at e.$NRb.F (workbench.desktop.main.js:sourcemap:1632:52115)
    at workbench.desktop.main.js:sourcemap:1632:52092
cb @ workbench.desktop.main.js:sourcemap:710
findMatches @ workbench.desktop.main.js:sourcemap:713
G @ workbench.desktop.main.js:sourcemap:1632
F @ workbench.desktop.main.js:sourcemap:1632
(anonymous) @ workbench.desktop.main.js:sourcemap:1632

workbench.desktop.main.js:sourcemap:762 CodeExpectedError: Server[pid=19332] disconnected unexpectedly
    at e.$NOb.P (workbench.desktop.main.js:sourcemap:906:38841)
    at workbench.desktop.main.js:sourcemap:906:38380
    at async e.$NOb.next (workbench.desktop.main.js:sourcemap:906:31647)
    at async e.$OOb.next (workbench.desktop.main.js:sourcemap:1855:14174)
    at async N (workbench.desktop.main.js:sourcemap:1857:33918)
    at async handler (workbench.desktop.main.js:sourcemap:1857:38564)
    at async I.k (workbench.desktop.main.js:sourcemap:129:9410)
    at async I.run (workbench.desktop.main.js:sourcemap:129:9334)
    at async e.$rZ.onClick (workbench.desktop.main.js:sourcemap:758:19371)

RY4GIT commented 1 year ago

Would adapting nn.Sequential be the way forward?
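For reference, a minimal sketch of what an nn.Sequential-based network might look like in PyTorch. The layer sizes, activations, and variable names are all hypothetical and are not taken from the dCFE code; the sketch only illustrates how nn.Sequential chains layers into a single forward call.

```python
import torch
from torch import nn

# Hypothetical sizes, chosen only for illustration (not from dCFE).
N_FEATURES = 4  # assumed number of input features per sample
HIDDEN = 16     # assumed hidden-layer width
N_PARAMS = 2    # assumed number of predicted parameters

# nn.Sequential runs the layers in order when the module is called.
network = nn.Sequential(
    nn.Linear(N_FEATURES, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, N_PARAMS),
    nn.Sigmoid(),  # squashes outputs into [0, 1] so they can be rescaled later
)

x = torch.rand(8, N_FEATURES)  # a batch of 8 random samples
out = network(x)               # one call performs the whole forward pass
print(out.shape)
```

Whether this structure helps with the behaviour reported above is not established here; it only shows the construct being proposed.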

RY4GIT commented 1 year ago

Coded in f828afb77af59fcd9401019f3f82583855904a60, but I haven't checked whether it works.

RY4GIT commented 1 year ago

Working in 889840731d15fc60c5bc8cd099763c7cd929cedc