bnord opened 5 years ago
I have been testing a deep CNN with ReLU activations and the same architecture as Arch0Dense. This is a plot of the sigmas versus the number of neurons in the layers, with alp = 5, 10 seeds, and 500 epochs. I am now running with more seeds (100), but Colab keeps stopping the process, so I have to restart it. Can we run on the Wilson cluster?
Yep, we can run on Wilson. We'll need to get you an account. Can you email Amitoj G Singh (amitoj@fnal.gov) to ask for one?
Why does Colab stop the process? Does it happen after a certain amount of time?
There's also a wall-time limit on Wilson, and we'll want to know what it is. Whether on Wilson or on Colab, we might need to save the network and restart from the saved state.
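For the wall-time issue, one option is to have each job watch its own clock and stop cleanly before the limit, recording how far it got so the next job can resume. This is only a sketch: `train_epoch(epoch)` is a hypothetical callback standing in for one epoch of the real Keras training, and only the next epoch number is stored (in a JSON file). In the actual notebook the model weights would be saved at the same point, e.g. with Keras `model.save(...)` to an .h5 file. `budget_s` is a placeholder, since Wilson's real wall time isn't known yet.

```python
import json
import os
import tempfile
import time


def train_with_budget(total_epochs, state_path, train_epoch, budget_s, margin_s=0.0):
    """Run epochs until all are done or the time budget is nearly used up.

    The next epoch to run is stored in `state_path`, so a later job resumes
    where this one stopped. `train_epoch` is a hypothetical placeholder.
    Returns True once all `total_epochs` epochs have been completed.
    """
    start = time.monotonic()
    next_epoch = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            next_epoch = json.load(f)["next_epoch"]
    last_epoch_s = 0.0
    while next_epoch < total_epochs:
        elapsed = time.monotonic() - start
        # Stop if one more epoch (estimated from the last one) plus a margin would overrun.
        if elapsed + last_epoch_s + margin_s > budget_s:
            break
        t0 = time.monotonic()
        train_epoch(next_epoch)
        last_epoch_s = time.monotonic() - t0
        next_epoch += 1
        # Write progress atomically (temp file + rename) so a kill mid-write
        # never leaves a corrupt state file. Real code would also save weights here.
        tmp_path = state_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_epoch": next_epoch}, f)
        os.replace(tmp_path, state_path)
    return next_epoch >= total_epochs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        state = os.path.join(d, "state.json")
        # Tiny budget: the first "job" stops early; the second resumes and finishes.
        done_first = train_with_budget(100, state, lambda e: time.sleep(0.01), budget_s=0.05)
        done_second = train_with_budget(100, state, lambda e: None, budget_s=60.0)
        print(done_first, done_second)  # False True
```

The margin is there because the last epoch's duration is only an estimate; on a cluster with a hard wall time it is safer to stop a bit early than to be killed mid-save.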
I noticed that Colab stops before the 12-hour limit. I think it is not intended for long-running tasks.
From the Colab FAQs web page:
"Colaboratory is intended for interactive use. Long-running background computations, particularly on GPUs, may be stopped. Please do not use Colaboratory for cryptocurrency mining. Doing so is unsupported and may result in service unavailability. We encourage users who wish to run continuous or long-running computations through Colaboratory’s UI to use a local runtime."
I do save the trained instances to .h5 files, though, so a restart doesn't begin from scratch.
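Since each instance is already saved to its own .h5 file, the loop over seeds can simply skip any seed whose file already exists, so rerunning after Colab stops the session only trains the missing seeds. A minimal sketch, assuming one output file per seed: `train_one(seed)` is a hypothetical stand-in for the real training code, and a small JSON file replaces the .h5 model so the example runs without TensorFlow.

```python
import json
import os
import tempfile


def run_all_seeds(seeds, out_dir, train_one):
    """Train one model per seed, skipping seeds that already have a saved result.

    `train_one(seed)` is hypothetical and must return something JSON-serializable
    (in the real notebook this step would call Keras `model.save(...)` instead).
    Returns the list of seeds trained during this call.
    """
    os.makedirs(out_dir, exist_ok=True)
    trained_now = []
    for seed in seeds:
        path = os.path.join(out_dir, f"seed_{seed}.json")
        if os.path.exists(path):
            continue  # finished in an earlier session
        result = train_one(seed)
        # Temp file + rename: an interrupted write can't look like a finished seed.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(result, f)
        os.replace(tmp_path, path)
        trained_now.append(seed)
    return trained_now


if __name__ == "__main__":
    def flaky_train(seed):
        # Simulate the session being stopped while training seed 2.
        if seed == 2:
            raise RuntimeError("session stopped")
        return {"seed": seed}

    with tempfile.TemporaryDirectory() as d:
        try:
            run_all_seeds(range(5), d, flaky_train)
        except RuntimeError:
            pass  # seeds 0 and 1 were saved before the "crash"
        # Restart: only the unfinished seeds are trained.
        print(run_all_seeds(range(5), d, lambda s: {"seed": s}))  # [2, 3, 4]
```

With 100 seeds this means a stopped session costs at most the one seed that was in progress.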
Should we try Wilson?
Yes, let's try. I'll ask for an account.
Test a deep convolutional network with ReLU activations and the same number of neurons as Arch0Dense from #2