augurworks1 opened this issue 9 years ago (status: Open).
That's really interesting. I'll take a look at what happens to resources when the run fails.
So it's possible that those jobs are actually still in progress rather than failed. All resources should be released immediately when a job completes, whether it succeeds or fails. Would it help to track the status of every job?
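A minimal Java sketch of what "track every job, always release resources" could look like, assuming runs are submitted to a fixed thread pool. JobTracker, JobStatus and the openResources counter are hypothetical names for illustration, not code from the Core repo; the counter stands in for whatever per-run resources are really held.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lifecycle states for a submitted NN run.
enum JobStatus { QUEUED, RUNNING, SUCCEEDED, FAILED }

// Hypothetical tracker: records the status of every job and runs per-job
// cleanup in a finally block, so it happens on success and on failure.
class JobTracker {
    private final ExecutorService pool;
    private final Map<String, JobStatus> statuses = new ConcurrentHashMap<>();
    private final AtomicInteger openResources = new AtomicInteger();

    JobTracker(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    Future<?> submit(String jobId, Runnable work) {
        statuses.put(jobId, JobStatus.QUEUED);
        return pool.submit(() -> {
            statuses.put(jobId, JobStatus.RUNNING);
            openResources.incrementAndGet(); // stand-in for acquiring per-run resources
            try {
                work.run();
                statuses.put(jobId, JobStatus.SUCCEEDED);
            } catch (RuntimeException e) {
                statuses.put(jobId, JobStatus.FAILED);
            } finally {
                openResources.decrementAndGet(); // always released, success or failure
            }
        });
    }

    JobStatus status(String jobId) { return statuses.get(jobId); }

    int openResources() { return openResources.get(); }

    // Stops accepting new work and waits for queued and running jobs to finish.
    boolean shutdown() {
        pool.shutdown();
        try {
            return pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because the status map is keyed by job id, a "which runs are still going?" query is just a scan for RUNNING entries.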
Good point. I'm assuming they failed because they don't return a result within, say, 24 hours, but they could just be running indefinitely.
If we track each job, is there a way to kill it? Regardless, it would be good info to have.
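One way to turn the "no result within 24 hours" rule into an automatic kill, assuming each run is exposed as a java.util.concurrent Future: wait with a timeout, and on timeout cancel the run with interruption. RunWatchdog and RunOutcome are hypothetical names, and the timeout is a parameter here, not a value the server is known to use.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical outcomes of waiting on a single NN run.
enum RunOutcome { COMPLETED, FAILED, TIMED_OUT, CANCELLED, INTERRUPTED }

class RunWatchdog {
    // Waits up to `timeout` for the run. If it has not finished by then it is
    // treated as hung and cancelled with interruption, so a run that checks
    // its interrupt flag can stop and hand its thread back to the pool.
    static RunOutcome await(Future<?> run, long timeout, TimeUnit unit) {
        try {
            run.get(timeout, unit);
            return RunOutcome.COMPLETED;
        } catch (TimeoutException e) {
            run.cancel(true); // mayInterruptIfRunning = true
            return RunOutcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return RunOutcome.FAILED; // the run threw; e.getCause() holds the error
        } catch (CancellationException e) {
            return RunOutcome.CANCELLED; // already cancelled elsewhere
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the waiting thread's flag
            return RunOutcome.INTERRUPTED;
        }
    }
}
```

With a real run this would be called as something like RunWatchdog.await(run, 24, TimeUnit.HOURS). Note that cancel(true) only requests interruption; a run that never checks its interrupt flag will keep its thread.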
(Sent by email in reply to Stephen Freiberg's comment of Sun, Feb 22, 2015 at 12:40 PM: https://github.com/augurworks1/UI/issues/194#issuecomment-75447081)
Ah, I see. Yes, I can add a way to interrupt jobs on demand. I should be able to get that behavior in by our call.
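In Java, interrupting a job on demand only works if the job cooperates: Future.cancel(true) sets the worker thread's interrupt flag, and the running code has to notice it. A sketch of a training loop that checks the flag between epochs; InterruptibleRun and trainOneEpoch are hypothetical placeholders, not the actual NN code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an NN training run. Interruption is cooperative:
// the loop checks the flag between epochs and exits by throwing, which ends
// the task and frees the pool thread.
class InterruptibleRun implements Callable<Integer> {
    private final int maxEpochs;
    final AtomicInteger epochsDone = new AtomicInteger();

    InterruptibleRun(int maxEpochs) { this.maxEpochs = maxEpochs; }

    @Override
    public Integer call() throws InterruptedException {
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedException("run cancelled after " + epoch + " epochs");
            }
            trainOneEpoch();
            epochsDone.incrementAndGet();
        }
        return epochsDone.get();
    }

    // Placeholder for one training pass; Thread.sleep also reacts to interruption.
    private void trainOneEpoch() throws InterruptedException {
        Thread.sleep(10);
    }
}
```

Long blocking calls inside a real epoch (I/O, native math libraries) may not respond to interruption, so checking the flag at loop boundaries is the part that matters.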
Cool, thanks.
(Sent by email in reply to Stephen Freiberg's comment of Sun, Feb 22, 2015 at 3:32 PM: https://github.com/augurworks1/UI/issues/194#issuecomment-75456079)
This is in https://github.com/augurworks1/Core/pull/41. To cancel a job, connect to the server on its port (telnet localhost
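The port and command syntax from the PR aren't shown here, so this is only a guess at the general shape: a plain-text socket endpoint that telnet can talk to, mapping a job id to its Future. CancelServer and the "cancel <jobId>" command are hypothetical, chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Future;

// Hypothetical control endpoint: a client such as telnet connects and sends
// one text command per line. The "cancel <jobId>" syntax is an assumption.
class CancelServer {
    private final Map<String, Future<?>> jobs = new ConcurrentHashMap<>();

    void register(String jobId, Future<?> job) { jobs.put(jobId, job); }

    // Command handling is kept separate from the socket so it can be tested alone.
    String handle(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length == 2 && parts[0].equals("cancel")) {
            Future<?> job = jobs.remove(parts[1]);
            if (job == null) return "unknown job " + parts[1];
            // cancel(true) interrupts the run's worker thread if it is executing.
            return job.cancel(true) ? "cancelled " + parts[1] : "already finished " + parts[1];
        }
        return "usage: cancel <jobId>";
    }

    // Serves clients one at a time. The interrupt flag is only checked between
    // connections, because accept() itself does not respond to interruption.
    void serve(int port) throws IOException {
        try (ServerSocket server = new ServerSocket(port)) {
            while (!Thread.currentThread().isInterrupted()) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    String line;
                    while ((line = in.readLine()) != null) { // readLine strips telnet's \r\n
                        out.println(handle(line));
                    }
                }
            }
        }
    }
}
```

With serve(port) running on a background thread, a session would look like telnet localhost <port> followed by typing cancel job-1.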
It looks like a failed NN run does not release its thread (or some other resource), and overall performance then drops off. If the jar is restarted, performance is superb. In fact, a 1-month, 8-input NN run takes only about 15 minutes after a restart!
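If failed runs really do hold on to their worker threads, a fixed-size pool would eventually have every thread busy and new runs waiting in the queue, which would fit performance that only recovers after a jar restart. A hypothetical health check for that state, assuming runs go through a ThreadPoolExecutor; PoolHealth is an illustrative name, and getActiveCount() is documented as approximate, so this is a diagnostic rather than an exact count.

```java
import java.util.Locale;
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical diagnostic for the pool that runs NN jobs. If failed runs never
// release their worker threads, the active count stays pinned at the pool size
// while new runs wait in the queue.
class PoolHealth {
    static String report(ThreadPoolExecutor pool) {
        int active = pool.getActiveCount();   // approximate, per the JDK docs
        int size = pool.getMaximumPoolSize();
        int queued = pool.getQueue().size();
        boolean saturated = active >= size && queued > 0;
        return String.format(Locale.ROOT, "active=%d/%d queued=%d%s",
                active, size, queued, saturated ? " SATURATED" : "");
    }
}
```

Logging this periodically would show whether active threads climb and stay up after failed runs, before reaching for a restart.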