AugurWorks / UI

Augurworks UI code

Clean up failed NN runs #194

Open augurworks1 opened 9 years ago

augurworks1 commented 9 years ago

It looks like a failed NN run does not release its thread (or some other resource), and overall performance then drops off. After the jar is restarted, performance is superb: a 1-month, 8-input NN run takes only about 15 minutes!

safreiberg commented 9 years ago

That's really interesting. I'll take a look at what happens to resources when the run fails.

safreiberg commented 9 years ago

So it's possible that those jobs are actually still in progress rather than failed. All resources should be released as soon as a job completes, whether it succeeds or fails. Would it help to track the status of every job?
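To illustrate releasing resources on every outcome and tracking each job, here is a minimal Java sketch, assuming each job can be wrapped as a plain Runnable. JobRegistry, its Status values, and the releaseResources callback are hypothetical names, not part of the Core code. The finally block runs whether the job returns normally or throws, so cleanup cannot be skipped.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical job registry; names and states are illustrative, not Core's API.
class JobRegistry {
    enum Status { RUNNING, SUCCEEDED, FAILED }

    // Thread-safe so statuses can be read while jobs run on other threads.
    private final Map<String, Status> statuses = new ConcurrentHashMap<>();

    // Runs a job and records its outcome. The finally block runs whether the
    // job returns normally or throws, so resources are always released.
    void run(String name, Runnable job, Runnable releaseResources) {
        statuses.put(name, Status.RUNNING);
        try {
            job.run();
            statuses.put(name, Status.SUCCEEDED);
        } catch (RuntimeException e) {
            statuses.put(name, Status.FAILED);
        } finally {
            releaseResources.run();
        }
    }

    Status statusOf(String name) {
        return statuses.get(name);
    }

    public static void main(String[] args) {
        JobRegistry registry = new JobRegistry();
        int[] released = {0};
        registry.run("ok-job", () -> {}, () -> released[0]++);
        registry.run("bad-job", () -> { throw new IllegalStateException("diverged"); },
                     () -> released[0]++);
        System.out.println(registry.statusOf("ok-job"));  // SUCCEEDED
        System.out.println(registry.statusOf("bad-job")); // FAILED
        System.out.println(released[0]);                  // 2
    }
}
```

In this sketch a job that hangs simply stays RUNNING, which is exactly what would distinguish a stuck run from a failed one.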

augurworks1 commented 9 years ago

Good point. I'm calling them failed because they don't return a result within, say, 24 hours, but they could just be running indefinitely.

If we track each job, is there a way to kill it? Regardless, it would be good info to have.
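One common way to stop a job that runs too long is sketched below with the standard java.util.concurrent API, assuming each NN run is submitted to an ExecutorService: wait on the Future with a deadline and call cancel(true) if the deadline passes. DeadlineRunner and runWithDeadline are illustrative names, and the timeout values are arbitrary.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: give each job a deadline and cancel it if it overruns.
// Class/method names and timeout values are illustrative assumptions.
class DeadlineRunner {
    // Returns the job's result, or null if it failed or missed the deadline.
    static <T> T runWithDeadline(ExecutorService pool, Callable<T> job,
                                 long timeout, TimeUnit unit) throws InterruptedException {
        Future<T> future = pool.submit(job);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the thread running the job
            return null;
        } catch (ExecutionException e) {
            return null; // the job itself threw an exception
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        String fast = runWithDeadline(pool, () -> "done", 1, TimeUnit.SECONDS);
        String hung = runWithDeadline(pool, () -> {
            Thread.sleep(60_000); // stands in for a run that never returns
            return "late";
        }, 100, TimeUnit.MILLISECONDS);
        System.out.println(fast); // done
        System.out.println(hung); // null
        pool.shutdownNow();
    }
}
```

Note that cancel(true) only interrupts the worker thread. A job that never checks for interruption, such as a tight numeric loop, keeps running and keeps holding its thread.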

(Sent by email in reply to Stephen Freiberg's comment of Sun, Feb 22, 2015 at 12:40 PM.)

safreiberg commented 9 years ago

Ah, I see. Yes, I can add a way to interrupt jobs on demand. I should be able to get that in before our call.
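Interrupting a Java thread is cooperative: Thread.interrupt() only sets a flag, so an on-demand cancel only works if the training loop checks that flag. A minimal sketch, assuming the trainer runs in epochs; train, busyWork, and the epoch loop are hypothetical stand-ins for the real NN code.

```java
// Sketch: Thread.interrupt() only sets a flag, so a long CPU-bound training
// loop must check it. train() and busyWork() are hypothetical stand-ins.
class InterruptibleTraining {
    static volatile long sink; // keeps busyWork's result observable

    // Runs up to maxEpochs epochs and returns how many actually completed.
    static int train(int maxEpochs) {
        int completed = 0;
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            if (Thread.currentThread().isInterrupted()) {
                break; // cancel requested: stop early so cleanup can run
            }
            busyWork();
            completed++;
        }
        return completed;
    }

    // Simulated epoch of pure computation; it never blocks or sleeps, so only
    // the explicit flag check in train() can notice an interrupt.
    static void busyWork() {
        long x = 0;
        for (int i = 0; i < 100_000; i++) x += i;
        sink = x;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = new int[1];
        Thread worker = new Thread(() -> result[0] = train(Integer.MAX_VALUE));
        worker.start();
        Thread.sleep(50);
        worker.interrupt(); // what an on-demand "cancel" would trigger
        worker.join();
        System.out.println("stopped after " + result[0] + " epochs");
    }
}
```

Without the isInterrupted() check, the worker above would ignore the interrupt and keep looping.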

augurworks1 commented 9 years ago

Cool, thanks.

(Sent by email in reply to Stephen Freiberg's comment of Sun, Feb 22, 2015 at 3:32 PM.)

safreiberg commented 9 years ago

This is in https://github.com/augurworks1/Core/pull/41. Connect to the server on its port (telnet localhost <port>) and type 'status'; this lists the current jobs as name: status. To cancel a job, type 'cancel' followed by the job's name.
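A rough sketch of a line-based console like the one described above, handling 'status' and 'cancel' followed by a job name. It assumes jobs are tracked as Futures; JobConsole, register, handle, serve, the reply strings, and the status labels are hypothetical and may differ from what pull request 41 actually implements. The command handling is kept separate from the socket loop so it can be exercised without telnet.

```java
import java.io.*;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.*;

// Sketch of a line-based admin console ("status", "cancel <name>").
// Class, method names, reply strings, and status labels are hypothetical.
class JobConsole {
    private final Map<String, Future<?>> jobs = new ConcurrentHashMap<>();

    void register(String name, Future<?> future) {
        jobs.put(name, future);
    }

    // Handles one command line and returns the reply text.
    String handle(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        switch (parts[0]) {
            case "status": {
                StringBuilder sb = new StringBuilder();
                // Copy into a TreeMap so jobs are listed in name order.
                for (Map.Entry<String, Future<?>> e : new TreeMap<String, Future<?>>(jobs).entrySet()) {
                    sb.append(e.getKey()).append(": ").append(describe(e.getValue())).append('\n');
                }
                return sb.toString();
            }
            case "cancel": {
                if (parts.length < 2) return "usage: cancel <name>\n";
                Future<?> f = jobs.get(parts[1]);
                if (f == null) return "no such job: " + parts[1] + "\n";
                // cancel(true) interrupts the job's thread if it is running.
                return f.cancel(true) ? "cancelled " + parts[1] + "\n"
                                      : "could not cancel (already done): " + parts[1] + "\n";
            }
            default:
                return "unknown command: " + parts[0] + "\n";
        }
    }

    private static String describe(Future<?> f) {
        if (f.isCancelled()) return "CANCELLED";
        if (!f.isDone()) return "RUNNING"; // also covers jobs still queued
        try {
            f.get();
            return "SUCCEEDED";
        } catch (Exception e) {
            return "FAILED";
        }
    }

    // Serves one client at a time on the loopback interface,
    // e.g. reachable with: telnet localhost <port>
    void serve(int port) throws IOException {
        try (ServerSocket server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress())) {
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(
                             client.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(new OutputStreamWriter(
                             client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        out.print(handle(line));
                        out.flush();
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        JobConsole console = new JobConsole();
        console.register("forever", pool.submit(() -> {
            Thread.sleep(Long.MAX_VALUE); // a run that never finishes
            return null;
        }));
        System.out.print(console.handle("status"));         // forever: RUNNING
        System.out.print(console.handle("cancel forever")); // cancelled forever
        System.out.print(console.handle("status"));         // forever: CANCELLED
        pool.shutdownNow();
    }
}
```

Binding to the loopback address keeps the console reachable only from the server machine itself, which matches connecting via localhost.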