tatarsky closed 8 years ago
Note these units will be added tomorrow unless I hear they are missing software.
@akahles in particular, I want you to be aware of this, as I believe many of your waiting jobs would run on these machines if I add them tomorrow morning (assuming, of course, they are still waiting).
Currently traveling with irregular access to internet. Had a quick look at the node and seems to be fine. My jobs use mostly python that uses my own anaconda. As long as my home is mounted, it should be fine :)
Gotcha. Just didn't want to surprise you. Safe travels.
Thanks for the heads up anyways!
I'll try some compute & memory heavy R jobs.
Sounds good.
I was hoping gpu-2-8 had some Titans in it for you, but it's GTX-680s. We managed to get that unit repaired today and I'm validating it with a new health check.
No worries, with gpu-2-14 and gpu-2-5 in service I should be ok.
OK.
Most likely I will add these nodes at around 2:00PM today as I will have a nice clear section of my day to listen for any issues. If you are still manually performing tests I'll hold off.
My tests are still running, but I'll have to abort anyway, as there's an error I've detected. I'll kill my jobs and rewrite them for submission to the batch queue.
Performing an initial test of just adding cc01 to batch.
Some jobs appear to be running there. Will wait for a bit to monitor for any issues.
cc02 added. cc03/cc04/cc05 in a moment.
@akahles just a heads up: some of your jobs are running on cc02. They look OK to me, but that's just from a process table view.
cc03/cc04/cc05 now in batch as well. Watching for a bit to make sure they are not eating jobs, then will close this.
Units appear to be processing jobs. Closing for now.
Per discussion with @juanperin, a set of five HP DL160 Gen9 units is being prepped to be added to batch to improve processor counts for jobs that do not require GPUs. The units were tasked from another location last December to assist another researcher, who recently gained his own nodes. They are compute only (no GPUs), with 48 thread slots and 256GB of RAM.
I added them originally fairly quickly using a post-ROCKS puppet method that worked for the specific researcher in question but he was the only "client" as it were. They worked for his needs and my basic tests show items in place.
But in order to validate them a bit more for a wider audience, I would ask the following of those who find this interesting, to make sure I do not introduce nodes with missing items to the batch queue, which is the default for ALL users.
cc01
This test will be for a few days.
It would be nice to relieve some of the pressure that compute-only jobs put on GPU-containing nodes.
Thank you.