cBio / cbio-cluster

MSKCC cBio cluster documentation

hyperthreading #386

Closed pavletich closed 8 years ago

pavletich commented 8 years ago

Is there a Torque queue command to instruct MPI to use a non-hyperthreaded CPU core if the cluster has hyperthreading enabled? The issue, by way of a hypothetical example, is that my MPI job needs 20 GB/process, and each node has 24 physical cores that show as 48 hyperthreaded cores and 500 GB of memory. So, using all the memory of the node, I can run 24 processes/node. The problem is that this runs at 0.5x the speed of a comparable cluster with hyperthreading disabled, which I suspect is due to my 24 processes utilizing only half of each physical core.
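The arithmetic behind this hypothetical can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are made up for the example, and capping the count at the number of physical cores is an assumption about the intent (one process per physical core), not something Torque does.

```python
def processes_per_node(node_mem_gb, mem_per_proc_gb, physical_cores):
    """Return how many processes fit on one node.

    The count is limited by memory and, as an assumption of this sketch,
    capped at one process per physical core.
    """
    by_memory = int(node_mem_gb // mem_per_proc_gb)
    return min(by_memory, physical_cores)


# Numbers taken from the hypothetical example in the comment above.
n_procs = processes_per_node(node_mem_gb=500, mem_per_proc_gb=20, physical_cores=24)
logical_cores = 48

# Fraction of the logical (hyperthreaded) cores that the job occupies.
fraction_logical_used = n_procs / logical_cores
print(n_procs, fraction_logical_used)
```

With these numbers the job occupies 24 of the 48 logical cores. Whether those 24 land on 24 distinct physical cores or on sibling threads of only 12 is up to the scheduler and the MPI launcher.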

tatarsky commented 8 years ago

This is being investigated. Off the top of my head, I do not know of one (short of disabling HT), but I submitted pretty much this statement to Adaptive.

jchodera commented 8 years ago

One option would be for @tatarsky to reboot one node in non-hyperthreaded mode for benchmarking to see if this is really a hyperthreading issue or a memory or I/O issue. While our initial benchmarking with other codes did indicate that hyperthreading was a net improvement in efficiency for many of our codes, if other processes are really slowing things down, simply requesting an entire node at a time could still deliver full speed. That would not require major changes to the queuing system if it turns out that we can't easily disable hyperthreading via Torque.
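When benchmarking on a whole node, it can help to know which logical CPUs are hyperthread siblings of the same physical core. The sketch below is a rough illustration, not a Torque feature: it assumes a Linux node that exposes `thread_siblings_list` under `/sys/devices/system/cpu`, and the helper names (`parse_cpu_list`, `read_siblings`, `one_cpu_per_core`) are hypothetical.

```python
import glob
import os
import re


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0,24' or '0-1' into a sorted list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        elif part:
            cpus.append(int(part))
    return sorted(cpus)


def read_siblings(sysfs_root="/sys/devices/system/cpu"):
    """Map each logical CPU number to its raw thread_siblings_list string."""
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "topology", "thread_siblings_list")
    result = {}
    for path in glob.glob(pattern):
        cpu = int(re.search(r"cpu(\d+)/topology", path).group(1))
        with open(path) as handle:
            result[cpu] = handle.read()
    return result


def one_cpu_per_core(siblings_by_cpu):
    """Pick the lowest-numbered logical CPU from each group of siblings."""
    groups = {tuple(parse_cpu_list(text)) for text in siblings_by_cpu.values()}
    return sorted(group[0] for group in groups)


# Made-up sample: 2 physical cores, each showing 2 hyperthreads.
sample = {0: "0,2\n", 1: "1,3\n", 2: "0,2\n", 3: "1,3\n"}
print(one_cpu_per_core(sample))
```

On a real node, `one_cpu_per_core(read_siblings())` would give one logical CPU per physical core. That list could then be used to pin processes, for example with `os.sched_setaffinity` on Linux or with whatever binding options the MPI launcher in use provides.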

tatarsky commented 8 years ago

The above has already been discussed via email, which is where this started. We are disabling HT on an offline node.

The user in question is the only user on these nodes. (sbio queue)

tatarsky commented 8 years ago

@pavletich is it possible to test your code performance with only one node out of Torque? Or does this require several?

pavletich commented 8 years ago

I can run something interactively on cc27 and one other node for comparison. It will probably take me a couple of days to get to it. Will let you know. Thanks. Nikola

On 3/14/16 12:45 PM, "tatarsky" notifications@github.com wrote:

@pavletich is it possible to test your code performance with only one node out of Torque? Or does this require several?


tatarsky commented 8 years ago

Sounds good. We have rebooted cc27 using an experimental method that is perhaps the equivalent of disabling HT in the BIOS. I am of the opinion that it may not really be accurate, so I am curious what you find.
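One quick sanity check after a reboot like this is to compare the `siblings` and `cpu cores` fields that x86 Linux reports in `/proc/cpuinfo`. The sketch below assumes those two fields are present. The function name is hypothetical, and the sample text is made up rather than taken from cc27. It only shows what the kernel reports. It does not show whether the experimental method behaves like a true BIOS-level disable.

```python
def hyperthreading_enabled(cpuinfo_text):
    """Return True if the first processor entry reports more siblings than cores.

    'siblings' is the number of logical CPUs in the package and 'cpu cores'
    is the number of physical cores in it, as reported by the kernel.
    """
    siblings = None
    cores = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "siblings" and siblings is None:
            siblings = int(value)
        elif key == "cpu cores" and cores is None:
            cores = int(value)
    if siblings is None or cores is None:
        raise ValueError("cpuinfo text lacks 'siblings' or 'cpu cores'")
    return siblings > cores


# Made-up samples for one 12-core socket, with and without hyperthreading.
sample_ht_on = "processor\t: 0\nsiblings\t: 24\ncpu cores\t: 12\n"
sample_ht_off = "processor\t: 0\nsiblings\t: 12\ncpu cores\t: 12\n"
print(hyperthreading_enabled(sample_ht_on), hyperthreading_enabled(sample_ht_off))
```

On a node, the same function could be fed the contents of `/proc/cpuinfo`, for example `hyperthreading_enabled(open("/proc/cpuinfo").read())`.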

The BIOS on cc27 is currently not remotely accessible to me, so I will wait for CBIO folks to be onsite for tests based on a BIOS disable. We will offline another node for that so we will have a comparison.

Please note cc27 is short a bank of RAM due to some DOA sticks. When the replacements come in, it will be fully populated.

tatarsky commented 8 years ago

This conversation is, I guess, in email. Closing Git.