@maxscheurer brought up in #405 that the default `jobs_per_node = 2` was a little unexpected in that a single job submitted through qcengine would use half the node's resources (config.py: `ncores = task_config.pop("ncores", int(ncores / jobs_per_node))`). That's what the code logic says, but it didn't jibe with my experience of cmdline qcng, which is that a single job w/o `task_config` fills the node.
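For concreteness, here's a minimal sketch of that division (only the `task_config.pop` line is verbatim from config.py; the function wrapper and argument names are illustrative):

```python
# Sketch of the default per-job core computation quoted above from
# config.py; everything but the pop() line is illustrative scaffolding.
def cores_per_job(task_config: dict, ncores: int, jobs_per_node: int = 2) -> int:
    # Absent an explicit "ncores" in task_config, one job is handed
    # ncores / jobs_per_node -- half the detected cores by default.
    return task_config.pop("ncores", int(ncores / jobs_per_node))

assert cores_per_job({}, ncores=40) == 20            # default: half the node
assert cores_per_job({"ncores": 8}, ncores=40) == 8  # explicit request wins
```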
`jobs_per_node` turned from 1 to 2 very early on, in mid 2018 (https://github.com/dgasmith/QCEngine/commit/41e7da039102195a5f17e5526a485f5cdefc6bff). At that point, a default job should have filled half a node, because logical cores were being collected and then, if the processor was Intel, halved to remove hyperthreading. The only reason I can think of for the default=2 is that in practice the value was usually set outright for clusters, and running 2 locally exposed errors that a serial default would have hidden. But I really don't know why the 2. @Lnaden, any institutional memory?
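As I read it, the 2018 detection amounted to something like this (a paraphrase, not the actual commit; `cpu_brand` stands in for however the processor string was really fetched):

```python
import psutil

def detect_ncores_2018(cpu_brand: str) -> int:
    # Paraphrase of the mid-2018 logic: collect logical cores, then halve
    # on Intel to strip 2-way hyperthreading.
    ncores = psutil.cpu_count(logical=True)
    if "Intel" in cpu_brand:
        ncores //= 2
    return ncores

# On a 20-physical-core Intel box: 40 logical cores halve to 20, and
# jobs_per_node=2 then hands a default job 10 of them -- half the node.
```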
Shortly after that (https://github.com/MolSSI/QCEngine/commit/83d44096d850acd386dd466d377d0b5ee2cc0be5), the code switched to `psutil.cpu_count(logical=False)`, so with `jobs_per_node=2`, the default was still to fill half a node. The code later switched to the modern logic of `len(psutil.Process().cpu_affinity())` else `psutil.cpu_count(logical=False)` else `psutil.cpu_count(logical=True)`. The trouble with that is that the values for my computer (20 physical cores plus hyperthreading) are 40, 20, 40, respectively, so ncores is the logical, not the physical, core count.
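Spelled out, the modern chain looks roughly like this (my paraphrase; `cpu_affinity` isn't available on every platform, e.g. macOS, hence the guard):

```python
import psutil

def detect_ncores_modern() -> int:
    # Paraphrase of the current fallback chain: affinity, else physical
    # core count, else logical core count.
    try:
        ncores = len(psutil.Process().cpu_affinity())
    except AttributeError:  # platforms without cpu_affinity (e.g. macOS)
        ncores = None
    return ncores or psutil.cpu_count(logical=False) or psutil.cpu_count(logical=True)

# On the box above, the three candidates are 40, 20, and 40, so the
# affinity branch wins and ncores comes out as the logical count, 40.
```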
The net effect is that with ncores a factor of 2 too large (40 from `cpu_affinity` instead of 20 from `cpu_count(logical=False)`), and the unexpected `jobs_per_node=2` default a factor of 2 larger than the expected 1, the two errors cancel and the expected result happens: qcng fills all (20) physical cores.
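In numbers, for the machine above:

```python
ncores = 40        # from cpu_affinity (logical count); "should" be 20 (physical)
jobs_per_node = 2  # the surprising default; "should" be 1
assert int(ncores / jobs_per_node) == 20  # the two factors of 2 cancel exactly
```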
So by my current understanding, `jobs_per_node` should be reset to 1, since filling the node is the most expected default behavior. But this needs to be accompanied by a modification in how ncores is computed, so that if `cpu_affinity == cpu_count(logical=True)`, then probably the affinity hasn't been set and ncores should be `cpu_count(logical=False)`. In this plan, for default settings on Intel processors, the net effect is unchanged, while the two parameters finally do as they claim.
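A sketch of what I have in mind (not tested against config.py, and the same `cpu_affinity` availability caveat as above applies):

```python
import psutil

def detect_ncores_proposed() -> int:
    # Proposed: trust affinity only when it's been narrowed below the full
    # logical count; otherwise assume it was never set and report physical cores.
    try:
        affinity = len(psutil.Process().cpu_affinity())
    except AttributeError:
        affinity = None
    if affinity and affinity != psutil.cpu_count(logical=True):
        return affinity  # deliberately restricted; honor it
    return psutil.cpu_count(logical=False) or psutil.cpu_count(logical=True)

# Paired with jobs_per_node=1, a default job on the box above again fills
# all 20 physical cores -- same net behavior, honest parameters.
```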
This issue doesn't really affect QCFractal usage, as it has explicit settings through executors and managers. I only really detected the issue b/c GAMESS gives up when you run too many threads, and 40 (as the change in #405 handed to `test_local_options_scratch[gamess-model2-keywords2]`) is too many :-).
Thoughts? Have I analyzed this right, and is this the right plan? Pinging @bennybp, too, in case there are qcf implications I don't see.