As far as I can tell, thread allocation during step 6 could be tuned a bit. Overall, it seems the full set of allocated cores is only really used during the alignment phase, with just a subset used during clustering. The code is doing something I don't understand right now (clustmap_across.py, around line 200); it looks like the per-node load balancing is being squashed by the last line of this block:
    # how to load-balance cluster2 jobs
    # maxthreads = 8 cuz vsearch isn't v efficient above that.
    ## e.g., 24 cpus; do 2 12-threaded jobs
    ## e.g., 2 nodes; 40 cpus; do 2 20-threaded jobs or 4 10-threaded jobs
    ## e.g., 4 nodes; 80 cpus; do 8 10-threaded jobs
    if nnodes == 1:
        thr = np.floor(self.data.ncpus / njobs).astype(int)
        eids = max(1, thr)
        eids = max(eids, len(list(self.hostd.values())[0]))
    else:
        eids = []
        for node in self.hostd:
            sids = self.hostd[node]
            nids = len(sids)
            thr = np.floor(nids / (njobs / nnodes)).astype(int)
            thr = max(1, thr)
            thr = min(thr, nids)
            eids.extend(self.hostd[node][::thr])

    # set nthreads based on ipcluster dict (default is 2)
    #if "threads" in self.data.ipcluster.keys():
    #    self.nthreads = int(self.data.ipcluster["threads"])
    self.nthreads = 2
    if self.data.ncpus > 4:
        self.nthreads = 4
    eids = self.ipyclient.ids[::self.nthreads]
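
To make the effect of that last line concrete, here is a minimal standalone sketch of just the final few statements. It assumes `self.ipyclient.ids` behaves like a plain list of consecutive integer engine ids, one per allocated core; the function name `select_engine_ids` is made up for illustration and is not part of ipyrad. Because `eids` is reassigned at the end, whatever the `if nnodes == 1 ... else ...` branch computed is thrown away.

```python
def select_engine_ids(engine_ids, ncpus):
    """Hypothetical stand-in for the last lines of the quoted block.

    nthreads is hard-coded to 2, or 4 when more than 4 cpus are available,
    and the engine list is then strided by nthreads.
    """
    nthreads = 4 if ncpus > 4 else 2
    # Equivalent of: eids = self.ipyclient.ids[::self.nthreads]
    return nthreads, engine_ids[::nthreads]


# Assumed setup: 40 engines with ids 0..39, one per core.
engine_ids = list(range(40))
nthreads, eids = select_engine_ids(engine_ids, ncpus=40)
print(nthreads, len(eids), eids)
```

Under those assumptions, 40 cores always end up as 10 target engines at 4 threads each, regardless of `njobs` or the number of nodes, so the "2 20-threaded jobs or 4 10-threaded jobs" layouts described in the comments can never actually be selected.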
Very light testing shows:
Clustering Tier 1 uses 4 threads per chunk
Clustering across uses more, but it's hard to tell how much more. I allocated 40 cores and it only ever appeared to be using 7-8 at most.