Step 6 - Fixing threading optimization

As far as I can tell, the allocation of threads during step 6 could be tuned a bit. Overall, it seems that the full set of allocated cores is only really used during the alignment phase, with a subset used during clustering. The code is doing something I don't understand right now (clustmap_across.py, around line 200): it looks like the load balancing computed in the if/else block is being squashed by the last line, which reassigns eids:

        # how to load-balance cluster2 jobs
        # maxthreads = 8 cuz vsearch isn't v efficient above that.
        ## e.g., 24 cpus; do 2 12-threaded jobs
        ## e.g., 2 nodes; 40 cpus; do 2 20-threaded jobs or 4 10-threaded jobs
        ## e.g., 4 nodes; 80 cpus; do 8 10-threaded jobs
        if nnodes == 1:
            thr = np.floor(self.data.ncpus / njobs).astype(int)
            eids = max(1, thr)
            eids = max(eids, len(list(self.hostd.values())[0]))

        else:
            eids = []
            for node in self.hostd:
                sids = self.hostd[node]
                nids = len(sids)
                thr = np.floor(nids / (njobs / nnodes)).astype(int)
                thr = max(1, thr)
                thr = min(thr, nids)
                eids.extend(self.hostd[node][::thr])

        # set nthreads based on ipcluster dict (default is 2)        
        #if "threads" in self.data.ipcluster.keys():
        #    self.nthreads = int(self.data.ipcluster["threads"])
        self.nthreads = 2
        if self.data.ncpus > 4:
            self.nthreads = 4
        eids = self.ipyclient.ids[::self.nthreads]
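
To make the concern concrete, here is a minimal sketch that replays the quoted logic as a standalone function. It is not ipyrad code: `select_engines` and its arguments are hypothetical stand-ins for the `self.*` attributes, `nnodes` is assumed to be the number of keys in `hostd`, and `math.floor` stands in for `np.floor(...).astype(int)`. It returns both the load-balanced value and the final `eids` so the two can be compared.

```python
import math


def select_engines(ncpus, njobs, hostd, client_ids):
    """Replay the quoted snippet with plain arguments (hypothetical helper).

    ncpus      -- stands in for self.data.ncpus
    njobs      -- number of cluster jobs (assumed to be set earlier)
    hostd      -- dict mapping node name -> list of engine ids on that node
    client_ids -- stands in for self.ipyclient.ids
    """
    nnodes = len(hostd)  # assumption: one dict key per node

    # Load-balancing block, as quoted.
    if nnodes == 1:
        thr = math.floor(ncpus / njobs)
        eids = max(1, thr)
        # Note: on a single node this is an int, not a list of ids.
        eids = max(eids, len(list(hostd.values())[0]))
    else:
        eids = []
        for node in hostd:
            nids = len(hostd[node])
            thr = math.floor(nids / (njobs / nnodes))
            thr = max(1, thr)
            thr = min(thr, nids)
            eids.extend(hostd[node][::thr])
    balanced = eids

    # Final block, as quoted: nthreads is 2 or 4, and eids is reassigned,
    # so nothing computed above reaches the caller.
    nthreads = 2
    if ncpus > 4:
        nthreads = 4
    eids = client_ids[::nthreads]
    return balanced, nthreads, eids


if __name__ == "__main__":
    one_node = {"node1": list(range(40))}
    two_nodes = {"a": list(range(20)), "b": list(range(20, 40))}
    for hostd, njobs in ((one_node, 2), (two_nodes, 4)):
        balanced, nthreads, final = select_engines(
            40, njobs, hostd, list(range(40))
        )
        print("balanced:", balanced)
        print("nthreads:", nthreads, "engines used:", len(final))
```

Under these assumptions, the final `eids` depends only on `ncpus` and the engine list, never on `njobs` or the node layout, and the single-node branch produces an integer while the multi-node branch produces a list. This sketch only shows the override; it does not by itself explain the core usage observed during testing.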

Very light testing shows:

- Clustering tier 1 uses 4 threads per chunk.
- Clustering across uses more, but it is hard to tell how many more. I allocated 40 cores and it only ever appeared to be using 7-8 at most.
- Building clusters uses 1 core.
- Aligning does appear to try to use all of them.
