Jake & Paul - I've added a Jupyter notebook showing how to use the multiprocessing package in Python.
I recommend the outline of the functions look something like:
1) Splitting function (parallelized) --> 2) Louvain clustering (saves cluster label file .npy to disk) --> 3) Analysis of reproducibility and predictive value of bagged vs. standard clustering (pulls everything from disk).
In other words, the parallelization happens and all the data are created; when that's finished, a function is run to collect all the outputs and compute the essential reproducibility metrics and phenotypic comparisons.
As you can see, you need to define what you want to parallelize and then define the parallelization function. You might need to create a hash, or use a random number generator, to save each .npy cluster label file under a unique name.
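One possible sketch of the unique-filename idea, using the standard library's `uuid` module for collision-resistant names (the function name and directory argument here are hypothetical placeholders, not part of the notebook):

```python
import uuid
import numpy as np

def save_cluster_labels(labels, out_dir='.'):
    """Save cluster labels to a uniquely named .npy file.

    The function name and out_dir are placeholders; uuid4 generates a
    random identifier, so parallel workers won't overwrite each other.
    """
    fname = f"{out_dir}/cluster_labels_{uuid.uuid4().hex}.npy"
    np.save(fname, labels)
    return fname
```

Each worker in the pool can call this independently, since the random identifier avoids any filename coordination between processes.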
Define the function to parallelize:
```python
# Multiprocessing with Pool
import time
import multiprocessing

def basic_func(x):
    if x == 0:
        return 'zero'
    elif x % 2 == 0:
        return 'even'
    else:
        return 'odd'
```
Define the multiprocessing function:
```python
def multiprocessing_func(x):
    y = x * x
    time.sleep(2)
    print('{} squared results in a/an {} number'.format(x, basic_func(y)))
    return y
```
Run the multiprocessing pool:
```python
if __name__ == '__main__':
    starttime = time.time()
    pool = multiprocessing.Pool()
    pool.map(multiprocessing_func, range(0, 10))
    pool.close()
    pool.join()
    print('That took {} seconds'.format(time.time() - starttime))
```
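Once the pool finishes, the collection step from the outline above could be sketched like this (the directory and filename pattern are assumptions matching the unique-filename idea, not the notebook's actual code):

```python
import glob
import numpy as np

def collect_cluster_labels(label_dir='.'):
    """Load every saved .npy cluster label file from disk.

    The directory and 'cluster_labels_*.npy' pattern are placeholders;
    sorting makes the load order deterministic across runs.
    """
    all_labels = []
    for path in sorted(glob.glob(f"{label_dir}/cluster_labels_*.npy")):
        all_labels.append(np.load(path))
    return all_labels
```

A function like this would run once, after `pool.join()` returns, to pull all the bagged cluster labels back in for the reproducibility and phenotypic comparisons.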