Made two changes to skip setting the dtype to float when launching methods if the dtype is already np.float32.
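For reviewers, the idea behind the dtype skip can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `ensure_float32` and the `counts` argument are hypothetical.

```python
import numpy as np
import pandas as pd


def ensure_float32(counts: pd.DataFrame) -> pd.DataFrame:
    """Return `counts` as float32, skipping the cast when nothing would change.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # If every column is already float32, return the same object
    # instead of paying for a full copy of a large matrix.
    if all(dtype == np.float32 for dtype in counts.dtypes):
        return counts
    return counts.astype(np.float32)
```

On a large counts matrix the early return avoids an extra allocation of the whole frame, which is where the saving would come from.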
Refactored how build_cluster retrieves and calculates the cluster means so that it uses numpy's mean method, which is much faster than applying the mean through pandas. Running on 200k cells now finishes in about 5 hours.
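A rough sketch of the numpy-based approach, assuming a genes x cells counts frame and a metadata frame indexed by cell. The function name `cluster_means` and the `cell_type` column are assumptions for illustration; the actual build_cluster code may be organised differently.

```python
import numpy as np
import pandas as pd


def cluster_means(counts: pd.DataFrame, meta: pd.DataFrame,
                  label_col: str = "cell_type") -> pd.DataFrame:
    """Per-cluster mean expression computed on the raw numpy array.

    counts: genes x cells. meta: indexed by cell name, with a cluster
    label column (the column name here is an assumption).
    """
    values = counts.to_numpy()
    # Align the labels with the column order of the counts matrix.
    labels = meta.loc[counts.columns, label_col].to_numpy()
    clusters = pd.unique(labels)
    out_dtype = np.result_type(values.dtype, np.float32)
    means = np.empty((values.shape[0], len(clusters)), dtype=out_dtype)
    for j, cluster in enumerate(clusters):
        # A boolean mask selects the cells of this cluster; the mean
        # is taken across cells for every gene at once.
        means[:, j] = values[:, labels == cluster].mean(axis=1)
    return pd.DataFrame(means, index=counts.index, columns=clusters)
```

The loop here runs once per cluster rather than once per cell or gene, so the heavy work stays inside numpy.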
There is still a part that I think can be sped up, but I don't understand the code well enough to work out how the for-loop retrieves and updates the interaction database and the base_result object.
This applies particularly to mean_analysis and percent_analysis (both functions run during the Running Real Analysis step; mean_analysis also runs on every iteration of shuffled_analysis in the Running Statistical Analysis step).
Both functions contain this for-loop:
for interaction_index, interaction in interactions.iterrows():
    for cluster_interaction in cluster_interactions:
        ...
        # ending in something like this
        result.at[index, column] = value
return result
The same goes for build_percent_result, which starts with the same statement.
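One possible direction for replacing the nested loop, sketched under strong assumptions: that each cell of the result only combines two per-cluster gene means (one per interaction partner), and that the combined value is zero when either mean is zero. The function name, the `gene_1`/`gene_2` columns, and the `A|B` column naming are all hypothetical, so this would need checking against what the loop body actually does.

```python
import numpy as np
import pandas as pd


def pairwise_interaction_means(cluster_means: pd.DataFrame,
                               interactions: pd.DataFrame,
                               cluster_pairs: list,
                               col_1: str = "gene_1",
                               col_2: str = "gene_2") -> pd.DataFrame:
    """Fill the whole interactions x cluster-pairs table in one step.

    cluster_means: genes x clusters. interactions: one row per
    interaction, with the two partner genes in col_1 and col_2.
    cluster_pairs: list of (cluster_a, cluster_b) tuples.
    All names are illustrative assumptions.
    """
    first = [a for a, _ in cluster_pairs]
    second = [b for _, b in cluster_pairs]
    # Both lookups give arrays of shape (n_interactions, n_pairs).
    m1 = cluster_means.loc[interactions[col_1].tolist(), first].to_numpy()
    m2 = cluster_means.loc[interactions[col_2].tolist(), second].to_numpy()
    # Assumed rule: zero if either partner mean is zero, else the average.
    combined = np.where((m1 == 0) | (m2 == 0), 0.0, (m1 + m2) / 2)
    columns = [f"{a}|{b}" for a, b in cluster_pairs]
    return pd.DataFrame(combined, index=interactions.index, columns=columns)
```

This replaces the per-cell `result.at[...]` writes with two label-based lookups and one array expression, which is usually where iterrows-style loops lose most of their time.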
Also, the cython requirement is still there, and it currently doesn't work on Python 3.8 unless the version is increased to >=0.29.21.