olgabot opened this issue 8 years ago (status: Open)
Just tried it with fastcluster and got the same (wrong) result:
Hm, that is odd. In your dendrogram, the linkage appears to systematically put super-clusters further apart. I don't think the distance scale matters much, but it's annoying that the topology seems slightly different.
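Since the topology matters more than the distance scale here, one scale-free way to compare two trees is to rank-correlate their cophenetic distances. This is a minimal sketch, assuming both linkage matrices were built from the same samples in the same order; the helper name topology_agreement and the random matrix standing in for the ICA components are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.stats import spearmanr


def topology_agreement(Z_a, Z_b):
    """Spearman correlation between the cophenetic distances of two trees.

    A rank correlation ignores any monotonic rescaling of the merge heights,
    so 1.0 means both trees rank all sample pairs identically even if their
    y-axes use different units. Both linkage matrices must come from the
    same samples in the same order.
    """
    coph_a = cophenet(Z_a)  # condensed vector of cophenetic distances
    coph_b = cophenet(Z_b)
    rho, _ = spearmanr(coph_a, coph_b)
    return float(rho)


# Hypothetical stand-in for the samples x ICA-components matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
Z = linkage(X, method="ward")
Z_rescaled = Z.copy()
Z_rescaled[:, 2] *= 15  # same tree, y-axis stretched
print(topology_agreement(Z, Z_rescaled))  # approximately 1.0
```

A value noticeably below 1 would point at a real change in topology rather than a rescaled axis.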
If you put a threshold at ~6 (so that you get 5 clusters), how do the clusters you get compare with the ones annotated in our supplementary table? You could also look at the output of cell [18] in our Notebook 1.
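Because a height threshold such as ~6 is tied to the y-axis units of whichever implementation drew the tree, it may help to also cut by cluster count. A minimal sketch with scipy's fcluster, assuming X is a samples × components array; cut_tree_two_ways is a hypothetical helper name.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cut_tree_two_ways(X, threshold=6.0, n_clusters=5):
    """Cut one Ward tree by merge height and by number of clusters.

    criterion="distance" keeps samples together if their cophenetic
    distance is at most `threshold`, so it depends on the height units.
    criterion="maxclust" returns at most `n_clusters` flat clusters and
    does not depend on the units.
    """
    Z = linkage(X, method="ward")
    by_height = fcluster(Z, t=threshold, criterion="distance")
    by_count = fcluster(Z, t=n_clusters, criterion="maxclust")
    return by_height, by_count


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))  # hypothetical data
by_height, by_count = cut_tree_two_ways(X)
print(len(set(by_height)), len(set(by_count)))
```

If two implementations differ only in scale, the by_count labels should still agree even when the by_height labels do not.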
Did you use the ICA reported in our supplementary table, or did you re-calculate it (by running Notebook 1)? The difference could also come from updates to scikit-learn's FastICA.
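When comparing re-calculated ICA components with the published ones, keep in mind that ICA only recovers components up to order, sign and scale, so a re-run or a newer FastICA can return the same components shuffled or flipped. A sketch that pairs them by absolute correlation, using scipy's linear_sum_assignment to get a one-to-one matching; the function name match_components and the layout (samples in rows, components in columns) are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_components(original, recalculated):
    """One-to-one pairing of ICA components by absolute Pearson correlation.

    Both arrays have shape (n_samples, n_components). Returns, for each
    original component, the index of its recalculated partner, the sign of
    their correlation and its absolute value.
    """
    k = original.shape[1]
    # Rows 0..k-1 of the stacked input are the original components, the
    # rest are the recalculated ones; keep only the cross block.
    corr = np.corrcoef(original.T, recalculated.T)[:k, k:]
    # Maximise total |corr|; the returned row indices come back sorted.
    rows, cols = linear_sum_assignment(-np.abs(corr))
    return cols, np.sign(corr[rows, cols]), np.abs(corr[rows, cols])


rng = np.random.default_rng(0)
S = rng.laplace(size=(50, 4))  # hypothetical "original" components
R = -2.0 * S[:, [2, 0, 3, 1]]  # same components, shuffled, flipped, rescaled
partner, sign, strength = match_components(S, R)
print(partner, sign)  # [1 3 0 2] [-1. -1. -1. -1.]
```

Strengths close to 1 would suggest the ICA itself is reproducible and that any difference in clusters comes from elsewhere.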
With a cutoff at 6, I do get 5 clusters:
I re-calculated the ICA, so that may be the cause. I don't see the ICA components table here; is it somewhere else?
Right, but do those 5 clusters correspond to the 5 clusters we considered (i.e. 1, 2, 3, 4 and X, with 1a and 1b merged together)? In other words, do they match the different colors at Out[18] in this notebook: https://github.com/Teichlab/spectrum-of-differentiation-supplements/blob/master/1.%20Find%20state%20change%20clusters.ipynb
The file "Data S1. ..." contains two CSV files with all the analysis results, from which every figure should be reproducible. The ICA component columns are called difference_component, within_small_component, within_large_component and outlier_component (in the CSV file original_experiment_sample_info.csv).
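Pulling those four columns out of the CSV could look like the sketch below. It assumes, without having checked the real file, that the first column of original_experiment_sample_info.csv holds the sample IDs and that the file has been extracted from "Data S1. ..." to a local path; load_ica_components is a hypothetical helper.

```python
import pandas as pd

# Column names as listed in this thread.
ICA_COLUMNS = [
    "difference_component",
    "within_small_component",
    "within_large_component",
    "outlier_component",
]


def load_ica_components(path_or_buffer):
    """Return the four ICA component columns, one row per sample.

    Assumes (unverified) that the first CSV column holds the sample IDs.
    """
    info = pd.read_csv(path_or_buffer, index_col=0)
    missing = [col for col in ICA_COLUMNS if col not in info.columns]
    if missing:
        raise KeyError(f"ICA columns not found: {missing}")
    return info[ICA_COLUMNS]


# Example (adjust the path to wherever the CSV files were extracted):
# ica = load_ica_components("original_experiment_sample_info.csv")
```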
Using the conveniently printed sample-to-cluster assignments, I found that the newly calculated clusters mostly correspond to the original ones:
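The "mostly correspond" check can be made explicit by cross-tabulating the two labelings. A sketch assuming both labelings are pandas Series indexed by sample ID (for example, the newly computed cluster labels and the original annotation column, whose name is not given here); majority_correspondence is a hypothetical helper and the toy labels are illustrative only.

```python
import pandas as pd


def majority_correspondence(new_labels, original_labels):
    """Map each new cluster to the original cluster most of its samples carry.

    Returns that mapping plus the fraction of samples whose original label
    matches the majority label of their new cluster.
    """
    new, orig = new_labels.align(original_labels, join="inner")
    table = pd.crosstab(new, orig)  # rows: new clusters, columns: original
    mapping = table.idxmax(axis=1).to_dict()
    agreement = table.max(axis=1).sum() / table.to_numpy().sum()
    return mapping, float(agreement)


# Toy labels, purely illustrative.
original = pd.Series({"a": "1", "b": "1", "c": "2", "d": "2", "e": "X"})
new = pd.Series({"a": 7, "b": 7, "c": 8, "d": 7, "e": 9})
mapping, agreement = majority_correspondence(new, original)
print(mapping, agreement)
```

Aligning on the index first guards against the two tables listing samples in different orders.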
Here's the work so far.
With both the re-calculated FastICA components and the original ones, I get the same clusters, so I think the cause is an update to the Ward linkage implementation rather than the ICA.
Oh wait, it IS a fastcluster vs. scipy Ward issue. I had installed fastcluster in my Python 3 environment but not in Python 2, and I just switched to Python 3.
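The diagnosis above, that the result depends on whether fastcluster is installed, can be checked directly by running both Ward implementations on the same input. A sketch assuming X is a samples × components array; it returns None when fastcluster cannot be imported, and compare_ward_implementations is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def compare_ward_implementations(X, n_clusters=5):
    """Ward linkage from scipy and, if importable, from fastcluster.

    Returns None when fastcluster is not installed. Otherwise returns the
    largest absolute difference between the two sets of merge heights and
    whether cutting both trees into `n_clusters` gives the same partition.
    """
    try:
        import fastcluster
    except ImportError:
        return None
    Z_scipy = linkage(X, method="ward", metric="euclidean")
    Z_fast = fastcluster.linkage(X, method="ward", metric="euclidean")
    height_gap = float(np.max(np.abs(Z_scipy[:, 2] - Z_fast[:, 2])))
    cut_scipy = fcluster(Z_scipy, t=n_clusters, criterion="maxclust")
    cut_fast = fcluster(Z_fast, t=n_clusters, criterion="maxclust")
    # Same partition up to relabelling <=> the label pairs form a bijection.
    pairs = set(zip(cut_scipy, cut_fast))
    same_cut = len(pairs) == len(set(cut_scipy)) == len(set(cut_fast))
    return height_gap, same_cut


rng = np.random.default_rng(0)
X = rng.normal(size=(25, 4))  # hypothetical samples x components
print(compare_ward_implementations(X))
```

Running it in each environment (with and without fastcluster) would show whether the two implementations disagree on the heights, on the partition, or on both.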
Original dendrogram:
With clusters (this looks correct to me by eye):
Yes, and now the clusters PERFECTLY overlap with your original ones:
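To confirm a perfect overlap without relying on the eye, one can check that the two labelings induce a one-to-one mapping between cluster labels. A minimal pandas sketch (same_partition is a hypothetical name, and both inputs are assumed to be Series indexed by sample ID); if scikit-learn is available, sklearn.metrics.adjusted_rand_score returning 1.0 is an equivalent test.

```python
import pandas as pd


def same_partition(labels_a, labels_b):
    """True if two labelings group the same samples identically.

    Label names are ignored (cluster 3 in one run may be "X" in the other);
    what matters is that the label pairs form a one-to-one mapping.
    """
    a, b = labels_a.align(labels_b, join="inner")
    if len(a) != len(labels_a) or len(a) != len(labels_b):
        return False  # the two labelings do not cover the same samples
    pairs = set(zip(a, b))
    return len(pairs) == a.nunique() == b.nunique()


new = pd.Series({"s1": 1, "s2": 1, "s3": 2, "s4": 3})
original = pd.Series({"s4": "X", "s3": "2", "s2": "1", "s1": "1"})
print(same_partition(new, original))  # True
```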
Do you think you could export an environment.yml file with all the packages you used (conda env export > environment.yml)? This would also let you use Binder to add an interactive "reproduce my figures" button so people can run your examples.
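A lighter complement to conda env export, not a replacement for it, is to record inside the notebook itself the versions of the packages that affect the clustering. A sketch using importlib.metadata from the Python 3.8+ standard library; the package list is an assumption and package_versions is a hypothetical helper. Recording that fastcluster is absent matters here, since its presence alone changed the dendrogram in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names=("numpy", "scipy", "scikit-learn", "fastcluster", "seaborn")):
    """Return {distribution name: installed version, or None if not found}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


print(package_versions())
```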
Unfortunately, I've changed computational environments at least twice since we wrote the paper, but I'll keep the environment.yml idea in mind for the next one.
Now I wonder why the linkage functions behave differently in the two implementations. Do you know if this is specific to Ward linkage, or does the same thing happen with other linkages? I know, for example, that some implementations of Ward require squared distances while others don't. It would be annoying if fastcluster does this differently from scipy, in particular if clustermap is not aware of the difference.
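On the squared-distance question: scipy's linkage documentation notes that the 'centroid', 'median' and 'ward' methods are only correctly defined with Euclidean distances, and as far as I know seaborn's clustermap switches to fastcluster automatically whenever it can import it. The sketch below (ward_heights is a hypothetical helper, the data are random) shows that scipy gives identical Ward heights from raw observations and from unsquared Euclidean distances, while squared distances still run but change the heights.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist


def ward_heights(X):
    """Ward merge heights computed from three inputs that look interchangeable.

    scipy builds the Ward tree from raw observations or from *unsquared*
    Euclidean distances; both give the same heights. Passing squared
    Euclidean distances still runs but produces different heights, the
    kind of silent mismatch that would change a dendrogram's y-axis.
    """
    from_vectors = linkage(X, method="ward")
    from_euclidean = linkage(pdist(X, metric="euclidean"), method="ward")
    from_squared = linkage(pdist(X, metric="sqeuclidean"), method="ward")
    return from_vectors[:, 2], from_euclidean[:, 2], from_squared[:, 2]


rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))  # hypothetical data
vec, euc, sq = ward_heights(X)
print(np.allclose(vec, euc), np.allclose(euc, sq))  # True False
```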
Hello, thank you for making your code publicly available! I'm using it in a course because it's awesome bioinformatics. I'm having trouble recapitulating this dendrogram, whose y-axis ranges from 0 to 1.2:
When I run the code in my environment, I get this dendrogram, whose y-axis ranges from 0 to 20 (!!):
Do you know why that would be? I don't know whether scipy updated its linkage methods between when this notebook was written and now. Did you use fastcluster? It may be that its linkage methods produce visually similar clusterings but different dendrograms. Thanks, Olga
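For the y-axis question itself: in a scipy dendrogram the y-axis shows the merge heights stored in column 2 of the linkage matrix, so comparing those values across setups shows directly where a 0 to 1.2 versus 0 to 20 range comes from. This sketch compares two method/metric choices on random data (height_range is a hypothetical helper); which combination the original notebook used is not stated in this thread.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage


def height_range(Z):
    """(lowest, highest) merge height: the span a dendrogram's y-axis must cover."""
    return float(Z[:, 2].min()), float(Z[:, 2].max())


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))  # hypothetical samples x features

for method, metric in [("ward", "euclidean"), ("average", "correlation")]:
    Z = linkage(X, method=method, metric=metric)
    # Each plotted U-shape tops out at its merge height, so the tallest
    # dcoord value equals the largest height in Z.
    tallest = max(max(d) for d in dendrogram(Z, no_plot=True)["dcoord"])
    print(method, metric, height_range(Z), tallest)
```

Correlation distance is bounded by 2 while Ward heights on Euclidean data are not, so the method, the metric and the implementation can each move the axis range.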