Teichlab / spectrum-of-differentiation-supplements

Mirror of analysis files for "Single-Cell RNA-Sequencing Reveals a Continuous Spectrum of Differentiation in Hematopoietic Cells"

Different dendrograms in notebook 1 #1

Open olgabot opened 8 years ago

olgabot commented 8 years ago

Hello, Thank you for making your code publicly available! I'm using it in a course because of its awesome bioinformatics. I'm having trouble recapitulating this dendrogram, whose y-axis ranges from 0 to 1.2:

image

When I run the code in my environment, I get this dendrogram, whose y-axis scales from 0 to 20 (!!):

image

Do you know why that would be? I don't know if scipy updated its linkage methods in between this notebook and me using it now. Did you use fastcluster? It may be that its linkage methods produce the same visual results but the dendrograms are different. Thanks, Olga

olgabot commented 8 years ago

Just tried it with fastcluster and got the same (wrong) result:

image

vals commented 8 years ago

Hm that is odd. In your dendrogram it appears the linkage is putting super-clusters further apart systematically. The distance scale is not super important I think, but it's annoying that the topology seems slightly different.

If you put a threshold at ~6 (so that you get 5 clusters), how do the clusters you get compare with the ones annotated in our supplementary table? You could also look at the output from cell [18] in our Notebook 1.
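For reference, cutting a dendrogram at a height can be sketched with scipy's `fcluster`. This is only an illustration on synthetic data, not the notebook's code: the array `X` is a made-up stand-in for the ICA components, and the threshold of 6 only mirrors the value suggested above, so it has no special meaning on this toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Synthetic stand-in for the ICA components (assumption, not the paper's data):
# 5 well-separated groups of 20 points each in 4 dimensions.
centers = 10.0 * np.eye(5, 4)
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 4)) for c in centers])

# Ward linkage on the observations (Euclidean distances).
Z = linkage(X, method="ward")

# Option 1: cut the tree at a fixed height on the dendrogram's y-axis.
labels_by_height = fcluster(Z, t=6.0, criterion="distance")

# Option 2: ask for at most 5 clusters, independent of the height scale.
labels_by_count = fcluster(Z, t=5, criterion="maxclust")

print(len(set(labels_by_height.tolist())), len(set(labels_by_count.tolist())))
```

The `maxclust` criterion is handy when two implementations disagree on the y-axis scale, since it does not depend on the absolute merge heights.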

Did you use the ICA reported in our supplementary table, or did you re-calculate it? (By running Notebook 1.) It could also be due to updates in the FastICA function of scikit-learn.

olgabot commented 8 years ago

With a cutoff at 6, I do get 5 clusters: image

I re-calculated ICA so that may be causing it. I don't see the ICA components table here - is it somewhere else?

image

vals commented 8 years ago

Right, but do those 5 clusters correspond to the 5 clusters we considered? (i.e. 1, 2, 3, 4, X (1a and 1b merged together)). Or in other words, the different colors at Out[18]: in https://github.com/Teichlab/spectrum-of-differentiation-supplements/blob/master/1.%20Find%20state%20change%20clusters.ipynb

The file "Data S1. ..." contains two CSV files with all the analysis results, from which every figure should be reproducible. The ICA components columns are called difference_component, within_small_component, within_large_component, and outlier_component. (In the CSV file original_experiment_sample_info.csv).
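Pulling out just those four columns could look roughly like the sketch below. The column names are the ones listed above; everything else is an assumption: the helper name `load_ica_components` is hypothetical, the first CSV column is assumed to hold the sample names, and the tiny in-memory CSV is made up so the example runs without the supplementary file.

```python
import io
import pandas as pd

# Column names as listed for original_experiment_sample_info.csv.
ICA_COLUMNS = [
    "difference_component",
    "within_small_component",
    "within_large_component",
    "outlier_component",
]

def load_ica_components(path_or_buffer):
    """Return only the four ICA component columns.

    Assumes the first CSV column holds the sample names (hypothetical layout).
    """
    sample_info = pd.read_csv(path_or_buffer, index_col=0)
    return sample_info[ICA_COLUMNS]

# Made-up stand-in for the real file, only so the sketch is runnable.
fake_csv = io.StringIO(
    "sample,difference_component,within_small_component,"
    "within_large_component,outlier_component,other\n"
    "cell_a,0.1,0.2,0.3,0.4,x\n"
    "cell_b,0.5,0.6,0.7,0.8,y\n"
)
ica = load_ica_components(fake_csv)
print(ica)
```

With the real data, the path to `original_experiment_sample_info.csv` would be passed in place of `fake_csv`.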

olgabot commented 8 years ago

Using the conveniently printed sample to cluster identification, I found majority correspondence between these newly calculated clusters and the original ones:
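A comparison like this can be sketched with a contingency table via `pandas.crosstab`. The labels below are invented for illustration (they are not the actual assignments), and the cluster names 1, 2, 3, 4, X just follow the naming used earlier in the thread.

```python
import pandas as pd

# Invented example labels, one entry per cell (not the real assignments).
original = pd.Series(["1", "1", "2", "2", "3", "4", "X", "X"],
                     name="original_cluster")
recomputed = pd.Series([3, 3, 1, 1, 2, 5, 4, 4],
                       name="recomputed_cluster")

# Rows: original clusters, columns: recomputed clusters, values: cell counts.
table = pd.crosstab(original, recomputed)
print(table)

# The clusterings agree up to relabeling if every row and every column
# has exactly one nonzero entry.
nonzero = table > 0
is_one_to_one = bool((nonzero.sum(axis=0) == 1).all()
                     and (nonzero.sum(axis=1) == 1).all())
print(is_one_to_one)
```

Cluster numbering from a dendrogram cut is arbitrary, so checking for a one-to-one pattern in the table is more informative than comparing the labels directly.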

image

Here's the work so far.

With both the re-calculated FastICA and the original, I get the same clusters, so I think it's an update in the ward linkage implementation rather than ICA.

image

image

olgabot commented 8 years ago

oh wait.. it IS a fastcluster vs scipy ward issue. I had installed fastcluster in my python3 environment and not python2, and just now switched to python3.

Original dendrogram: image

With clusters (this looks correct to me by eye)

image

olgabot commented 8 years ago

Yes, and now the clusters PERFECTLY overlap with your original ones:

image.

Do you think you can export an environment.yml file with all the packages you used? (conda env export > environment.yml). This will also let you use binder to add an interactive "reproduce my figures" button for people to run your examples.

vals commented 8 years ago

Unfortunately I've changed computational environment at least twice since we wrote the paper. But I'll keep the environment.yml thing in mind for the next one.

Now I wonder why the linkage functions act differently for the two implementations. Do you know if this is particular to Ward linkage, or does the same phenomenon happen with other linkages? I know some implementations of Ward require squared distances while others don't, for example. It would be annoying if fastcluster does this differently from scipy, in particular if clustermap is not aware of the difference.
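One way to probe the squared-distance idea is to compare the merge heights (third column) of the two linkage matrices. The sketch below is only a diagnostic under stated assumptions: `compare_heights` is a hypothetical helper, the data are random, and the "second implementation" is simulated by squaring scipy's own heights, so it says nothing about what fastcluster actually does.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

def compare_heights(Z_a, Z_b):
    """Summarize how the merge heights of two linkage matrices relate."""
    h_a, h_b = np.sort(Z_a[:, 2]), np.sort(Z_b[:, 2])
    return {
        "identical": bool(np.allclose(h_a, h_b)),
        "b_is_a_squared": bool(np.allclose(h_a ** 2, h_b)),
        "max_ratio_b_over_a": float(np.max(h_b / h_a)),
    }

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))  # random stand-in data

Z = linkage(X, method="ward")

# In scipy's Ward, the first merge joins the closest pair of points at
# their plain (unsquared) Euclidean distance.
assert np.isclose(Z[0, 2], pdist(X).min())

# Simulated second implementation that reports squared heights
# (illustration only, not a real library's output).
Z_squared = Z.copy()
Z_squared[:, 2] **= 2

report = compare_heights(Z, Z_squared)
print(report)
```

Running `compare_heights` on the linkage matrices from the two environments would show whether the heights differ by a simple transformation or whether the merge order itself changed.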