Closed Fan-iX closed 1 year ago
Thanks for the reply! I will repeat the experiment with the data and check the output soon.
Dear Fan-iX, the problem happens when re-ordering the dendrogram. The current version is based on an intrusive design that orders the merging steps by increasing distance between leaves. I am adding a step to check the dendrogram before output.
So the problem lies in the re-ordering function and is not specific to the iNMF data.
Hi @Fan-iX, here is an updated package that fixes the ordering bug I mentioned above. It passed the test on your iNMF reduction matrix in our environment. Could you try it when you are available?
Here is the link to the repaired package:
https://drive.google.com/file/d/1ViWTnF0dKNs5L5KfD0QaSINflWmO5JrG/view?usp=sharing.
The bug will also be fixed in the next update of the HGC package. Thanks for providing this valuable example.
This version works for other iNMF matrices in my environment as well. Thanks all!
Hello @zouzh14, there is a small problem with your fixed version of HGC: the `height` component of the hclust object is not monotonic.
# the same `hc` object created from inmf_reduction.csv, as above
head(rank(hc$height),n=50)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 47 48 49 50 51 52 # missing 27, 46
The "missing" ranks are at:
data.frame(height=hc$height,rank=rank(hc$height))[14385:14395,]
height rank
14385 3.126114e-03 42015
14386 3.162925e-03 42051
14387 3.186890e-03 42075
14388 1.615852e-05 27
14389 2.233029e-05 46
14390 2.411827e-05 59
14391 2.559428e-05 67
14392 2.569810e-05 68
14393 2.668064e-05 74
14394 2.930629e-05 85
14395 3.023655e-05 91
This shows that `height` is not monotonic.
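A quicker check than reading ranks by eye, as a minimal base R sketch: `is.unsorted()` reports whether the heights decrease anywhere, and `diff()` locates the steps where they do. `height_drops` is a hypothetical helper name, not part of HGC or stats, and the toy heights are made up to mimic the pattern in the output above; they are not values from `inmf_reduction.csv`.

```r
# Hypothetical helper (not part of HGC): return the indices of merge steps
# whose height is lower than the height of the step just before them.
height_drops <- function(height) {
  # diff() yields height[i + 1] - height[i]; a negative value marks a drop,
  # and adding 1 reports the index of the lower step.
  which(diff(height) < 0) + 1L
}

# Reference tree from stats::hclust() with average linkage
set.seed(1)
hc_ref <- hclust(dist(matrix(rnorm(40), ncol = 2)), method = "average")
is.unsorted(hc_ref$height)    # are the heights out of order anywhere?
height_drops(hc_ref$height)   # steps at which the height drops

# Made-up heights: small, then large, then small again
toy_height <- c(1e-5, 2e-5, 3e-3, 1.6e-5, 2.2e-5)
is.unsorted(toy_height)
height_drops(toy_height)
```

On a real object, `height_drops(hc$height)` should point at rows such as 14388 in the table above.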
Usually, a "native" hclust object (created by the `hclust` function) should have a monotonically increasing `height`. For now, I can use
hc <- as.hclust(as.dendrogram(hc))
to make it monotonic, but this takes a long time and a lot of unnecessary computation.
I hope HGC can produce a monotonic hclust object natively, as before.
Thank you!
I realized that in the fixed version the tree is merged according to branch size. This provides some valuable information for my downstream analysis, so it is acceptable for me.
But it would be nice if you could add an argument to the function to specify the output order, since both orderings make sense.
Thanks for sharing your experience! Indeed, the tree structure itself does not change, so we expect the clustering results in each layer to stay the same. The main change is the output ordering of the leaves and internal nodes. I will find a way to deal with it in the next version.
I ran into a problem when applying HGC to an iNMF reduction matrix produced by liger:
`inmf_reduction` (csv) is a 44707×30 matrix.
I create a hierarchical clustering tree from the reduction matrix as follows:
Then there seems to be something wrong with the clustering tree:
The `merge` component of a hclust object shows at each step which two items are combined, so normally the value in each row (`V1` and `V2`) should be smaller than the row number (`n`). Here we get some `n < V2` in the tree, which means that in step `13500`, we should combine leaf node `28367` with the node produced in (a future) step `13504`. This leads to a "non-monotonic" tree and causes problems when plotting.
Is this an expected behavior? Can I apply HGC to an iNMF reduction matrix?
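For reference, the condition described above can be checked directly on the `merge` matrix. This is only a minimal base R sketch: `forward_refs` is a hypothetical helper name, not an HGC or stats function, and the toy matrix is made up for illustration.

```r
# Hypothetical helper (not part of HGC): list merge steps that refer to a
# cluster created in the same or a later step.
forward_refs <- function(merge) {
  step <- seq_len(nrow(merge))
  # Negative entries are leaves; positive entries are indices of merge steps.
  # In a well-formed hclust object, a positive entry in row i is below i.
  latest <- pmax(merge[, 1], merge[, 2])
  which(latest >= step)
}

# Reference tree built by stats::hclust() on 10 one-dimensional points
hc_ref <- hclust(dist(matrix(seq_len(10)^2, ncol = 1)))
forward_refs(hc_ref$merge)

# Made-up merge matrix for 4 leaves: step 1 refers to the cluster of step 2
toy_merge <- rbind(c(-1L, 2L), c(-2L, -3L), c(-4L, 1L))
forward_refs(toy_merge)
```

Applied to the tree from this report, `forward_refs(hc$merge)` would be expected to include step 13500.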