The data I have so far:
| version | clustering method | implementation | threshold |
| ------- | ----------------- | -------------- | --------- |
| 3.2.2   | none              | cpu            | 0.85      |
| 3.2.2   | none              | gpu            | 0.85      |
| 3.2.2   | gmm               | cpu            | -         |
| 3.2.2   | gmm               | gpu            | 0.95      |
| 3.3.0   | none              | cpu            | 0.85      |
| 3.3.0   | none              | gpu            | 0.85      |
| 3.3.0   | gmm               | cpu            | -         |
| 3.3.0   | gmm               | gpu            | 0.95      |
The last two data points will take longer to acquire, but it already seems that the higher threshold occurs whenever GMMs are enabled. The fact that it happens with 3.2.2 as well tells me that RMT may be to blame. We changed RMT to reduce the multi-modal similarity matrix to a single mode per gene pair, and we never verified that change with Yeast, so I'm going to look into RMT again.
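To make that concrete, here is a minimal sketch (illustrative Python, not the actual KINC code; the function name and reduction options are my own) of what a per-pair reduction of the multi-modal similarity values looks like:

```python
import numpy as np

def reduce_pairwise(correlations, method="max"):
    """Collapse the per-cluster (per-mode) correlations of one gene pair to a single value.

    `correlations` is the list of cluster correlations that GMM-based similarity
    produces for a single gene pair; a pair with no valid clusters yields NaN.
    """
    values = [r for r in correlations if not np.isnan(r)]
    if not values:
        return np.nan
    if method == "max":
        return max(values)                       # keep the strongest mode
    if method == "maxabs":
        return max(values, key=abs)              # strongest in magnitude, keeps sign
    if method == "mean":
        return float(np.mean(values))            # average across modes
    raise ValueError(f"unknown reduction: {method}")

# Example: a gene pair with two modes, one strong and one weak.
print(reduce_pairwise([0.97, 0.31]))             # -> 0.97
print(reduce_pairwise([-0.92, 0.40], "maxabs"))  # -> -0.92
```

Whichever reduction is used, RMT only ever sees a single value per gene pair.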
Here's an RMT log for the Yeast similarity matrix without GMMs:
thresh prune unique chi
0.990 1 1 -1
0.989 1 1 -1
0.988 1 1 -1
0.987 1 1 -1
0.986 1 1 -1
0.985 1 1 -1
0.984 1 1 -1
0.983 1 1 -1
0.982 1 1 -1
0.981 1 1 -1
0.980 1 1 -1
0.979 1 1 -1
0.978 1 1 -1
0.977 1 1 -1
0.976 1 1 -1
0.975 1 1 -1
0.974 1 1 -1
0.973 1 1 -1
0.972 2 1 -1
0.971 2 1 -1
0.970 2 1 -1
0.969 2 1 -1
0.968 2 1 -1
0.967 2 1 -1
0.966 2 1 -1
0.965 3 1 -1
0.964 4 1 -1
0.963 4 1 -1
0.962 5 3 -1
0.961 5 3 -1
0.960 5 3 -1
0.959 5 3 -1
0.958 5 3 -1
0.957 8 3 -1
0.956 9 3 -1
0.955 9 3 -1
0.954 9 3 -1
0.953 11 5 -1
0.952 12 5 -1
0.951 13 5 -1
0.950 13 5 -1
0.949 15 6 -1
0.948 17 6 -1
0.947 17 6 -1
0.946 21 10 -1
0.945 21 10 -1
0.944 22 12 -1
0.943 25 12 -1
0.942 26 12 -1
0.941 26 12 -1
0.940 26 12 -1
0.939 29 14 -1
0.938 30 16 -1
0.937 33 16 -1
0.936 33 16 -1
0.935 33 16 -1
0.934 36 18 -1
0.933 39 20 -1
0.932 42 20 -1
0.931 44 20 -1
0.930 48 21 -1
0.929 49 23 -1
0.928 52 24 -1
0.927 55 24 -1
0.926 59 29 -1
0.925 62 30 -1
0.924 65 33 -1
0.923 69 34 -1
0.922 77 36 -1
0.921 80 38 -1
0.920 87 40 -1
0.919 91 43 -1
0.918 97 47 -1
0.917 99 47 -1
0.916 103 51 49.1739
0.915 106 52 61.3273
0.914 108 54 58.4966
0.913 113 60 53.6267
0.912 123 71 59.6175
0.911 127 78 58.2033
0.910 133 80 60.0903
0.909 136 84 61.5853
0.908 141 85 62.9752
0.907 150 94 53.224
0.906 152 96 66.8565
0.905 156 101 60.2791
0.904 161 104 68.39
0.903 168 108 58.4994
0.902 176 113 66.2343
0.901 184 121 69.185
0.900 188 126 63.696
0.899 193 132 65.4627
0.898 203 140 61.4581
0.897 208 143 73.7135
0.896 215 149 62.6007
0.895 221 157 62.0958
0.894 230 160 55.0021
0.893 239 172 65.0139
0.892 249 180 66.1747
0.891 256 183 68.6507
0.890 266 195 68.7294
0.889 277 201 73.3123
0.888 285 206 55.524
0.887 292 210 65.075
0.886 294 210 73.7646
0.885 301 213 71.7432
0.884 311 223 74.2462
0.883 322 231 66.0499
0.882 329 235 65.3956
0.881 333 238 67.238
0.880 339 242 71.1167
0.879 345 247 73.6388
0.878 352 255 76.0843
0.877 357 258 71.3061
0.876 366 268 73.1551
0.875 375 270 69.2696
0.874 387 286 75.245
0.873 398 292 74.1203
0.872 408 302 67.8335
0.871 422 311 75.9076
0.870 433 322 79.4866
0.869 444 327 86.4472
0.868 458 340 81.5034
0.867 469 347 79.1588
0.866 480 355 75.7115
0.865 494 372 82.2447
0.864 503 380 80.6523
0.863 510 387 75.792
0.862 523 396 91.2657
0.861 535 407 77.365
0.860 548 417 81.8684
0.859 567 429 107.292
0.858 579 438 117.022
0.857 594 448 95.9011
0.856 606 464 86.8465
0.855 623 480 122.236
0.854 640 491 106.851
0.853 648 504 134.658
0.852 659 517 131.346
0.851 670 526 116.934
0.850 691 547 132.743
0.849 709 572 139.193
0.848 723 581 133.358
0.847 740 598 146.253
0.846 756 612 151.105
0.845 769 617 158.802
0.844 787 631 169.441
0.843 799 649 179.234
0.842 820 663 200.29
final threshold: 0.856002
And with GMMs:
0.990 4613 61 390.136
0.989 4613 77 276.703
0.988 4615 99 263.587
0.987 4704 131 324.417
0.986 4725 148 368.266
0.985 4727 181 416.131
0.984 4728 211 384.704
0.983 4831 258 197.236
0.982 4836 288 108.219
0.981 4911 316 164.173
0.980 4917 355 145.899
0.979 4957 384 171.118
0.978 5021 440 218.089
0.977 5064 474 209.316
0.976 5110 530 223.776
0.975 5116 569 225.077
0.974 5158 603 221.971
0.973 5161 658 283.673
0.972 5218 703 248.193
0.971 5221 749 186.44
0.970 5223 798 165.969
0.969 5228 848 169.751
0.968 5232 889 145.797
0.967 5237 928 124.81
0.966 5244 969 131.912
0.965 5252 1038 135.666
0.964 5258 1080 95.1461
0.963 5262 1140 80.5096
0.962 5265 1188 74.3832
0.961 5267 1247 75.979
0.960 5271 1308 83.6358
0.959 5275 1366 68.6064
0.958 5284 1422 81.7791
0.957 5285 1443 79.7833
0.956 5293 1511 108.501
0.955 5305 1562 134.306
0.954 5315 1629 176.324
0.953 5324 1686 199.499
0.952 5332 1739 249.203
final threshold: 0.957
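For reference, here is a rough sketch of the thresholding sweep I understand these logs to come from. The column semantics (prune = genes kept after pruning, unique = unique eigenvalues, chi = chi-squared of the eigenvalue spacing distribution) are my reading, not pulled from the KINC source, and the real code does proper spline-based unfolding and a stopping rule:

```python
import numpy as np

def rmt_sweep(similarity, start=0.99, stop=0.84, step=0.001):
    """Illustrative RMT thresholding sweep (assumptions noted above, not KINC's code).

    At each threshold: prune the similarity matrix to genes that still have at
    least one edge, take the eigenvalues of the pruned matrix, and compare the
    spacing of the unique eigenvalues against the expected Poisson spacing with
    a chi-squared statistic.
    """
    for thresh in np.arange(start, stop, -step):
        mask = np.abs(similarity) >= thresh
        np.fill_diagonal(mask, False)
        keep = mask.any(axis=1)                              # genes with >= 1 surviving edge
        sub = similarity[np.ix_(keep, keep)]
        sub = np.where(mask[np.ix_(keep, keep)], sub, 0.0)   # pruned matrix
        np.fill_diagonal(sub, 1.0)
        eigvals = np.linalg.eigvalsh(sub) if keep.any() else np.array([])
        unique = np.unique(np.round(eigvals, 6))
        if len(unique) < 50:                                 # too few eigenvalues -> chi = -1
            print(f"{thresh:.3f} {keep.sum()} {len(unique)} -1")
            continue
        spacings = np.diff(np.sort(unique))
        spacings /= spacings.mean()                          # crude "unfolding"
        hist, edges = np.histogram(spacings, bins=60, range=(0, 3), density=True)
        expected = np.exp(-edges[:-1])                       # Poisson nearest-neighbor spacing
        chi = np.sum((hist - expected) ** 2 / np.maximum(expected, 1e-9))
        print(f"{thresh:.3f} {keep.sum()} {len(unique)} {chi:.4f}")
```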
The biggest difference I see is that with GMMs the pruned matrix is much larger right away, even at a threshold of 0.99. Some things to look into:
The pairwise reduction methods all yielded more or less the same result. Going to look at pairwise scatter plots...
Oh look at that...
KINC's Spearman code is identifying clusters that are perfectly flat (zero correlation) as perfectly correlated. Kind of reminds me of Jordan's simulated data. Looks like we should have dealt with that edge case after all!
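To illustrate the failure mode (a sketch of the general pitfall, not KINC's actual Spearman kernel): if ranks are assigned purely by sort order, with no check for ties or zero variance, a perfectly flat cluster gives both genes identical rank vectors, which comes out as rho = 1.

```python
import numpy as np
from scipy.stats import spearmanr

def naive_spearman(x, y):
    """Spearman via sort-order ranks with no tie handling (the pitfall described
    above, not KINC's actual code). Constant vectors end up with ranks 0..n-1
    in input order, which correlate perfectly with each other."""
    rx = np.argsort(np.argsort(x, kind="stable"), kind="stable")
    ry = np.argsort(np.argsort(y, kind="stable"), kind="stable")
    rx = rx - rx.mean()
    ry = ry - ry.mean()
    return float(np.dot(rx, ry) / np.sqrt(np.dot(rx, rx) * np.dot(ry, ry)))

# A "perfectly flat" cluster: both genes pinned at a single expression value.
x = np.full(30, 5.0)
y = np.full(30, 2.0)

print(naive_spearman(x, y))   # -> 1.0, a spurious perfect correlation
print(spearmanr(x, y)[0])     # -> nan: zero variance, correlation is undefined
```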
As a quick fix, I re-ran similarity with --maxcorr 0.99, which essentially removed perfect correlations like the one shown above. Here is the resulting RMT log:
0.990 0 0 -1
0.989 16 1 -1
0.988 36 1 -1
0.987 149 3 -1
0.986 189 5 -1
0.985 218 9 -1
0.984 251 15 -1
0.983 389 31 -1
0.982 426 39 -1
0.981 535 47 -1
0.980 579 61 329.175
0.979 648 71 251.155
0.978 763 90 319.615
0.977 843 108 288.065
0.976 943 140 367.993
0.975 989 168 364.373
0.974 1081 189 380.286
0.973 1128 217 384.03
0.972 1234 257 397.712
0.971 1286 283 427.023
0.970 1353 328 325.621
0.969 1396 366 310.92
0.968 1451 409 273.73
0.967 1507 449 236.928
0.966 1551 472 234.246
0.965 1614 532 189.673
0.964 1672 554 182.11
0.963 1736 611 154.486
0.962 1783 647 129.565
0.961 1831 685 113.996
0.960 1897 730 89.2403
0.959 1964 779 71.2079
0.958 2026 814 62.573
0.957 2078 865 70.4274
0.956 2147 950 63.3342
0.955 2220 1009 78.4094
0.954 2272 1075 142.431
0.953 2329 1125 170.49
0.952 2399 1181 186.599
0.951 2455 1246 216.113
final threshold: 0.955
So RMT still settles on 0.95, and the extracted network is still global. However, I should note that even if I extract the network at 0.85, I still get a global network. So I think that 0.85 is not necessarily the correct threshold for Yeast when using GMMs. If 0.85 worked with KINCv1, then my guess would be that KINCv3 is producing a different similarity matrix compared to KINCv1.
@bentsherman, what do you mean by a "global" network? Both the GMM and the traditional approaches should produce a global network (at a 0.85 or 0.95 cutoff). The network only becomes non-global if you then filter the edges down to condition-specific ones, which RMT doesn't do. I just want to make sure I'm fully understanding the problem.
My apologies, I should have simply said that the network is not scale-free; that's what I meant.
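For what it's worth, here is a minimal sketch of the kind of degree-distribution check I have in mind when I say "scale-free" (the function, the file name, and the header check are illustrative assumptions, not KINC output):

```python
import numpy as np
from collections import Counter

def degree_loglog_slope(edges):
    """Crude scale-free check: fit a line to the log-log degree distribution.

    `edges` is an iterable of (gene_a, gene_b) pairs from the extracted network.
    A roughly straight log-log histogram with a clearly negative slope is the
    usual rule of thumb for scale-free; a "global" network where most genes
    have similar degree will not show that. Illustrative, not a rigorous fit.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())                  # degree -> number of genes
    k = np.array(sorted(counts))
    pk = np.array([counts[d] for d in k], dtype=float)
    pk /= pk.sum()
    slope, _ = np.polyfit(np.log(k), np.log(pk), 1)
    return slope

# Hypothetical usage on an edge list exported as "source target ..." rows
# (the file name and header token below are assumptions for illustration):
# edges = [line.split()[:2] for line in open("yeast.coexpnet.txt") if not line.startswith("Source")]
# print(degree_loglog_slope(edges))
```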
If the code that populates the matrices feeding into RMT does not have bugs, then I do not think the problem is with the implementation. It could be a side effect of using GMMs; I can give some thought to that, but it would require a meeting, as it would be difficult to explain here.
Given our conversation today, I think we can close this out, as long as you, @bentsherman, are confident that the data being provided to the RMT code from the cluster files is correct.
Apparently the Yeast network should be thresholded at ~0.85 to be scale-free, but right now KINC is thresholding Yeast at ~0.95, which produces a global network. Running KINC without GMMs still produces a threshold of 0.85, but enabling GMMs in some cases yields a 0.95 threshold. It is not yet clear if there is a difference between CPU / CUDA / OpenCL, or between 3.2.2 and 3.3.0, so we need to run all of these cases and identify which ones are correct and incorrect.