FertigLab / CoGAPS

Bayesian MCMC matrix factorization algorithm
https://www.bioconductor.org/packages/release/bioc/html/CoGAPS.html
BSD 3-Clause "New" or "Revised" License

Confusion Regarding patternMarkers Results #115

Closed mumdark closed 1 month ago

mumdark commented 1 month ago

Hello,

I used the following code to find the genes associated with each pattern. It returns a named list containing the marker genes for each pattern, along with each gene's rank and score per pattern.

pm <- patternMarkers(cogapsresult, threshold="cut")
genes_pm_rank <- pm$PatternRanks
genes_pm_rank[1:8, 1:3]
#        Pattern_1 Pattern_2 Pattern_3
#SAMD11      12921     11496       724
#NOC2L       10980     11147     10890
#KLHL17       1128     10107     11802
#PLEKHN1        86     13506     10504
#HES4         8083      1572     13497
#ISG15        7256     14617      2799
#AGRN         6907      3855     14205
#RNF223          1     13371     10514 

genes_pm_score <- pm$PatternScores
genes_pm_score[1:8, 1:3]
#           Pattern_1 Pattern_2 Pattern_3
#SAMD11  1.412506e+00 1.3123543 0.1508771
#NOC2L   1.299737e+00 1.2897692 1.4153460
#KLHL17  2.835759e-01 1.2301553 1.4407724
#PLEKHN1 2.484960e-07 1.4142136 1.4142134
#HES4    1.112514e+00 0.5124691 1.5042023
#ISG15   1.058937e+00 1.5104755 0.9793233
#AGRN    1.037002e+00 0.9594297 1.5471631
#RNF223  0.000000e+00 1.4142136 1.4142136

The issue I'm facing is that for Pattern_1, the gene RNF223 is ranked first in PatternRanks but has the lowest score in PatternScores (rank 1 and score 0 in the output above).

Isn't it expected that genes with higher rankings should have higher scores?

Looking forward to your response.

dimalvovs commented 1 month ago

Hey @mumdark, the score is the Euclidean distance between the gene's observed pattern and the pattern's template, so the lowest distance is ranked highest. The ideal score is 0, which means the observed pattern matches the template exactly.
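To make the relationship between distance and rank concrete, here is a minimal toy sketch in base R. It is not the CoGAPS source code: the matrix `A`, the gene names, the per-gene scaling by the row maximum, and the use of unit vectors as templates are all assumptions made for illustration. Under those assumptions, a gene that loads on a single pattern gets a score of 0 for that pattern and sqrt(2) (about 1.4142136) for every other pattern, which would be consistent with the RNF223 row in the output above.

```r
# Toy amplitude matrix: rows are genes, columns are patterns (made-up values).
A <- matrix(
  c(5.0, 0.0, 0.0,   # geneA: loads only on Pattern_1
    2.0, 2.0, 0.0,   # geneB: split evenly between Pattern_1 and Pattern_2
    0.1, 0.2, 4.0),  # geneC: loads mostly on Pattern_3
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("Pattern_", 1:3))
)

# Assumption of this sketch: scale each gene so its largest loading is 1.
A_scaled <- A / apply(A, 1, max)

# Assumption of this sketch: the template for pattern p is the unit vector
# with a 1 in position p and 0 elsewhere.
templates <- diag(ncol(A))

# scores[g, p] = Euclidean distance from gene g's scaled row to template p.
scores <- sapply(seq_len(ncol(A)), function(p) {
  sqrt(rowSums(sweep(A_scaled, 2, templates[, p])^2))
})
dimnames(scores) <- dimnames(A)

# Rank within each pattern: the smallest distance gets rank 1.
ranks <- apply(scores, 2, rank, ties.method = "first")

print(round(scores, 4))
print(ranks)
```

Here `geneA` has a score of 0 for Pattern_1 and therefore rank 1 in that column, so a low score and a high rank go together.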

dimalvovs commented 1 month ago

Closing as answered.