bnaras / PMA

Correlation discrepancy #1

Open joreynajr opened 3 years ago

joreynajr commented 3 years ago

Hi,

I am running the breast data example, and the correlations between the gene expression and chr1 canonical variables that I compute myself differ from the values PMA reports in out$cors. What could be the reason for this?

I am using the following code:

library(PMA)
data(breastdata)
attach(breastdata)
# Example involving the breast cancer data: gene expression + CGH
# We run CCA using all gene exp. data, but CGH data on chrom 1 only.

# Loading the data 
rna_input = t(rna)

chrom_num = 1
dna_input = t(dna)
dna_input = dna_input[,chrom == chrom_num]

# Running a permutation test to find the best parameters 
perm.out <- CCA.permute(x=rna_input,
                        z=dna_input,
                        typex="standard",
                        typez="ordered",
                        nperms=5,
                        penaltyxs=seq(.02, .7, len=10))
print(perm.out)
par(mar=c(4,4,4,4))
plot(perm.out)

# Running CCA with the best penalties from the previous
# section
out <- CCA(x=rna_input,
           z=dna_input,
           typex="standard",
           typez="ordered",
           K=3,
           penaltyx=perm.out$bestpenaltyx,
           v=perm.out$v.init,
           penaltyz=perm.out$bestpenaltyz,
           xnames=substr(genedesc,1,20), 
           znames=paste("Pos", sep="", nuc[chrom==1]))
print(out)

rna_cv = rna_input %*% out$u
dna_cv = dna_input %*% out$v

# DISCREPANCIES between PMA and manual correlation calculations
print(out$cors)
print(cor(rna_cv[, 1], dna_cv[,1]))
print(cor(rna_cv[, 2], dna_cv[,2]))
print(cor(rna_cv[, 3], dna_cv[,3]))

Thanks!

Joaquin

guido233 commented 9 months ago

same problem

guido233 commented 9 months ago

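I found the reason why this happens. The CCA function has the parameter 'standardize = TRUE' as its default, so the columns of x and z are centered and scaled before the fit, and u and v are weights for those standardized columns. As far as I can tell, out$cors is computed from the standardized matrices, so x_prime is not just x %*% u and z_prime is not just z %*% v on the raw inputs. Centering alone would not change a correlation, but rescaling each column changes the linear combination. If you set standardize = FALSE, you will get the same correlations.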
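
A minimal base R sketch of this effect, on made-up data rather than the breast data. The matrices x and z and the weight vectors u and v are hypothetical stand-ins for rna_input, dna_input, out$u[, 1] and out$v[, 1]. It only illustrates that applying fixed weights to raw versus column-standardized data gives different correlations; it does not reproduce what CCA() does internally.

```r
# Toy data with columns on very different scales (hypothetical, not the breast data).
set.seed(1)
n <- 50
x <- matrix(rnorm(n * 4), n, 4) %*% diag(c(1, 5, 0.2, 10))
z <- x[, 1:3] %*% matrix(rnorm(9), 3, 3) + matrix(rnorm(n * 3), n, 3)
z <- z %*% diag(c(3, 0.5, 8))

# Hypothetical weight vectors, standing in for one column of out$u and out$v.
u <- c(0.5, 0.5, 0.5, 0.5)
v <- c(0.6, 0.6, 0.5)

# Weights applied to the raw columns.
cor_raw <- cor(x %*% u, z %*% v)[1, 1]

# Weights applied after centering only: the correlation is unchanged,
# because centering just shifts each linear combination by a constant.
cor_centered <- cor(scale(x, center = TRUE, scale = FALSE) %*% u,
                    scale(z, center = TRUE, scale = FALSE) %*% v)[1, 1]

# Weights applied after centering and scaling each column to unit sd:
# every column now contributes on a different footing, so the
# linear combinations (and their correlation) change.
cor_standardized <- cor(scale(x) %*% u, scale(z) %*% v)[1, 1]

print(c(raw = cor_raw, centered = cor_centered, standardized = cor_standardized))
```

Applied to the code in this issue, the analogous check would be to compare cor(scale(rna_input) %*% out$u[, 1], scale(dna_input) %*% out$v[, 1]) with out$cors[1]. That assumes CCA() standardizes with the equivalent of scale(), which is worth confirming against the package source.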