clivehoggart / BridgePRS

Estimation of trans-ancestry PRS
MIT License
5 stars 1 forks source link

Error in predict subprogram #3

Open paola-arguello opened 1 month ago

paola-arguello commented 1 month ago

Hi there!

I have been working with simulated data using BridgePRS and I found that for exactly the same pipeline a phenotype returned an error due to differing number of dimensions. After some long hours looking at my data and the code, I found the source of the error in the get.pred.clump function (within src/functions.R). It seems that as written, the program does not account for cases where the intersection between beta.bar snps (from the large GWAS) and the ones available in the genotyping data for target data (bim file). I found that it was easily solved by restricting the beta.bar to only the overlaping snps. Below is the get.pred.clump function with the correction:

`get.pred.clump <- function( beta.bar, ptr.beta.use, clump.id, X.bed, bim, ids.use, strand.check ){

snps <- intersect( beta.bar$snp, bim$V2 )

**if(length(snps) < length(beta.bar$snp)){
  print("Warning!: intersection is lower than beta.bar!")
  print(paste0(length(snps)," > ", length(beta.bar$snp)))
  beta.bar <- beta.bar[beta.bar$snp %in% snps,]
}**

X <- X.bed[ ids.use, beta.bar$snp, drop=FALSE ]

ptr <- match( beta.bar$snp, colnames(X) )
X <- X[,ptr,drop=FALSE]
ptr.bed <- match( beta.bar$snp, colnames(X.bed) )

coded.allele <- bim$V5[ptr.bed]
other.allele <- bim$V6[ptr.bed]

swtch <- ifelse( coded.allele==beta.bar$effect.allele &
                 other.allele==beta.bar$ref.allele, 1, 0 )

swtch <- ifelse( coded.allele==beta.bar$ref.allele &
                 other.allele==beta.bar$effect.allele, -1, swtch )

ptr.miss <- which( swtch==0 )
if( strand.check & length(ptr.miss)>0 ){
    coded.allele1 <- alt.strand( coded.allele )
    other.allele1 <- alt.strand( other.allele )

    swtch[ptr.miss] <- ifelse( coded.allele1[ptr.miss] ==
                               beta.bar$effect.allele[ptr.miss] &
                               other.allele1[ptr.miss] ==
                               beta.bar$ref.allele[ptr.miss],
                              1, 0 )

    swtch[ptr.miss] <- ifelse( coded.allele1[ptr.miss]==
                               beta.bar$ref.allele[ptr.miss] &
                               other.allele1[ptr.miss]==
                               beta.bar$effect.allele[ptr.miss],
                              -1, swtch[ptr.miss] )
}
ptr.use <- which( swtch!=0 )
if( length(ptr.use)>0 ){
    X <- X[,ptr.use,drop=FALSE]
    beta.bar <- beta.bar[ptr.use,,drop=FALSE]

    ptr <- which(swtch == -1)
    if( length(ptr)>0 ){
        X[,ptr] <- 2 - X[,ptr]
    }

    m <- apply( X, 2, mean, na.rm=TRUE )
    for( ii in 1:ncol(X) ){
        X[,ii] <- ifelse( is.na(X[,ii]), m[ii], X[,ii] )
    }
    pred <- as.matrix(X) %*% as.matrix(beta.bar[,ptr.beta.use])
    ret <- pred
}else{
    ret <- matrix(data=0,ncol=length(ptr.beta.use),nrow=nrow(X))
}
rownames(ret) <- rownames(X)

return(ret)

}`

Basically everything stays the same. I just added a condition at the top to keep track of mismatches. I hope this is a good workaround the error. As far as I can tell it shouldn't impact the results of the method but I might be overlooking or misunderstanding something.

Please let me know your thoughts!

clivehoggart commented 1 month ago

Many thanks for that, I've added your update to the repository. Error would not have affected my analyses as I sumstats and reference panel were from the source, i.e. UKB for real data analyses and HM for simulations. Best Clive

On Tue, 9 Jul 2024 at 15:26, Paola Arguello-Pascualli < @.***> wrote:

Hi there!

I have been working with simulated data using BridgePRS and I found that for exactly the same pipeline a phenotype returned an error due to differing number of dimensions. After some long hours looking at my data and the code, I found the source of the error in the get.pred.clump function (within src/functions.R). It seems that as written, the program does not account for cases where the intersection between beta.bar snps (from the large GWAS) and the ones available in the genotyping data for target data (bim file). I found that it was easily solved by restricting the beta.bar to only the overlaping snps. Below is the get.pred.clump function with the correction:

`get.pred.clump <- function( beta.bar, ptr.beta.use, clump.id, X.bed, bim, ids.use, strand.check ){

snps <- intersect( beta.bar$snp, bim$V2 )

if(length(snps) < length(beta.bar$snp)){ print("Warning!: intersection is lower than beta.bar!") print(paste0(length(snps)," > ", length(beta.bar$snp))) beta.bar <- beta.bar[beta.bar$snp %in% snps,] }

X <- X.bed[ ids.use, beta.bar$snp, drop=FALSE ]

ptr <- match( beta.bar$snp, colnames(X) ) X <- X[,ptr,drop=FALSE] ptr.bed <- match( beta.bar$snp, colnames(X.bed) )

coded.allele <- bim$V5[ptr.bed] other.allele <- bim$V6[ptr.bed]

swtch <- ifelse( coded.allele==beta.bar$effect.allele & other.allele==beta.bar$ref.allele, 1, 0 )

swtch <- ifelse( coded.allele==beta.bar$ref.allele & other.allele==beta.bar$effect.allele, -1, swtch )

ptr.miss <- which( swtch==0 ) if( strand.check & length(ptr.miss)>0 ){ coded.allele1 <- alt.strand( coded.allele ) other.allele1 <- alt.strand( other.allele )

swtch[ptr.miss] <- ifelse( coded.allele1[ptr.miss] ==
                           beta.bar$effect.allele[ptr.miss] &
                           other.allele1[ptr.miss] ==
                           beta.bar$ref.allele[ptr.miss],
                          1, 0 )

swtch[ptr.miss] <- ifelse( coded.allele1[ptr.miss]==
                           beta.bar$ref.allele[ptr.miss] &
                           other.allele1[ptr.miss]==
                           beta.bar$effect.allele[ptr.miss],
                          -1, swtch[ptr.miss] )

} ptr.use <- which( swtch!=0 ) if( length(ptr.use)>0 ){ X <- X[,ptr.use,drop=FALSE] beta.bar <- beta.bar[ptr.use,,drop=FALSE]

ptr <- which(swtch == -1)
if( length(ptr)>0 ){
    X[,ptr] <- 2 - X[,ptr]
}

m <- apply( X, 2, mean, na.rm=TRUE )
for( ii in 1:ncol(X) ){
    X[,ii] <- ifelse( is.na(X[,ii]), m[ii], X[,ii] )
}
pred <- as.matrix(X) %*% as.matrix(beta.bar[,ptr.beta.use])
ret <- pred

}else{ ret <- matrix(data=0,ncol=length(ptr.beta.use),nrow=nrow(X)) } rownames(ret) <- rownames(X)

return(ret)

} `

Basically everything stays the same. I just added a condition at the top to keep track of mismatches. I hope this is a good workaround the error. As far as I can tell it shouldn't impact the results of the method but I might be overlooking or misunderstanding something.

Please let me know your thoughts!

— Reply to this email directly, view it on GitHub https://github.com/clivehoggart/BridgePRS/issues/3, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJUD6OGKJNXXKWTNJCOXBXDZLQ2NVAVCNFSM6AAAAABKTPLC76VHI2DSMVQWIX3LMV43ASLTON2WKOZSGM4TSMBQGAZTMMA . You are receiving this because you are subscribed to this thread.Message ID: @.***>