PGScatalog / PGS_Catalog

An open database of polygenic scores and relevant metadata needed to apply and evaluate them correctly.
Apache License 2.0

Storage of Redundant Scores #382

Open DarioS opened 1 month ago

DarioS commented 1 month ago

Should the catalogue incorporate scores which are effectively copies of older work?

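# Read both harmonised scoring files, skipping the 19 metadata header lines
s1 <- read.delim("PGS000665_hmPOS_GRCh38.txt", skip = 19)
s2 <- read.delim("PGS004597_hmPOS_GRCh38.txt", skip = 19)
nrow(s1)  # 32
nrow(s2)  # 32
# Reorder s2 so that its rows line up with s1 by rsID
s2 <- s2[match(s1$rsID, s2$rsID), ]
# Compare the effect weights of the two scores variant by variant
plot(s1$effect_weight, s2$effect_weight, xlab = "PGS000665", ylab = "PGS004597", pch = 19, cex = 0.5)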

[Image: scatter plot of effect weights, PGS000665 (x-axis) against PGS004597 (y-axis)]
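The visual comparison above could also be summarised numerically. The following is only a minimal sketch, not a catalogue tool: `compare_scores` is a hypothetical helper, it assumes both tables carry `rsID` and `effect_weight` columns as in the files read above, and the toy data frames contain made-up values rather than catalogue data.

```r
# Hypothetical helper: summarise how closely two scoring tables agree.
# Assumes each data frame has `rsID` and `effect_weight` columns.
compare_scores <- function(a, b) {
  shared <- intersect(a$rsID, b$rsID)
  # Weights for the shared variants, in the same order for both scores
  wa <- a$effect_weight[match(shared, a$rsID)]
  wb <- b$effect_weight[match(shared, b$rsID)]
  list(
    n_a = nrow(a),
    n_b = nrow(b),
    n_shared = length(shared),
    same_variants = setequal(a$rsID, b$rsID),
    weight_cor = if (length(shared) > 1) cor(wa, wb) else NA_real_,
    max_abs_diff = if (length(shared) > 0) max(abs(wa - wb)) else NA_real_
  )
}

# Toy data (made-up values, not taken from the catalogue)
toy1 <- data.frame(rsID = c("rs1", "rs2", "rs3"),
                   effect_weight = c(0.10, -0.20, 0.05),
                   stringsAsFactors = FALSE)
toy2 <- data.frame(rsID = c("rs3", "rs1", "rs2"),
                   effect_weight = c(0.05, 0.10, -0.20),
                   stringsAsFactors = FALSE)
res <- compare_scores(toy1, toy2)
str(res)
```

With the real files this would be called as `compare_scores(s1, s2)`. The sketch does not check whether the effect alleles match, so a score with flipped alleles and negated weights would not be flagged as a copy.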

smlmbrt commented 2 weeks ago

We try our best not to. If a paper doesn't declare a score to be a duplicate or reuse of an existing score, we will create a new one. This is probably an edge case (one that happens when only GWAS-significant SNPs are used).