brielin / Popcorn

Software for estimating correlation of trait effect sizes across populations

Discrepancies between Popcorn GitHub Description and Actual Results #40

Open Taewoong-Ha opened 1 year ago


Hello,

I would like to report inconsistencies between the information provided in the Popcorn GitHub repository and the actual contents of its files. Specifically, the allele frequencies (AF) in the pre-computed score file for the EUR and EAS populations of the 1000 Genomes Project differ from the AF values reported in the NCBI database.

According to the header description in the file, the columns are organized as follows: Chromosome, Base Position, SNP ID, Allele 1, Allele 2, Frequency of A1 in POP1, Frequency of A1 in POP2, LD score in POP1, LD score in POP2, and Cross-covariance score.
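To make the column layout concrete, here is a minimal Python sketch that splits one row of the score file into these ten fields. It assumes the file is whitespace-delimited with one SNP per line; the field names (`af1`, `ld1`, `cross`, and so on) and the sample row are hypothetical illustrations, not Popcorn's own identifiers or data.

```python
from dataclasses import dataclass


@dataclass
class ScoreRow:
    # Hypothetical field names mirroring the documented column order.
    chrom: str
    bp: int
    snp: str
    a1: str
    a2: str
    af1: float    # documented as "Frequency of A1 in POP1"
    af2: float    # documented as "Frequency of A1 in POP2"
    ld1: float    # LD score in POP1
    ld2: float    # LD score in POP2
    cross: float  # cross-covariance score


def parse_score_line(line: str) -> ScoreRow:
    """Parse one whitespace-delimited data row (assumed layout, header excluded)."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    return ScoreRow(fields[0], int(fields[1]), fields[2], fields[3], fields[4],
                    *(float(x) for x in fields[5:]))


# Hypothetical row, for illustration only.
row = parse_score_line("1 12345 rs_example A G 0.10 0.20 1.5 1.2 0.9")
print(row.a1, row.af1, row.af2)
```

The header line would need to be skipped before calling `parse_score_line`.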

After reviewing the top 10 entries in the file and cross-referencing those SNPs with the NCBI database, I found that most of the AF values in the pre-computed score file differ from the NCBI values (Table 1).

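[Table 1 (image): top 10 entries of the pre-computed score file compared with the corresponding NCBI AF values]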

In light of these findings, I computed the scores myself from the 1000 Genomes EUR and EAS bed files distributed with MAGMA (https://ctg.cncr.nl/software/magma). A partial display of the results is shown in Table 2, and the corresponding NCBI search results in Table 3.

[Table 2 (image): partial display of the scores computed from the MAGMA EUR and EAS bed files]

[Table 3 (image): NCBI search results for the SNPs in Table 2]
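For reference, the frequency of a given allele can be recomputed directly from the PLINK binary files. The sketch below assumes the standard PLINK 1 .bed layout (3-byte magic header, SNP-major order, 2 bits per sample with the first sample in the lowest-order bits) and treats "A1" as the first allele listed in the .bim file, which is not necessarily the allele Popcorn labels A1. The function names are hypothetical, and `n_samples` and `n_variants` are assumed to come from the line counts of the matching .fam and .bim files.

```python
import math

# PLINK 1 .bed header bytes; the third byte (0x01) marks SNP-major mode.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])

# 2-bit genotype code -> copies of the first .bim allele (None = missing call).
_A1_COPIES = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def a1_frequency(block: bytes, n_samples: int) -> float:
    """Frequency of the first .bim allele in one variant's genotype block."""
    copies = 0
    called = 0
    for i in range(n_samples):
        # Four samples per byte, first sample in the two lowest bits.
        code = (block[i // 4] >> (2 * (i % 4))) & 0b11
        c = _A1_COPIES[code]
        if c is not None:
            copies += c
            called += 1
    return copies / (2 * called) if called else float("nan")


def bed_a1_frequencies(path: str, n_samples: int, n_variants: int) -> list[float]:
    """Per-variant first-allele frequencies for a whole SNP-major .bed file."""
    bytes_per_variant = math.ceil(n_samples / 4)
    with open(path, "rb") as fh:
        if fh.read(3) != BED_MAGIC:
            raise ValueError("not a SNP-major PLINK 1 .bed file")
        return [a1_frequency(fh.read(bytes_per_variant), n_samples)
                for _ in range(n_variants)]
```

Comparing these values, per SNP, against both frequency columns of the score file shows which allele the file actually reports.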

Based on these results, the "Frequency of A1 in POP1" and "Frequency of A1 in POP2" columns in the score file generated by Popcorn appear to contain the frequency of A2 rather than A1.
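The comparison behind this conclusion can be automated. The following minimal Python sketch classifies each reported frequency as matching A1, matching A2 (that is, 1 minus the A1 frequency), or ambiguous, given a reference A1 frequency such as an NCBI value. The function names and the 0.05 tolerance are arbitrary assumptions, and the sketch presumes both sources describe the same allele on the same strand; SNPs with frequencies near 0.5 cannot be oriented this way and are reported as ambiguous.

```python
from collections import Counter


def orientation(freq_in_file: float, ref_a1_freq: float, tol: float = 0.05) -> str:
    """Classify a reported frequency as 'A1', 'A2', or 'ambiguous' (assumed tolerance)."""
    d_a1 = abs(freq_in_file - ref_a1_freq)          # distance if the file reports A1
    d_a2 = abs((1.0 - freq_in_file) - ref_a1_freq)  # distance if the file reports A2
    if d_a1 <= tol and d_a2 > tol:
        return "A1"
    if d_a2 <= tol and d_a1 > tol:
        return "A2"
    # Both close (frequency near 0.5) or neither close (other mismatch).
    return "ambiguous"


def summarize(pairs):
    """Tally orientations over (file_freq, reference_a1_freq) pairs."""
    return Counter(orientation(f, r) for f, r in pairs)
```

If most unambiguous SNPs come back as "A2", the frequency columns are oriented to A2 rather than A1.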

Such discrepancies could affect the calculation of the LD scores and cross-covariance scores for the two populations. Moreover, while they may not significantly affect the effect model used to estimate correlations with the "fit" function, they are likely to influence the results of the impact model, which takes AF into account.

If I have misunderstood any aspect or made any mistakes during this process, I would greatly appreciate your clarification.

Thank you very much.

Best regards,