JonJala / mama

MIT License
13 stars 4 forks source link

Final SNP count = 45, too much SNPs have been filter out due to non-positive-semi-definiteness of omega / sigma #20

Closed JiaShun-Xiao closed 3 years ago

JiaShun-Xiao commented 3 years ago

Hi, I have been running the tutorial code and the final result confused me a lot, It only reported the result of 45 SNPs, it is normal? or I made some mistakes here. below is the log information

python3 ../mama.py --sumstats "./EAS_BMI.txt.gz,EAS,BMI" "./EUR_BMI.txt.gz,EUR,BMI" \
>                    --ld-scores "./chr22_mind02_geno02_maf01_EAS_EUR.l2.ldscore.gz" \
>                    --out "./BMI_MAMA" \
>                    --add-a1-col-match "EA" \
>                    --add-a2-col-match "OA"

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MAMA: Multi-Ancestry Meta-Analysis
<> Version: 1.0.0
<> (C) 2020 Social Science Genetic Association Consortium (SSGAC)
<> MIT License
<>
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Software-related correspondence: grantgoldman0@gmail.com or jjala.ssgac@gmail.com
<> All other correspondence: paturley@broadinstitute.org
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

See full log at: /import/home/jxiaoae/cross-popu/MAMA/mama/tutorial/BMI_MAMA.log

Program executed via:
../mama.py  \
        --sumstats ./EAS_BMI.txt.gz,EAS,BMI ./EUR_BMI.txt.gz,EUR,BMI  \
        --ld-scores ./chr22_mind02_geno02_maf01_EAS_EUR.l2.ldscore.gz  \
        --out ./BMI_MAMA  \
        --add-a1-col-match EA  \
        --add-a2-col-match OA

INSTALLED VERSIONS
------------------
commit           : 67a3d4241ab84419856b84fc3ebc9abcbe66c6b3
python           : 3.6.9.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-142-generic
Version          : #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : C.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.4
numpy            : 1.19.4
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 21.1.2
setuptools       : 57.0.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : 1.5.4
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

Reading in and running QC on LD Scores
Reading in LD Scores file: /import/home/jxiaoae/cross-popu/MAMA/mama/tutorial/chr22_mind02_geno02_maf01_EAS_EUR.l2.ldscore.gz

Reading in summary statistics.

Reading in ('EAS', 'BMI') sumstats file: /import/home/jxiaoae/cross-popu/MAMA/mama/tutorial/EAS_BMI.txt.gz

Running QC on ('EAS', 'BMI') summary statistics

Initial number of SNPs / rows = 119557

Filtered out 0 SNPs with "NO NAN" (Filters out SNPs with any NaN values in required columns {'SNP', 'BETA', 'A2', 'SE', 'BP', 'A1', 'FREQ', 'CHR', 'P'})
Filtered out 25359 SNPs with "FREQ BOUNDS" (Filters out SNPs with FREQ values outside of [0.01, 0.99])
Filtered out 0 SNPs with "SE BOUNDS" (Filters out SNPs with non-positive SE values)
Filtered out 0 SNPs with "CHR VALUES" (Filters out SNPs with listed chromosomes not in ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y'])
Filtered out 0 SNPs with "DUPLICATE ALLELE SNPS" (Filters out SNPs with major allele = minor allele)
Filtered out 16705 SNPs with "PALINDROMIC SNPS" (Filters out SNPs where major / minor alleles are a base pair)
Filtered out 81 SNPs with "INVALID ALLELES" (Filters out SNPs with alleles not in {'T', 'G', 'A', 'C'})
Filtered out 0 SNPs with "NEGATIVE GWAS P" (Filters out SNPs with negative GWAS P values)

Filtered out 38952 SNPs in total (as the union of drops, this may be less than the total of all the per-filter drops)
Additionally dropped 55 duplicate SNPs

Reading in ('EUR', 'BMI') sumstats file: /import/home/jxiaoae/cross-popu/MAMA/mama/tutorial/EUR_BMI.txt.gz

Running QC on ('EUR', 'BMI') summary statistics

Initial number of SNPs / rows = 179766

Filtered out 37 SNPs with "NO NAN" (Filters out SNPs with any NaN values in required columns {'SNP', 'BETA', 'A2', 'SE', 'BP', 'A1', 'FREQ', 'CHR', 'P'})
Filtered out 51075 SNPs with "FREQ BOUNDS" (Filters out SNPs with FREQ values outside of [0.01, 0.99])
Filtered out 0 SNPs with "SE BOUNDS" (Filters out SNPs with non-positive SE values)
Filtered out 0 SNPs with "CHR VALUES" (Filters out SNPs with listed chromosomes not in ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y'])
Filtered out 0 SNPs with "DUPLICATE ALLELE SNPS" (Filters out SNPs with major allele = minor allele)
Filtered out 23302 SNPs with "PALINDROMIC SNPS" (Filters out SNPs where major / minor alleles are a base pair)
Filtered out 16624 SNPs with "INVALID ALLELES" (Filters out SNPs with alleles not in {'T', 'G', 'A', 'C'})
Filtered out 0 SNPs with "NEGATIVE GWAS P" (Filters out SNPs with negative GWAS P values)

Filtered out 81997 SNPs in total (as the union of drops, this may be less than the total of all the per-filter drops)
Additionally dropped 16 duplicate SNPs

Number of SNPS in initial intersection of all sources: 260

Standardizing reference alleles in summary statistics.
Standardized to population: ('EAS', 'BMI')
Dropped 0 SNPs during reference allele standardization.

Running LD Score regression.
Regression coefficients (LD):
[[-2.80824253e-06 -2.22343889e-06]
 [-2.22343889e-06  1.76193104e-06]]
Regression coefficients (Intercept):
[[ 5.53125241e-05 -1.26672725e-05]
 [-1.26672725e-05 -2.24825786e-05]]
Regression coefficients (SE^2):
[[-0.03778218  1.78041409]
 [ 1.78041409  4.05912105]]

Creating omega and sigma matrices.
Average Omega (including dropped slices) =
[[-5.90951990e-06 -2.61104158e-06]
 [-2.61104158e-06  1.38588458e-06]]
Average Sigma (including dropped slices) =
[[ 5.39919804e-05 -1.26672725e-05]
 [-1.26672725e-05  1.40080566e-05]]

Adjusted 26 SNPs to make omega positive semi-definite.

Dropped 197 SNPs due to non-positive-semi-definiteness of omega.
Dropped 92 SNPs due to non-positive-definiteness of sigma.
Dropped 215 total SNPs due to non-positive-(semi)-definiteness of omega / sigma.

Running main MAMA method.

Preparing results for output.

        Population 0: ('EAS', 'BMI')
                Mean Chi^2 for ('EAS', 'BMI') = 1.9745856481418846
        Population 1: ('EUR', 'BMI')
                Mean Chi^2 for ('EUR', 'BMI') = 3.6416598095191945

Final SNP count = 45
Writing results to disk.
        ./BMI_MAMA_EAS_BMI.res
        ./BMI_MAMA_EUR_BMI.res

Execution complete.
ggoldman1 commented 3 years ago

Hi @JiaShun-Xiao -- thanks for your question. Based off the package versions in your log file, it seems like you don't have the virtualenv installed (we ask for scipy version 1.6.1 but you have 1.5.4). Is that the case? If so, could you install and rerun?

JonJala commented 3 years ago

Also, it looks like there are only 260 SNPs in common between your groups (after accounting for QC filters, which do drop a decent number already). Does that seem right given the SNPs in each sample?

On Tue, Jun 1, 2021, 10:38 AM ggoldman1 @.***> wrote:

Hi @JiaShun-Xiao https://github.com/JiaShun-Xiao -- thanks for your question. Based off the package versions in your log file, it seems like you don't have the virtualenv installed (we ask for scipy version 1.6.1 but you have 1.5.4). Is that the case? If so, could you install and rerun?

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/JonJala/mama/issues/20#issuecomment-852179718, or unsubscribe https://github.com/notifications/unsubscribe-auth/APIOF57N4GHGJI7BJ6Y7UKDTQTWGNANCNFSM454VKV7A .

ggoldman1 commented 3 years ago

Yeah, there's not great overlap in this sample actually. I can get something with better coverage up in the meantime, but @JiaShun-Xiao the tutorial results are fine -- you should feel free to move on to your own data.

ggoldman1 commented 3 years ago

@JiaShun-Xiao see #21 for details.

JiaShun-Xiao commented 3 years ago

Hi @ggoldman1 @JonJala, thanks for your reply. MAMA is excellent work, you guys are doing a great job!