DReichLab / EIG

Eigen tools by Nick Patterson and Alkes Price lab
Other
180 stars 60 forks source link

Segmentation fault with 1000G but not with HM3 #76

Open swvanderlaan opened 2 years ago

swvanderlaan commented 2 years ago

Hi,

I am following this protocol: https://www.nature.com/articles/nprot.2010.116. At some point it will make use of smartpca. This works beautifully for HapMap 3 data:

perl ~/git/EIG/bin/smartpca.perl \
> -i rawdata/rawdata.hapmap3r2.pruned.bed \
> -a rawdata/rawdata.hapmap3r2.pruned.pedsnp \
> -b rawdata/rawdata.hapmap3r2.pruned.pedind \
> -k 10 \
> -o rawdata/rawdata.hapmap3r2.pruned.pca \
> -p rawdata/rawdata.hapmap3r2.pruned.plot \
> -e rawdata/rawdata.hapmap3r2.pruned.eval \
> -l rawdata/rawdata.hapmap3r2.pruned.log \
> -m 5 \
> -t 10 \
> -s 6.0 \
> -w reference/hapmap3r2_CEU.CHB.JPT.YRI-pca-populations.txt
smartpca -p rawdata/rawdata.hapmap3r2.pruned.pca.par >rawdata/rawdata.hapmap3r2.pruned.log
ploteig -i rawdata/rawdata.hapmap3r2.pruned.pca.evec -c 1:2  -p Case:Control:3:4:5:6  -x  -y  -o rawdata/rawdata.hapmap3r2.pruned.plot.xtxt
evec2pca.perl 10 rawdata/rawdata.hapmap3r2.pruned.pca.evec rawdata/rawdata.hapmap3r2.pruned.pedind rawdata/rawdata.hapmap3r2.pruned.pca

Here's the head, for instance, of the result:

cat rawdata/rawdata.hapmap3r2.pruned.pca.evec | head
           #eigvals:    43.658    23.724     1.738     1.055     1.052     1.046     1.046     1.042     1.038     1.034
                 1:1     0.0050      0.0807     -0.0021     -0.0024     -0.0045      0.0080      0.0033      0.0031     -0.0133      0.0112             Case
                 2:2     0.0045      0.0821     -0.0035      0.0012     -0.0004      0.0122     -0.0046      0.0051     -0.0101     -0.0056          Control
                 3:3     0.0055      0.0813     -0.0026     -0.0064      0.0005      0.0153      0.0010      0.0047     -0.0042      0.0110             Case
                 4:4     0.0049      0.0801      0.0029      0.0037      0.0022      0.0126      0.0022      0.0040     -0.0078      0.0047             Case
                 5:5     0.0047      0.0812     -0.0004      0.0002     -0.0034      0.0124     -0.0003      0.0025     -0.0063      0.0129             Case
                 6:6     0.0045      0.0807     -0.0021      0.0026     -0.0155      0.0168      0.0090      0.0083     -0.0026      0.0086             Case
                 7:7     0.0051      0.0809      0.0007      0.0009     -0.0017      0.0190      0.0068      0.0027     -0.0019      0.0062             Case
                 8:8     0.0050      0.0813      0.0016     -0.0046     -0.0023      0.0152     -0.0028      0.0042     -0.0048     -0.0002             Case
                 9:9     0.0053      0.0816     -0.0024     -0.0001     -0.0009      0.0107      0.0047      0.0046     -0.0095      0.0011             Case

But when I try to do the exact same thing using the 1000G data, it breaks down:

perl ~/git/EIG/bin/smartpca.perl \
> -i rawdata/rawdata.1kg_phase1.pruned.bed \
> -a rawdata/rawdata.1kg_phase1.pruned.pedsnp \
> -b rawdata/rawdata.1kg_phase1.pruned.pedind \
> -k 2 \
> -o rawdata/rawdata.1kg_phase1.pruned.pca \
> -p rawdata/rawdata.1kg_phase1.pruned.plot \
> -e rawdata/rawdata.1kg_phase1.pruned.eval \
> -l rawdata/rawdata.1kg_phase1.pruned.log \
> -t 2 \
> -w reference/1kg_phase1_all/1kg-pca-populations.txt
smartpca -p rawdata/rawdata.1kg_phase1.pruned.pca.par >rawdata/rawdata.1kg_phase1.pruned.log
sh: line 1: 40371 Segmentation fault: 11  smartpca -p rawdata/rawdata.1kg_phase1.pruned.pca.par > rawdata/rawdata.1kg_phase1.pruned.log
ploteig -i rawdata/rawdata.1kg_phase1.pruned.pca.evec -c 1:2  -p   -x  -y  -o rawdata/rawdata.1kg_phase1.pruned.plot.xtxt
evec2pca.perl 2 rawdata/rawdata.1kg_phase1.pruned.pca.evec rawdata/rawdata.1kg_phase1.pruned.pedind rawdata/rawdata.1kg_phase1.pruned.pca

Is this a memory issue? Or something else I am missing?

I should add, that I did have it running, once back in 2019. It then produced this contents: using the exact same input... I add the output here. It used smartpca version 16000.

rawdata.1kg_phase1.pruned.log

Many thanks

Sander

bumblenick commented 2 years ago

smartpca.perl is not really being maintained though I cc Alkes Price whose lab was maintaining this. But at least extract the .par file -- run smartpca outside the perl wrapper and ask your system for more memory.

NIck

On Wed, Mar 30, 2022 at 5:37 AM Sander W. van der Laan < @.***> wrote:

Hi,

I am following this protocol: https://www.nature.com/articles/nprot.2010.116. At some point it will make use of smartpca. This works beautifully for HapMap 3 data:

perl ~/git/EIG/bin/smartpca.perl \

-i rawdata/rawdata.hapmap3r2.pruned.bed \ -a rawdata/rawdata.hapmap3r2.pruned.pedsnp \ -b rawdata/rawdata.hapmap3r2.pruned.pedind \ -k 10 \ -o rawdata/rawdata.hapmap3r2.pruned.pca \ -p rawdata/rawdata.hapmap3r2.pruned.plot \ -e rawdata/rawdata.hapmap3r2.pruned.eval \ -l rawdata/rawdata.hapmap3r2.pruned.log \ -m 5 \ -t 10 \ -s 6.0 \ -w reference/hapmap3r2_CEU.CHB.JPT.YRI-pca-populations.txt smartpca -p rawdata/rawdata.hapmap3r2.pruned.pca.par >rawdata/rawdata.hapmap3r2.pruned.log ploteig -i rawdata/rawdata.hapmap3r2.pruned.pca.evec -c 1:2 -p Case:Control:3:4:5:6 -x -y -o rawdata/rawdata.hapmap3r2.pruned.plot.xtxt evec2pca.perl 10 rawdata/rawdata.hapmap3r2.pruned.pca.evec rawdata/rawdata.hapmap3r2.pruned.pedind rawdata/rawdata.hapmap3r2.pruned.pca

Here's the head, for instance, of the result:

cat rawdata/rawdata.hapmap3r2.pruned.pca.evec | head

eigvals: 43.658 23.724 1.738 1.055 1.052 1.046 1.046 1.042 1.038 1.034

             1:1     0.0050      0.0807     -0.0021     -0.0024     -0.0045      0.0080      0.0033      0.0031     -0.0133      0.0112             Case
             2:2     0.0045      0.0821     -0.0035      0.0012     -0.0004      0.0122     -0.0046      0.0051     -0.0101     -0.0056          Control
             3:3     0.0055      0.0813     -0.0026     -0.0064      0.0005      0.0153      0.0010      0.0047     -0.0042      0.0110             Case
             4:4     0.0049      0.0801      0.0029      0.0037      0.0022      0.0126      0.0022      0.0040     -0.0078      0.0047             Case
             5:5     0.0047      0.0812     -0.0004      0.0002     -0.0034      0.0124     -0.0003      0.0025     -0.0063      0.0129             Case
             6:6     0.0045      0.0807     -0.0021      0.0026     -0.0155      0.0168      0.0090      0.0083     -0.0026      0.0086             Case
             7:7     0.0051      0.0809      0.0007      0.0009     -0.0017      0.0190      0.0068      0.0027     -0.0019      0.0062             Case
             8:8     0.0050      0.0813      0.0016     -0.0046     -0.0023      0.0152     -0.0028      0.0042     -0.0048     -0.0002             Case
             9:9     0.0053      0.0816     -0.0024     -0.0001     -0.0009      0.0107      0.0047      0.0046     -0.0095      0.0011             Case

But when I try to do the exact same thing using the 1000G data, it breaks down:

perl ~/git/EIG/bin/smartpca.perl \

-i rawdata/rawdata.1kg_phase1.pruned.bed \ -a rawdata/rawdata.1kg_phase1.pruned.pedsnp \ -b rawdata/rawdata.1kg_phase1.pruned.pedind \ -k 2 \ -o rawdata/rawdata.1kg_phase1.pruned.pca \ -p rawdata/rawdata.1kg_phase1.pruned.plot \ -e rawdata/rawdata.1kg_phase1.pruned.eval \ -l rawdata/rawdata.1kg_phase1.pruned.log \ -t 2 \ -w reference/1kg_phase1_all/1kg-pca-populations.txt smartpca -p rawdata/rawdata.1kg_phase1.pruned.pca.par >rawdata/rawdata.1kg_phase1.pruned.log sh: line 1: 40371 Segmentation fault: 11 smartpca -p rawdata/rawdata.1kg_phase1.pruned.pca.par > rawdata/rawdata.1kg_phase1.pruned.log ploteig -i rawdata/rawdata.1kg_phase1.pruned.pca.evec -c 1:2 -p -x -y -o rawdata/rawdata.1kg_phase1.pruned.plot.xtxt evec2pca.perl 2 rawdata/rawdata.1kg_phase1.pruned.pca.evec rawdata/rawdata.1kg_phase1.pruned.pedind rawdata/rawdata.1kg_phase1.pruned.pca

Is this a memory issue? Or something else I am missing?

Many thanks

Sander

— Reply to this email directly, view it on GitHub https://github.com/DReichLab/EIG/issues/76, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEE77B4XYBNOFDX3IQDP5JDVCQOENANCNFSM5SBJ3OEA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

maspil commented 7 months ago

Hi, I am following the same protocol, and I am having the exact same issue only with hapmap3 data (segmentation failt (core dumped). Did you find a solution for this? Log:

parameter file: RAW.hapmap3r2.pruned.pca.par
### THE INPUT PARAMETERS
##PARAMETER NAME: VALUE
genotypename: RAW.hapmap3r2.pruned.bed
snpname: RAW.hapmap3r2.pruned.pedsnp
indivname: RAW.hapmap3r2.pruned.pedind
evecoutname: RAW.hapmap3r2.pruned.pca.evec
evaloutname: RAW.hapmap3r2.pruned.eval
altnormstyle: NO
numoutevec: 2
numoutlieriter: 5
numoutlierevec: 2
outliersigmathresh: 6
qtmode: 0
poplistname: pca-populations.txt
## smartpca version: 18140
norm used

lsqproject used
*** warning.  genetic distances are in cM not Morgans
1   rs6688969   119 3743391 T   C

1:Sample1 ignored
2:Sample2 ignored
3:Sample3 ignored
4:Sample4 ignored ...etc etc...

Thanks, Mari