Open swvanderlaan opened 2 years ago
smartpca.perl is not really being maintained though I cc Alkes Price whose lab was maintaining this. But at least extract the .par file -- run smartpca outside the perl wrapper and ask your system for more memory.
NIck
On Wed, Mar 30, 2022 at 5:37 AM Sander W. van der Laan < @.***> wrote:
Hi,
I am following this protocol: https://www.nature.com/articles/nprot.2010.116. At some point it will make use of smartpca. This works beautifully for HapMap 3 data:
perl ~/git/EIG/bin/smartpca.perl \
-i rawdata/rawdata.hapmap3r2.pruned.bed \ -a rawdata/rawdata.hapmap3r2.pruned.pedsnp \ -b rawdata/rawdata.hapmap3r2.pruned.pedind \ -k 10 \ -o rawdata/rawdata.hapmap3r2.pruned.pca \ -p rawdata/rawdata.hapmap3r2.pruned.plot \ -e rawdata/rawdata.hapmap3r2.pruned.eval \ -l rawdata/rawdata.hapmap3r2.pruned.log \ -m 5 \ -t 10 \ -s 6.0 \ -w reference/hapmap3r2_CEU.CHB.JPT.YRI-pca-populations.txt smartpca -p rawdata/rawdata.hapmap3r2.pruned.pca.par >rawdata/rawdata.hapmap3r2.pruned.log ploteig -i rawdata/rawdata.hapmap3r2.pruned.pca.evec -c 1:2 -p Case:Control:3:4:5:6 -x -y -o rawdata/rawdata.hapmap3r2.pruned.plot.xtxt evec2pca.perl 10 rawdata/rawdata.hapmap3r2.pruned.pca.evec rawdata/rawdata.hapmap3r2.pruned.pedind rawdata/rawdata.hapmap3r2.pruned.pca
Here's the head, for instance, of the result:
cat rawdata/rawdata.hapmap3r2.pruned.pca.evec | head
eigvals: 43.658 23.724 1.738 1.055 1.052 1.046 1.046 1.042 1.038 1.034
1:1 0.0050 0.0807 -0.0021 -0.0024 -0.0045 0.0080 0.0033 0.0031 -0.0133 0.0112 Case 2:2 0.0045 0.0821 -0.0035 0.0012 -0.0004 0.0122 -0.0046 0.0051 -0.0101 -0.0056 Control 3:3 0.0055 0.0813 -0.0026 -0.0064 0.0005 0.0153 0.0010 0.0047 -0.0042 0.0110 Case 4:4 0.0049 0.0801 0.0029 0.0037 0.0022 0.0126 0.0022 0.0040 -0.0078 0.0047 Case 5:5 0.0047 0.0812 -0.0004 0.0002 -0.0034 0.0124 -0.0003 0.0025 -0.0063 0.0129 Case 6:6 0.0045 0.0807 -0.0021 0.0026 -0.0155 0.0168 0.0090 0.0083 -0.0026 0.0086 Case 7:7 0.0051 0.0809 0.0007 0.0009 -0.0017 0.0190 0.0068 0.0027 -0.0019 0.0062 Case 8:8 0.0050 0.0813 0.0016 -0.0046 -0.0023 0.0152 -0.0028 0.0042 -0.0048 -0.0002 Case 9:9 0.0053 0.0816 -0.0024 -0.0001 -0.0009 0.0107 0.0047 0.0046 -0.0095 0.0011 Case
But when I try to do the exact same thing using the 1000G data, it breaks down:
perl ~/git/EIG/bin/smartpca.perl \
-i rawdata/rawdata.1kg_phase1.pruned.bed \ -a rawdata/rawdata.1kg_phase1.pruned.pedsnp \ -b rawdata/rawdata.1kg_phase1.pruned.pedind \ -k 2 \ -o rawdata/rawdata.1kg_phase1.pruned.pca \ -p rawdata/rawdata.1kg_phase1.pruned.plot \ -e rawdata/rawdata.1kg_phase1.pruned.eval \ -l rawdata/rawdata.1kg_phase1.pruned.log \ -t 2 \ -w reference/1kg_phase1_all/1kg-pca-populations.txt smartpca -p rawdata/rawdata.1kg_phase1.pruned.pca.par >rawdata/rawdata.1kg_phase1.pruned.log sh: line 1: 40371 Segmentation fault: 11 smartpca -p rawdata/rawdata.1kg_phase1.pruned.pca.par > rawdata/rawdata.1kg_phase1.pruned.log ploteig -i rawdata/rawdata.1kg_phase1.pruned.pca.evec -c 1:2 -p -x -y -o rawdata/rawdata.1kg_phase1.pruned.plot.xtxt evec2pca.perl 2 rawdata/rawdata.1kg_phase1.pruned.pca.evec rawdata/rawdata.1kg_phase1.pruned.pedind rawdata/rawdata.1kg_phase1.pruned.pca
Is this a memory issue? Or something else I am missing?
Many thanks
Sander
— Reply to this email directly, view it on GitHub https://github.com/DReichLab/EIG/issues/76, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEE77B4XYBNOFDX3IQDP5JDVCQOENANCNFSM5SBJ3OEA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Hi, I am following the same protocol, and I am having the exact same issue only with hapmap3 data (segmentation failt (core dumped). Did you find a solution for this? Log:
parameter file: RAW.hapmap3r2.pruned.pca.par
### THE INPUT PARAMETERS
##PARAMETER NAME: VALUE
genotypename: RAW.hapmap3r2.pruned.bed
snpname: RAW.hapmap3r2.pruned.pedsnp
indivname: RAW.hapmap3r2.pruned.pedind
evecoutname: RAW.hapmap3r2.pruned.pca.evec
evaloutname: RAW.hapmap3r2.pruned.eval
altnormstyle: NO
numoutevec: 2
numoutlieriter: 5
numoutlierevec: 2
outliersigmathresh: 6
qtmode: 0
poplistname: pca-populations.txt
## smartpca version: 18140
norm used
lsqproject used
*** warning. genetic distances are in cM not Morgans
1 rs6688969 119 3743391 T C
1:Sample1 ignored
2:Sample2 ignored
3:Sample3 ignored
4:Sample4 ignored ...etc etc...
Thanks, Mari
Hi,
I am following this protocol: https://www.nature.com/articles/nprot.2010.116. At some point it will make use of
smartpca
. This works beautifully for HapMap 3 data:Here's the
head
, for instance, of the result:But when I try to do the exact same thing using the 1000G data, it breaks down:
Is this a memory issue? Or something else I am missing?
I should add, that I did have it running, once back in 2019. It then produced this contents: using the exact same input... I add the output here. It used smartpca version 16000.
rawdata.1kg_phase1.pruned.log
Many thanks
Sander