Closed AdrienOliva closed 3 years ago
See the documentation under CONVERTF/
:
strandcheck: If set to YES, then check that the allele strand is the same in both data sets -- if they are different (e.g A/C vs. T/G), then flip genotype data appropriately. (Note that if strandcheck is set to YES, then all A/T and C/G SNPs will be removed because it is impossible to know whether the allele strand is the same in both data sets. On the other hand, if strandcheck is set to NO, then A/T and C/G SNPs will be retained since it is assumed that both data sets are on the same strand.)
You are merging with strandcheck: YES and A/Tor C/G snps will be dropped as mergeit cannot determine if the alleles are ion the same strand..
This is a feature not a bug. I
On Mon, Apr 12, 2021 at 8:46 AM Adrien Oliva @.***> wrote:
Hi, I have noticed that behaviour using different datasets but to debug it, in this example, I'll show you the results of using mergeit on the same datasets and merge it with itself (only changing the name of the individuals).
My datasets are as follow:
.snp file for both datasets
Affx-34462167 1 0.0 567137 X X
rs3094315 1 0.0 752566 G A
rs12562034 1 0.0 768448 X X
rs12124819 1 0.0 776546 A G
1_777122 1 0.0 777122 A T
1_832756 1 0.0 832756 T G
rs28765502 1 0.0 832918 T C
1_834832 1 0.0 834832 G C
1_838555 1 0.0 838555 C A
1_838931 1 0.0 838931 A C
rs7419119 1 0.0 842013 T G
1_843249 1 0.0 843249 X X
1_843251 1 0.0 843251 X X
1_846543 1 0.0 846543 G T
rs950122 1 0.0 846864 G C
1_851204 1 0.0 851204 G C
1_852443 1 0.0 852443 A C
1_852964 1 0.0 852964 T G
1_853954 1 0.0 853954 C A
1_854978 1 0.0 854978 A C
.geno file for both datasets
90990
90002
90999
02992
90992
92200
92090
92000
92900
92922
22090
92202
99229
92922
22220
99020
99992
90900
90922
90022
.ind for dataset1
BWA_Goyet U BWA
BWA_Malta U BWA
BWA_Vestonice U BWA
BWA_Tianyuan U BWA
BWA_Yana U BWA
.ind for dataset2
BWA2_Goyet U BWA
BWA2_Malta U BWA
BWA2_Vestonice U BWA
BWA2_Tianyuan U BWA
BWA2_Yana U BWA
Then I merge those 2 dataset using this script:
mergeit -p <(echo "geno1: ${in1}.geno
snp1: ${in1}.snp
ind1: ${in1}.ind
geno2: ${in2}.geno
snp2: ${in2}.snp
ind2: ${in2}.ind
genooutfilename: ${out}.geno
snpoutfilename: ${out}.snp
indoutfilename: ${out}.ind
outputformat: PACKEDANCESTRYMAP
docheck: YES
strandcheck: YES
hashcheck: NO")
And here is output .snp file:
rs3094315 1 0.007526 752566 G A
rs12124819 1 0.007765 776546 A G
1_832756 1 0.008328 832756 T G
rs28765502 1 0.008329 832918 T C
1_838555 1 0.008386 838555 C A
1_838931 1 0.008389 838931 A C
rs7419119 1 0.008420 842013 T G
1_846543 1 0.008465 846543 G T
1_852443 1 0.008524 852443 A C
1_852964 1 0.008530 852964 T G
1_853954 1 0.008540 853954 C A
1_854978 1 0.008550 854978 A C
As expected the resulted .snp file does not have the snps with a 'X' allele. However if you look at position '777122' or '834832', they are not merging even thought they should.
I found this behaviour using:
- Unix system / Eigensoft installer through Conda / mergeit version “2450”
- MacOS system / Eigensoft installed through GitHub / mergeit version "2600"
Do you know where this can come from ?
Thanks a lot, Adrien
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/DReichLab/EIG/issues/64, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEE77B5NPOVCNB5GUKBJ75TTILTQ5ANCNFSM42ZHDC6A .
Hi, I have noticed that behaviour using different datasets but to debug it, in this example, I'll show you the results of using mergeit on the same datasets and merge it with itself (only changing the name of the individuals).
My datasets are as follow:
.snp file for both datasets
.geno file for both datasets
.ind for dataset1
.ind for dataset2
Then I merge those 2 dataset using this script:
And here is output .snp file:
As expected the resulted .snp file does not have the snps with a 'X' allele. However if you look at position '777122' or '834832', they are not merging even thought they should.
I found this behaviour using:
Do you know where this can come from ?
Thanks a lot, Adrien