danimfernandes / tkgwv2

An ancient DNA relatedness pipeline for ultra-low coverage whole genome shotgun data
GNU General Public License v2.0

SKIPPING due to no overlapping SNPs #7

Open batelz opened 8 months ago

batelz commented 8 months ago

Hi,

When running the plink2tkrelated pipeline (which is VERY easy to use, thanks for that!), the analysis is skipped because it can't find overlapping SNPs.

The input: I prepared two separate .ped files, one per individual, from a genotype file that I converted from EIG format to PLINK format. Given that I'm working with the 1240K genotype, and these samples are relatively high coverage (approximately 0.6x and 9x), it's highly unlikely that the two samples do not share any SNPs.

I am using the 1240K allele frequencies you provided - just for testing.

My .ped files look like this (I'm showing two samples as an example):

SN1:

15 SN1 0 0 1 -9 0 0 G G 0 0 0 0 0 0 0 0 A A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A A .....

SN2:

1 SN2 0 0 1 -9 A A A A 0 0 T T G G C C A A 0 0 C C T T C C G G T T C C .....

and maps:

SN1:

1       rs3094315       0.02013 752566
1       rs12124819      0.020242        776546
1       rs28765502      0.022137        832918
1       rs7419119       0.022518        842013
1       rs950122        0.02272 846864
1       rs113171913     0.023436        869303
1       rs13302957      0.024116        891021
1       rs59986066      0.024183        893462
1       rs112905931     0.02426 896271
...

SN2:

1       rs3094315       0.02013 752566
1       rs12124819      0.020242        776546
1       rs28765502      0.022137        832918
1       rs7419119       0.022518        842013
1       rs950122        0.02272 846864
1       rs113171913     0.023436        869303
1       rs13302957      0.024116        891021
1       rs59986066      0.024183        893462
....

Output:

 ################################################################################
 ### TKGWV2 - An ancient DNA relatedness pipeline for ultra-low coverage data ###
 ## Version 1.0b - Released 07/2022
 #
 # [2024-03-28 13:03:00] Running 'plink2tkrelated' on folder .../tkgwv2
         # Text-PLINK >> Pairwise transposed text-PLINK >> Relatedness estimates
         # Files to be processed:
                SN1.ped       SN1.map
                 SN2.ped      SN2.map
                 SN3.ped    SN2.map
         # Arguments used:
                 --freqFile     ./1000GP3_EUR_1240K.frq

         # Estimating coefficient of relatedness Rxy for   SN1   SN2 SKIPPING due to no overlapping SNPs     (1/3)
         # Estimating coefficient of relatedness Rxy for   SN1   SN3       SKIPPING due to no overlapping SNPs     (2/3)
         # Estimating coefficient of relatedness Rxy for   SN2   SN3      SKIPPING due to no overlapping SNPs     (3/3)
 # [2024-03-28 13:03:22] All dyads processed
 # [2024-03-28 13:03:22] Results exported to TKGWV2_Results.txt

Any ideas what could have gone wrong? Thanks.

danimfernandes commented 7 months ago

Hi @batelz.

From your .ped files, it looks like the first one is indeed mostly empty (the majority of genotypes are missing, coded as 0 0), so that no SNP seems to have data in both samples. Perhaps something went wrong when converting from EIG to PLINK?
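One way to check this before rerunning is to count, for each .ped line, how many SNPs have a called genotype and how many are called in both samples. The following is a minimal sketch of that check, not part of TKGWV2: the function names `called_sites` and `overlap_count` and the toy sample lines are hypothetical, and it assumes a whitespace-delimited text-PLINK .ped line with six leading columns and `0` as the missing allele code. TKGWV2 may apply further filters (for example, restricting to SNPs present in the frequency file), so this count is only an upper bound on the sites it could use.

```python
def called_sites(ped_line):
    """Return one boolean per SNP: True if both alleles are non-missing.

    Assumes a text-PLINK .ped line: six leading columns
    (family ID, sample ID, father, mother, sex, phenotype),
    then two allele columns per SNP, with '0' meaning missing.
    """
    fields = ped_line.split()
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("odd number of allele columns")
    # Pair up consecutive allele columns: (a1, a2) for each SNP.
    return [a != "0" and b != "0" for a, b in zip(alleles[0::2], alleles[1::2])]


def overlap_count(ped_line_a, ped_line_b):
    """Count SNPs that are called in both samples (same SNP order assumed)."""
    a = called_sites(ped_line_a)
    b = called_sites(ped_line_b)
    if len(a) != len(b):
        raise ValueError("samples have different numbers of SNPs")
    return sum(x and y for x, y in zip(a, b))


# Toy lines for illustration only (hypothetical, not the data from this issue).
toy_a = "F1 S1 0 0 1 -9 0 0 G G 0 0 A A"
toy_b = "F2 S2 0 0 1 -9 A A A A 0 0 0 0"

print("called in A:", sum(called_sites(toy_a)))
print("called in B:", sum(called_sites(toy_b)))
print("called in both:", overlap_count(toy_a, toy_b))
```

To apply it to real data, read the single line of each .ped file and pass the two strings to `overlap_count`. A very low count of called sites in one sample would point to the EIG-to-PLINK conversion rather than to the relatedness step.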