brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"
MIT License

chrX sites count #133

Open AinaMontalban opened 8 months ago

AinaMontalban commented 8 months ago

Hello,

I am using somalier to perform sex QC on WES samples, and I have a question regarding the number of sites on chrX.

I created a VCF file containing 203 chrX sites using our target BED file and the somalier find-sites command. When I then ran somalier extract and somalier relate on a sample, somalier reported X_n as 199, even though there should be 203 sites on chrX:

X_depth_mean  X_n  X_hom_ref  X_het  X_hom_alt
81.56         199  103        0      96
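As a quick consistency check on the row above, the three genotype counts add up to X_n, which suggests the 4 missing sites were not counted in any genotype class rather than dropped from just one of them. A minimal sketch of that arithmetic follows. The dictionary and variable names are illustrative only; the numbers are copied from the output above.

```python
# Values copied from the somalier output row above.
row = {
    "X_depth_mean": 81.56,
    "X_n": 199,
    "X_hom_ref": 103,
    "X_het": 0,
    "X_hom_alt": 96,
}

# Number of chrX sites in the sites VCF, as stated in the post.
expected_sites = 203

# Sum of the per-genotype counts reported for chrX.
genotyped = row["X_hom_ref"] + row["X_het"] + row["X_hom_alt"]

# Sites present in the sites file but not reflected in X_n.
missing = expected_sites - row["X_n"]

print("genotype counts sum to X_n:", genotyped == row["X_n"])
print("sites unaccounted for:", missing)
```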

Why are we missing 4 sites?

Initially, I thought the -d (MIN_DEPTH) parameter might be the reason, so I ran HaplotypeCaller at base-pair resolution to check the allele depth at these sites. None of the sites had a depth below 7. However, 4 sites had GQ < 30:

chrX    69670203    .   C   <NON_REF>   .   .   .   GT:AD:DP:GQ:PL  0/0:60,19:79:**0**:0,0,1333
chrX    119934456   .   G   <NON_REF>   .   .   .   GT:AD:DP:GQ:PL  0/0:19,2:21:**1**:0,1,642
chrX    136491943   .   A   <NON_REF>   .   .   .   GT:AD:DP:GQ:PL  0/0:36,10:46:**0**:0,0,923
chrX    151397088   .   G   <NON_REF>   .   .   .   GT:AD:DP:GQ:PL  0/0:10,0:10:**24**:0,24,360
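The depth and GQ check described above can be sketched in Python. This is only an illustration of the manual check, not somalier's own filtering logic. The function names are hypothetical, the thresholds (depth 7, GQ 30) are the ones mentioned in this post, and the records are the four lines above with the bold markers removed.

```python
def parse_sample_fields(vcf_line):
    """Split one single-sample VCF record into (chrom, pos, FORMAT-key -> value)."""
    cols = vcf_line.split()
    chrom, pos = cols[0], int(cols[1])
    keys = cols[8].split(":")      # FORMAT column, e.g. GT:AD:DP:GQ:PL
    values = cols[9].split(":")    # the single sample column
    return chrom, pos, dict(zip(keys, values))


def flag_sites(vcf_lines, min_depth=7, min_gq=30):
    """Return records whose DP or GQ falls below the given thresholds.

    The thresholds are assumptions taken from the post, not somalier defaults
    confirmed here.
    """
    flagged = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, fields = parse_sample_fields(line)
        dp, gq = int(fields["DP"]), int(fields["GQ"])
        reasons = []
        if dp < min_depth:
            reasons.append("low_depth")
        if gq < min_gq:
            reasons.append("low_gq")
        if reasons:
            flagged.append((chrom, pos, dp, gq, reasons))
    return flagged


records = [
    "chrX 69670203 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:60,19:79:0:0,0,1333",
    "chrX 119934456 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:19,2:21:1:0,1,642",
    "chrX 136491943 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:36,10:46:0:0,0,923",
    "chrX 151397088 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:10,0:10:24:0,24,360",
]

flagged = flag_sites(records)
for chrom, pos, dp, gq, reasons in flagged:
    print(chrom, pos, "DP=%d" % dp, "GQ=%d" % gq, ",".join(reasons))
```

With these four records, every site passes the depth threshold and is flagged only for GQ, which matches the observation in the post.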

Does somalier exclude genotypes with low quality? Or is it something else?