brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"
MIT License

Sex QC #73

Closed: jrejente closed this issue 2 years ago

jrejente commented 3 years ago

Still trying to find out how to incorporate sex into the plot. I see that the sex of every sample is listed as -9 in the samples.tsv output file, and that the source for that column is the X and Y chromosomes, but I'm not sure how to get the plot to use this data as in the demo https://brentp.github.io/somalier/ex.html.

brentp commented 3 years ago

I'm not sure exactly what the question is, but if you give somalier a pedigree file with the sex column filled in, it will color the points according to the value in that column.
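For anyone landing here, the Python sketch below shows one way to produce such a pedigree file. It assumes the common six-column PED layout (family_id, sample_id, paternal_id, maternal_id, sex, phenotype) with sex coded as 1 = male, 2 = female and -9 = unknown; the helper `write_ped` and the `SEX_CODES` mapping are hypothetical names, not part of somalier.

```python
# Hypothetical helper (not part of somalier): write a minimal PED file with the
# sex column filled in. Assumed layout: family, sample, father, mother, sex,
# phenotype; assumed sex codes: 1 = male, 2 = female, -9 = unknown.
import csv
import io

SEX_CODES = {"male": 1, "female": 2}

def write_ped(samples, handle):
    """samples: iterable of (family_id, sample_id, sex_string) tuples."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for family_id, sample_id, sex in samples:
        code = SEX_CODES.get(sex.lower(), -9)  # anything unrecognized -> unknown
        # parents unknown (0, 0) and phenotype missing (-9)
        writer.writerow([family_id, sample_id, 0, 0, code, -9])

buf = io.StringIO()
write_ped([("fam1", "s1", "female"), ("fam2", "s2", "male"), ("fam3", "s3", "?")], buf)
print(buf.getvalue(), end="")
```

Writing to a `StringIO` keeps the example self-contained; in practice you would pass an open file handle instead and give the resulting file to somalier as its pedigree.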

ashotmarg commented 2 years ago

Hi Brent,

I am writing here since I am having a similar issue to the one @jrejente had, namely the sex is listed as -9 in the output *.tsv file. I don't have any prior knowledge about the sex of the individuals (though I could figure it out with other methods), but I was assuming somalier would be able to estimate it from the sex chromosomes. Or do we need to provide the ped file for that to work?

Thanks, Ashot

brentp commented 2 years ago

Hi, somalier will only adjust the sex if you use --infer

ashotmarg commented 2 years ago

Aaaah, I see, thanks a lot for the prompt reply! It's fine now.

ashotmarg commented 2 years ago

Hi Brent, I am wondering how "find-sites" deals with the sex chromosomes. I know that --min-AN, --min-AF and --snp-dist apply to the autosomes for relatedness estimation, but I am not sure exactly what criteria are used to filter the sex-chromosome sites that are used to estimate the sex of each individual. Thanks in advance!

brentp commented 2 years ago

Hi, the relevant code is here (AF between 0.04 and 0.96): https://github.com/brentp/somalier/blob/cd2162906fd6ffe0a4b3bdf3e1b55c0f949cf342/src/somalierpkg/findsites.nim#L181-L182

here (distance to adjacent site is 1000 for X, 200 for Y): https://github.com/brentp/somalier/blob/cd2162906fd6ffe0a4b3bdf3e1b55c0f949cf342/src/somalierpkg/findsites.nim#L262-L267

and here (take a maximum of 10K sites for chrX and 5K sites for chrY): https://github.com/brentp/somalier/blob/cd2162906fd6ffe0a4b3bdf3e1b55c0f949cf342/src/somalierpkg/findsites.nim#L273-L280

These are hard-coded and not settable.
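To make the three rules above concrete, here is a rough Python sketch. It is not somalier's Nim implementation, only an approximation under stated assumptions: the AF bounds are treated as inclusive, the distance rule is applied greedily against the previously kept site in position order, the AF filter runs before the spacing check, chromosomes are named `chrX`/`chrY`, and the cap simply keeps the first N passing sites. The names `Site`, `SEX_CHROM_RULES` and `select_sex_sites` are hypothetical.

```python
# Rough approximation (NOT somalier's code) of the hard-coded sex-chromosome
# site filters: AF in [0.04, 0.96], minimum spacing, and a per-chromosome cap.
# Inclusive bounds, greedy spacing and "first N" capping are assumptions.
from dataclasses import dataclass

@dataclass
class Site:
    chrom: str
    pos: int
    af: float

# chrom -> (minimum distance to previously kept site, maximum number of sites)
SEX_CHROM_RULES = {"chrX": (1000, 10_000), "chrY": (200, 5_000)}

def select_sex_sites(sites, af_lo=0.04, af_hi=0.96):
    kept = []
    for chrom, (min_dist, max_sites) in SEX_CHROM_RULES.items():
        chrom_kept = []
        last_pos = None
        candidates = sorted((s for s in sites if s.chrom == chrom), key=lambda s: s.pos)
        for s in candidates:
            if not (af_lo <= s.af <= af_hi):
                continue  # too rare or too common to be informative
            if last_pos is not None and s.pos - last_pos < min_dist:
                continue  # too close to the previously kept site
            chrom_kept.append(s)
            last_pos = s.pos
            if len(chrom_kept) == max_sites:
                break  # hard cap reached for this chromosome
        kept.extend(chrom_kept)
    return kept

sites = [
    Site("chrX", 100, 0.30),
    Site("chrX", 600, 0.50),   # within 1000 of pos 100: dropped
    Site("chrX", 1200, 0.02),  # AF below 0.04: dropped
    Site("chrX", 1500, 0.45),
    Site("chrY", 100, 0.20),
    Site("chrY", 250, 0.25),   # within 200 of pos 100: dropped
    Site("chrY", 400, 0.97),   # AF above 0.96: dropped
]
print([(s.chrom, s.pos) for s in select_sex_sites(sites)])
# [('chrX', 100), ('chrX', 1500), ('chrY', 100)]
```

The larger spacing on X than on Y, together with the caps, limits how many sex-chromosome sites end up in the output; check the linked findsites.nim lines for the exact comparison operators and selection order.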

ashotmarg commented 2 years ago

Thanks for the quick replies, Brent. I wasn't sure whether there was also any GQ or QUAL filtering, but I guess not. It seems that the QUAL field is simply set to 100.

brentp commented 2 years ago

That might be a good option to add, but you can also filter before sending to find-sites. The QUAL is set to 100 on the output to enable better compression so the resulting file is smaller.
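As an illustration of filtering before find-sites, the Python sketch below drops VCF records whose site-level QUAL falls below a threshold. The threshold of 30 and the choice to drop records with a missing QUAL (".") are arbitrary assumptions, and `filter_by_qual` is a hypothetical helper; on real (compressed) VCFs, a tool such as `bcftools view -i 'QUAL>=30'` does the same job.

```python
# Hypothetical pre-filter (not part of somalier): drop VCF records whose
# site-level QUAL is below a threshold before running find-sites. Plain-text
# parsing only; records with a missing QUAL (".") are dropped by choice.
import io

def filter_by_qual(lines, min_qual=30.0):
    for line in lines:
        if line.startswith("#"):
            yield line  # keep all header lines unchanged
            continue
        qual = line.rstrip("\n").split("\t")[5]  # QUAL is the 6th VCF column
        if qual != "." and float(qual) >= min_qual:
            yield line

vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\tAF=0.3\n"
    "chr1\t200\t.\tC\tT\t12\tPASS\tAF=0.4\n"
    "chr1\t300\t.\tG\tA\t.\tPASS\tAF=0.5\n"
)
for record in filter_by_qual(vcf):
    print(record, end="")  # header lines plus only the chr1:100 record
```

A per-sample GQ filter would need to parse the FORMAT and sample columns as well, which this sketch deliberately leaves out.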

ashotmarg commented 2 years ago

Yep, that's actually what I am doing now with quality scores; I just wanted to know how find-sites was doing it so as not to repeat some of the steps! Thanks.