Open tetedange13 opened 1 month ago
Hi Felix, do you mean by pooled that all reads from all samples are mixed, without barcodes, so you don't know which reads came from which samples? It's possible that somalier can help here, but it's not designed for that. And certainly, `--infer` will not work well (if at all) for that case. If your children are sequenced individually, you could look at the rate of IBS0 to the parent pool. That should be very close to 0 if the parent is in the pool, but even that might not be reliable: if only a single parent has the allele, the ratio will be very low and the site might be called as hom-ref.
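To make the IBS0 idea concrete, here is a minimal sketch (not somalier's actual implementation) that counts IBS0 sites, i.e. sites where two samples share no allele because one is hom-ref and the other hom-alt. It assumes genotypes are encoded as alternate-allele counts (0 = hom-ref, 1 = het, 2 = hom-alt, -1 = unknown); the function name `count_ibs0` and the toy genotype vectors are made up for illustration.

```python
from typing import Sequence


def count_ibs0(gt_a: Sequence[int], gt_b: Sequence[int]) -> int:
    """Count sites where the two samples are opposite homozygotes (0 vs 2)."""
    if len(gt_a) != len(gt_b):
        raise ValueError("genotype vectors must have the same length")
    n_ibs0 = 0
    for a, b in zip(gt_a, gt_b):
        if a < 0 or b < 0:
            continue  # skip sites with an unknown genotype in either sample
        if abs(a - b) == 2:
            n_ibs0 += 1  # hom-ref vs hom-alt: no shared allele
    return n_ibs0


# Toy data (hypothetical): a child, a pool containing its parent, an unrelated pool.
child = [0, 1, 2, 2, 0, 1]
pool_with_parent = [1, 1, 1, 2, 0, 1]
unrelated_pool = [2, 1, 0, 2, 1, 1]

ibs0_related = count_ibs0(child, pool_with_parent)
ibs0_unrelated = count_ibs0(child, unrelated_pool)
print(ibs0_related, ibs0_unrelated)
```

In this toy case the pool containing the parent shows no IBS0 sites, while the unrelated pool shows two. The caveat above still applies: a pool genotype miscalled as hom-ref would inflate the count.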
Thanks for your quick answer!
Yes, I meant "pooled parents" exactly as you described, and our children are indeed sequenced individually.
For relatedness, IBS0 is indeed a good indicator
=> Each child always has an IBS0 under 20 with its parental pool (versus an IBS0 above 50 with any unrelated pool)
I also found homozygous concordance to be a good metric
=> "child - pooled_parent" relationships are always above 0.6-0.65 when the parent is actually in the pool (and lower otherwise)
=> All "pool to pool" relationships exhibit a low IBS0, but they never have a high enough "hom_concord" (so it is an even better metric than IBS0 in my case?)
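The two criteria above can be combined in a small script. This is only a sketch: it assumes the pairs file is tab-separated with columns named `#sample_a`, `sample_b`, `ibs0` and `hom_concordance` (check the header of your own "pairs.tsv"), the helper `flag_child_pool_pairs` is a made-up name, the cut-offs simply reuse the values quoted above, and the example rows are invented.

```python
import csv
import io


def flag_child_pool_pairs(tsv_text, max_ibs0=20, min_hom_concordance=0.6):
    """Return {(sample_a, sample_b): bool}; True when both criteria are met."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flags = {}
    for row in reader:
        ibs0 = int(row["ibs0"])
        hom_concordance = float(row["hom_concordance"])
        pair = (row["#sample_a"], row["sample_b"])
        # A pair passes only if IBS0 is low AND homozygous concordance is high.
        flags[pair] = ibs0 < max_ibs0 and hom_concordance > min_hom_concordance
    return flags


# Invented rows, for illustration only.
example_tsv = (
    "#sample_a\tsample_b\tibs0\thom_concordance\n"
    "child1\tpool_A\t12\t0.68\n"
    "child1\tpool_B\t63\t0.41\n"
    "pool_A\tpool_B\t8\t0.35\n"
)

flags = flag_child_pool_pairs(example_tsv)
for pair, ok in flags.items():
    print(pair, "parent likely in pool" if ok else "no match")
```

Requiring both criteria keeps "pool to pool" pairs from passing on a low IBS0 alone.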
If `--ped` is the way to go, I would really benefit from being able to have duplicate sample IDs in the input PED (provided that they have different famID)
=> This would essentially be to get a correct "expected_relatedness" set in "pairs.tsv"
=> For all possible "child{1,2,3,4} - pooled_parents_1+2+3+4" relationships of a given pool (I hope I am clear enough here)
Regarding guessing the number of samples pooled together from the data, I also made some progress:
(`somalier_relate.html` is very handy for all that)
Number of samples in pool | n_hom_ref |
---|---|
1 | > 5000 |
2 | ~ 2500 |
3 | ~ 1500 |
4 | ~ 1000 |
=> I prefer to use the "fraction of hom_alt" (= hom_alt / (hom_alt + het + hom_ref))
=> After plotting this fraction against "expected_ploidy", I found a good linear correlation
=> With `int(-12.5 * frac_hom_alt + 5.3)` giving a rounded estimate of the number of samples in the pool
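As a sketch of how this estimate could be scripted, the snippet below computes the hom_alt fraction from per-sample genotype counts and applies the linear fit quoted above. The function names and the example counts are hypothetical, and the coefficients (-12.5 and 5.3) come from this particular dataset, so they would likely need refitting on other data.

```python
def hom_alt_fraction(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Fraction of genotyped sites called hom-alt."""
    total = n_hom_ref + n_het + n_hom_alt
    if total == 0:
        raise ValueError("no genotyped sites")
    return n_hom_alt / total


def estimate_pool_size(frac_hom_alt: float) -> int:
    """Apply the empirical linear fit from this thread (dataset-specific)."""
    return int(-12.5 * frac_hom_alt + 5.3)


# Hypothetical counts, for illustration only.
frac_pool = hom_alt_fraction(n_hom_ref=500, n_het=400, n_hom_alt=100)
frac_single = hom_alt_fraction(n_hom_ref=500, n_het=160, n_hom_alt=340)
print(estimate_pool_size(frac_pool), estimate_pool_size(frac_single))
```

Note that Python's `int()` truncates toward zero rather than rounding to the nearest integer, so `round()` would give a different result for values such as 3.7.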
Thanks again! Best regards, Felix.
Hi,

First, thanks for developing `somalier`, it is a great tool!

In my team we have exome data with pooled parents, most of the time 4 mums and 4 dads together
=> I run `somalier` directly on BAMs and I would have a few questions, if you do not mind:

Have you experienced using `somalier` for this specific case of pooled parents?
=> From what I tested with the `relate --infer` method, pools always have a relatedness around 0.5
=> And for a given parent pool, relatedness is the same between the child of these parents and any other child (from a different family)
=> So it cannot be used to verify that a parent of a given child is actually present in the corresponding parental pool

I also noticed that with the `relate --ped` option, `somalier` does not allow duplicated samples even if they have different famID
=> Would it be possible to consider duplication only within a given family?
=> I noticed the `--sample-prefix` option, but AFAIK it does not fit my need, as I want to use the same ".somalier" file multiple times

Do you have any hints about using `somalier` to guess ploidy?
=> This would be to make sure that the predicted ploidy is correct (as a quality control to spot a forgotten sample in a pool)
=> Maybe using the "scaled mean depth on chrX" metric?

Thanks for any kind of help on this! Best regards, Felix.