Closed Jesson-mark closed 2 years ago
Hi @Jesson-mark, very sorry for the late reply, I somehow missed my alert emails from github.
Thanks very for reporting issue.
min_support
is the total number reads that carry the repeat fulfilling whatever conditions the user specified (e.g. min_str_len
, max_str_len
, min_ins_size
, etc).
min_cluster_size
is the minimum number of reads clustered for each allele.
The min_cluster_size
was added after the first release in scenarios when I noticed usually very big alleles will have fewer support reads than smaller alleles and I still want to group them together. For example, I might set min_support
to 10, and out of 12 support reads identified, only 4 carry a big allele. So if I set min_cluster_size
to 2, the 4 reads will still be clustered together.
Yes, this is intended for genome-scan mode, scenarios may be different in genotyping mode, as in your case.
I will probably remove this requirement, and add a checking to make sure cluster_size
cannot be larger than the actual number of support reads instead.
I will think about it more and update you on any change I decide to make shortly.
OK, I just enabled the min_cluster_size
to be larger than min_support
for genotyping mode in v1.2.0.
I won't allow this for genome scan as this will potentially remove many loci from being detected.
Thanks for raising the issue.
Thanks for your considerate explanations!
I will try the newest Straglr
for my data. If there is any question I will report it.
Thanks again for you modifications to Straglr
.
Hi,
Recently I have used Straglr to genotype a STR of my data. And I found a quite strange phenomenon. With the default parameters, I got these copy numbers of TR of several reads:
and the copy numbers of two clusters(alleles) are 894.8(the first two reads) and 169.5(other reads). It seems reasonable.
But when I turned
min_cluster_size
to 3 or 4, the results are same as that of 2! Which means even ifmin_cluster_size
is bigger than 2, the first two reads are still clustered together. That's strange.After some simple inspections of
TREFinder
class, I found thatmin_cluster_size
is affected bymin_support
, shown in the code below:I don't know why min_cluster_size cannot be larger than min_support. Since it is useful in genome-scan mode, what does it mean in genotyping mode?
Thank you for taking the time. Looking forward to your reply ! Jesson-mark