Zilong-Li / vcfppR

The fastest VCF/BCF parser in R https://doi.org/10.1093/bioinformatics/btae049
https://zilong-li.github.io/vcfppR/

What kind of evaluation metric is used for Pearson R-squared and NRC? #6

Closed Truongphikt closed 1 month ago

Truongphikt commented 3 months ago

Hi, vcfppR team,

I'm very grateful for the effort to create a fast and convenient VCF evaluation tool. I have skimmed the article and documentation but still don't know which kind of Pearson R-squared metric is used. To my knowledge, there are usually 2 approaches to calculating $r^2$:

Does vcfppR use one of the above methods or another one? I would really like to know. Thanks.

Zilong-Li commented 3 months ago

Both are supported. Please refer to the documentation of vcfcomp, particularly the options by.sample and by.variant. Stay tuned. I will update the vignette and website at some point.

Truongphikt commented 3 months ago

@Zilong-Li Thanks for the enthusiastic support. Could you please tell me the default values of the by.sample and by.variant parameters? This is my guess at which values should be set in each situation:

| Statistic | by.sample | by.variant |
|---|---|---|
| Aggregated $r^2$ | FALSE | FALSE |
| SNP-wise $r^2$ | FALSE | TRUE |
| Aggregated NRC | FALSE | FALSE |
| Sample-wise NRC | TRUE | FALSE |

Is that correct? I hope vcfppR becomes more widely used. Thanks.

Zilong-Li commented 3 months ago

You can find the default arguments of a function in R with args(vcfcomp). Also, here are the online docs: https://zilong-li.github.io/vcfppR/reference/vcfcomp.html. Sorry, I am on vacation and cannot update the docs with more details right now, but I will do it after my vacation. If you would like it to become more popular, help spread the word by starring and forking the repo. Thanks.

Zilong-Li commented 1 month ago

Hey, sorry for the late reply.

Your summary of vcfcomp is exactly correct. The default (by.sample=FALSE and by.variant=FALSE) aggregates everything across all samples within a bin of variants. If by.sample is TRUE, sample-wise statistics are calculated regardless of the value of by.variant. If by.sample=FALSE and by.variant=TRUE and the number of samples is greater than 1, SNP-wise statistics are calculated.
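
To make the three modes concrete, here is a minimal Python sketch of how $r^2$ could be computed in each case. It is an illustration only and not vcfppR's implementation: the function names `pearson_r2` and `r2_modes`, the toy matrices, the use of a single bin holding all variants, and the fallback to aggregation when there is only one sample are all assumptions made for this example.

```python
from math import fsum


def pearson_r2(x, y):
    """Squared Pearson correlation of two equal-length sequences.

    Returns None when it is undefined (fewer than 2 points or zero variance).
    """
    n = len(x)
    if n < 2:
        return None
    mx, my = fsum(x) / n, fsum(y) / n
    sxx = fsum((a - mx) * (a - mx) for a in x)
    syy = fsum((b - my) * (b - my) for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = fsum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def r2_modes(truth, test, by_sample=False, by_variant=False):
    """Hypothetical helper mirroring the by.sample / by.variant logic.

    truth, test: lists of rows, one row per variant, one column per sample.
    All variants are treated as one bin (an assumption of this sketch).
    """
    n_samples = len(truth[0])
    if by_sample:
        # Sample-wise: one value per column, whatever by_variant is.
        return [
            pearson_r2([row[j] for row in truth], [row[j] for row in test])
            for j in range(n_samples)
        ]
    if by_variant and n_samples > 1:
        # SNP-wise: one value per row (needs more than one sample).
        return [pearson_r2(t, d) for t, d in zip(truth, test)]
    # Default (and assumed fallback): pool every genotype in the bin.
    flat_truth = [g for row in truth for g in row]
    flat_test = [d for row in test for d in row]
    return pearson_r2(flat_truth, flat_test)


# Toy data: 3 variants x 3 samples; the third variant is imputed "flipped".
truth = [[0, 1, 2], [0, 0, 1], [2, 1, 0]]
test = [[0, 1, 2], [0, 0, 1], [0, 1, 2]]

aggregated = r2_modes(truth, test)
snp_wise = r2_modes(truth, test, by_variant=True)
sample_wise = r2_modes(truth, test, by_sample=True, by_variant=True)
print(aggregated, snp_wise, sample_wise)
```

In this toy data the flipped third variant is perfectly (negatively) correlated on its own, so its SNP-wise $r^2$ is high while the aggregated value is low. That is why the two approaches can give very different numbers on the same data.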

Let me know if this helps.

Best, Zilong