Open GoliczGenomeLab opened 1 month ago
Hi @GoliczGenomeLab ,
Thanks for the information that "small window sizes, using low nSnp, with higher filtering controlled by K produces the best performance". I will try it out following this principle.
Best regards, YCL
Thanks for the update, that looks great. I already tried out haploRILs this morning. The functions work for my data (but I had to modify the path to haploRILs_function.R).
I would like to ask for your opinion about the parameter settings {nSnp} {step} {K}. I have 2.7M SNPs in total, but I guess it would be better to use a subset of ~50K SNPs and compare the stability of the results. What values would you recommend for ~50K SNPs across the 10 chromosomes of a maize dataset?
P.S. I got a lot of warning messages when running the code, maybe you can check them:
"summarise() has grouped output by 'id', 'nSnp', 'K', 'blocksFiltered'. You can override using the .groups argument."
Best regards, Yan-Cheng
Originally posted by @yan-cheng-lin in https://github.com/GoliczGenomeLab/haploMAGIC/issues/2#issuecomment-2422440990
Hard to say. It depends on the resolution you aim to obtain, the marker size of your data and the genotyping error rates you expect in your data. I ran a simulation-based benchmarking analysis on haploRILs, which suggested that combinations of small window sizes (using low nSnp) with stronger filtering, controlled by K, produce the best performance, especially with genotyping errors.
Thanks for reporting the bug!
Thanks, I'm aware. dplyr::summarise prints that annoying message. It's possible to deactivate it, but I found it less risky to let it happen. I will fix it at some point.
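For anyone who wants to quiet the message in the meantime, here is a minimal sketch of the usual dplyr options. It runs on a made-up toy data frame, not on haploRILs output: the column names id, nSnp, K and blocksFiltered are taken from the message quoted above, while chr, len and the values are assumptions for illustration only. How haploRILs calls summarise() internally is not shown in this thread, so treat this as a general dplyr pattern rather than the package's actual code.

```r
library(dplyr)

# Toy data (hypothetical values); chr and len are illustrative columns only.
blocks <- data.frame(
  id             = c("RIL1", "RIL1", "RIL2", "RIL2"),
  nSnp           = c(20, 20, 20, 20),
  K              = c(2, 2, 2, 2),
  blocksFiltered = c(TRUE, TRUE, TRUE, TRUE),
  chr            = c(1, 1, 1, 2),
  len            = c(10, 30, 5, 15)
)

# Option 1: state the grouping of the result explicitly.
# With .groups set, summarise() no longer prints the "has grouped output by" message.
res <- blocks %>%
  group_by(id, nSnp, K, blocksFiltered, chr) %>%
  summarise(meanLen = mean(len), .groups = "drop")

# Option 2: wrap a call you cannot edit. The notice is a message, not a warning,
# so suppressMessages() hides it (along with any other messages from that call).
res2 <- suppressMessages(
  blocks %>%
    group_by(id, nSnp, K, blocksFiltered, chr) %>%
    summarise(meanLen = mean(len))
)

# Option 3: switch the notice off for the whole session.
options(dplyr.summarise.inform = FALSE)
```

Option 1 is the one a package fix would most likely use, since it also makes the grouping of the result explicit. Options 2 and 3 only hide the message and leave the default grouping behaviour unchanged.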