GoliczGenomeLab / haploRILs

Founder haplotype reconstruction with SNP data
GNU General Public License v3.0

Best haploRILs parameters #1

Open GoliczGenomeLab opened 1 month ago

GoliczGenomeLab commented 1 month ago
Hi @jamonterotena,

Thanks for the update, that looks great. I already tried out haploRILs this morning. The functions work for my data (but I had to modify the path to haploRILs_function.R).

I would like to ask your opinion on the parameter settings {nSnp}, {step} and {K}. I have 2.7M SNPs in total, but I guess it would be better to use a subset of ~50K SNPs and compare the stability of the results. Thus, what values would you recommend for ~50K SNPs across the 10 chromosomes of a maize dataset?

P.S. I got a lot of warning messages when running the code, maybe you could look into them: "summarise() has grouped output by 'id', 'nSnp', 'K', 'blocksFiltered'. You can override using the .groups argument."

Best regards, Yan-Cheng

Originally posted by @yan-cheng-lin in https://github.com/GoliczGenomeLab/haploMAGIC/issues/2#issuecomment-2422440990

> Thus, what values would you recommend for ~50K SNPs across the 10 chromosomes of a maize dataset?

Hard to say. It depends on the resolution you aim for, the size of your marker set and the genotyping error rate you expect in your data. I ran a simulation-based benchmarking analysis of haploRILs, which suggested that small window sizes (low nSnp) combined with stronger filtering, as controlled by K, give the best performance, especially in the presence of genotyping errors.
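One way to act on this is to thin the markers and then compare results across a small grid of settings. The following is only a minimal sketch in base R under stated assumptions: `thin_snps()` is a hypothetical helper, the marker table is assumed to have columns `chr` and `pos`, `step` is assumed to be the sliding step of the window, and the candidate values for `nSnp`, `step` and `K` are placeholders for illustration, not recommendations. No haploRILs call is shown because its function signature is not given in this thread.

```r
# Hypothetical helper: keep roughly `target` SNPs, evenly spaced by rank
# within each chromosome. `snps` needs the columns `chr` and `pos`.
thin_snps <- function(snps, target) {
  frac <- min(1, target / nrow(snps))            # fraction of markers to keep
  by_chr <- split(snps, snps$chr, drop = TRUE)   # one data frame per chromosome
  kept <- lapply(by_chr, function(d) {
    d <- d[order(d$pos), , drop = FALSE]         # sort by physical position
    n_keep <- max(1, round(nrow(d) * frac))
    idx <- unique(round(seq(1, nrow(d), length.out = n_keep)))
    d[idx, , drop = FALSE]
  })
  out <- do.call(rbind, kept)
  rownames(out) <- NULL
  out
}

# Toy marker table: 10 chromosomes with 1000 SNPs each (simulated positions).
set.seed(1)
toy_snps <- data.frame(
  chr = rep(paste0("chr", 1:10), each = 1000),
  pos = unlist(lapply(1:10, function(i) sort(sample.int(1e6, 1000))))
)
thinned <- thin_snps(toy_snps, target = 500)

# Placeholder candidate settings; every combination is one run to compare.
param_grid <- expand.grid(
  nSnp = c(10, 20, 50),
  step = c(1, 5),
  K    = c(1, 3, 5)
)
# Drop combinations whose step would exceed the window size.
param_grid <- subset(param_grid, step <= nSnp)
# Try the smallest windows first.
param_grid <- param_grid[order(param_grid$nSnp), ]
```

Each row of `param_grid` would then be passed to one haploRILs run on `thinned`, and repeating the runs on a second, differently thinned subset gives a rough check of how stable the results are.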

> The functions work for my data (but I had to modify the path to haploRILs_function.R).

Thanks for reporting the bug!

> P.S. I got a lot of warning messages when running the code, maybe you could look into them: "summarise() has grouped output by 'id', 'nSnp', 'K', 'blocksFiltered'. You can override using the .groups argument."

Thanks, I'm aware. dplyr::summarise prints that annoying message. It's possible to turn it off, but I found it less risky to leave it on. I will fix it at some point.
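For reference, a minimal sketch of how the message can be silenced without changing behaviour, using the `.groups` argument of `dplyr::summarise()`. The data frame `toy_res` and the column `value` are made up for illustration and are not taken from the haploRILs code; only the grouping columns named in the message are reused.

```r
library(dplyr)

# Made-up results table, for illustration only.
toy_res <- data.frame(
  id             = c("a", "a", "a", "a"),
  nSnp           = c(10, 10, 10, 10),
  K              = c(1, 1, 1, 1),
  blocksFiltered = c(TRUE, TRUE, TRUE, TRUE),
  chr            = c(1, 1, 2, 2),
  value          = c(0.9, 0.8, 0.7, 0.6)
)

# "drop_last" states the default explicitly: the output keeps the same
# grouping as before, so downstream code is unaffected, and the message
# is no longer printed.
res_same <- toy_res %>%
  group_by(id, nSnp, K, blocksFiltered, chr) %>%
  summarise(mean_value = mean(value), .groups = "drop_last")

# "drop" returns an ungrouped tibble instead; this does change what
# downstream grouped operations see.
res_flat <- toy_res %>%
  group_by(id, nSnp, K, blocksFiltered, chr) %>%
  summarise(mean_value = mean(value), .groups = "drop")

# Alternative: options(dplyr.summarise.inform = FALSE) hides the message
# for the whole session without touching any call.
```

Using `"drop_last"` is the more conservative of the two, since it only makes the existing default explicit.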

yan-cheng-lin commented 1 month ago

Hi @GoliczGenomeLab,

Thanks for the advice that small window sizes (low nSnp) with stronger filtering controlled by K perform best. I will try it out following this principle.

Best regards, YCL