dkoboldt / varscan

Variant calling and somatic mutation/CNV detection for next-generation sequencing data

Inferring population parameters from pooled samples #19

Open situssog opened 7 years ago

situssog commented 7 years ago

Hello, I have HiSeq data from some experimental yeast populations, and I would like to compare population parameters between populations (samples). For example, I would like to estimate genetic diversity or build a site frequency spectrum, perhaps using something like the analyses of Lynch et al. (https://doi.org/10.1093/gbe/evu085) or the work of Ferretti et al. (http://onlinelibrary.wiley.com/doi/10.1111/mec.12522/abstract).

However, I am wondering how to normalise the samples so that statistics such as diversity can be compared between them. I'm especially concerned about differences in coverage between samples. I see that with VarScan I can set the minimum number of supporting reads, frequency, quality, etc., but the precision for detecting singletons, for instance (or low-frequency variants in general), will depend on coverage. Could someone give me some hints on that? Thanks a lot in advance,

/Sergio Tusso

cfljam commented 7 years ago

I can recommend Ferretti's NPStat.

cfljam commented 7 years ago

I haven't been able to find any sign of code that might facilitate doing this directly on VCF files. For the SFS, check out https://github.com/magicDGS/PoolHMM

PoPoolation2 has excellent documentation, and there are some projects that address the file clutter it creates, e.g. https://github.com/magicDGS/PoPoolation2_magicDGS
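
On the coverage question itself, one common approach is to subsample the reads at every site down to the same depth in all samples before computing diversity or an SFS, so that low-frequency variants have a comparable chance of being seen in each sample. The snippet below is only a minimal sketch of that idea in plain Python, not something taken from VarScan or the tools linked above. The function name `subsample_counts`, the per-allele count dictionaries, and the target depth of 50 are all hypothetical and chosen for illustration.

```python
import random


def subsample_counts(counts, target_depth, rng):
    """Downsample per-allele read counts at one site to target_depth reads.

    Reads are drawn without replacement. Returns None when the site has
    fewer reads than target_depth, so the caller can skip it.
    (Hypothetical helper for illustration only.)
    """
    depth = sum(counts.values())
    if depth < target_depth:
        return None
    # Expand the counts into one entry per read, then draw target_depth of them.
    reads = [allele for allele, n in counts.items() for _ in range(n)]
    picked = rng.sample(reads, target_depth)
    return {allele: picked.count(allele) for allele in counts}


rng = random.Random(42)  # fixed seed so the subsampling is reproducible

# Made-up allele counts at the same site in two pools with different coverage.
site_a = {"A": 180, "G": 20}  # 200x coverage
site_b = {"A": 45, "G": 5}    # 50x coverage
target = 50

sub_a = subsample_counts(site_a, target, rng)
sub_b = subsample_counts(site_b, target, rng)
print(sub_a, sub_b)
```

After this step both samples carry exactly 50 reads at the site, so a minimum-supporting-reads threshold means the same thing in each. The price is that information from the deeper sample is thrown away, and sites below the target depth have to be dropped.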