alachins / raisd

RAiSD: software to detect positive selection based on multiple signatures of a selective sweep and SNP vectors

RAiSD with SNP array data #23

Open vlattko opened 3 years ago

vlattko commented 3 years ago

Hi! I ran a RAiSD analysis on maize 600k SNP array data (572 lines), pruned using a 50-SNP window, a 5 bp step size, and VIF = 2 in PLINK 1.9. The SNPs were pruned following the results of Malomane et al. to mitigate the effects of ascertainment bias in array data. However, my RAiSD results look very different from what was reported, and the values of mu are very low: only two chromosomes have mu higher than 0.1.

RAiSD_Plot.Prunned60kSEE_2.5.pdf RAiSD_Plot.Prunned60kSEE_2.6.pdf

Can you please comment on these results and the applicability of RAiSD with SNP array data?

I would really like to report these results, because the signals were detected in very interesting regions.

P.S. When I add the argument -k 0.05, only a single signal crosses the threshold. All reported signals cross the threshold only when I apply the less stringent cutoff of 0.1.

alachins commented 3 years ago

It seems there is some signal there, but the applicability of RAiSD to SNP array data depends on your data. SweeD and OmegaPlus are more suitable for processing SNP array data. Could you try them as well to see whether the results agree?

vlattko commented 3 years ago

Thanks for your reply! I ran SweeD on the same data set with a grid size of 100,000 and found a considerable number of additional hits.

These are the alpha values for chromosome 6: [image]

I defined a cutoff at 99.95%, and the putative sweep on chromosome 6 from my last post is still present (significant), but with an alpha value of 0.705.

I am thinking of defining a more stringent threshold, but would appreciate your advice...

Thanks!
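For readers following along, here is a minimal sketch of how an empirical percentile cutoff such as the 99.95% one mentioned above could be computed from a list of per-position scores. It uses the nearest-rank definition of a percentile, which is only one of several possible conventions, and the function names `empirical_cutoff` and `outliers` are hypothetical helpers, not part of SweeD or RAiSD.

```python
import math


def empirical_cutoff(scores, percentile):
    """Nearest-rank percentile: the smallest score with at least
    `percentile` percent of all scores less than or equal to it."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(scores)
    # round() guards against float noise such as 9995.000000000001
    rank = math.ceil(round(percentile * len(ordered) / 100, 9))
    return ordered[rank - 1]


def outliers(positions, scores, cutoff):
    """Return (position, score) pairs whose score is strictly above the cutoff."""
    return [(p, s) for p, s in zip(positions, scores) if s > cutoff]
```

Calling `empirical_cutoff(all_scores, 99.95)` and then `outliers(positions, all_scores, cutoff)` gives the grid positions above the threshold. A stricter percentile shortens the candidate list, but an empirical cutoff always flags some top fraction of positions, so it ranks candidates rather than controlling a false positive rate.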

vlattko commented 3 years ago

After inspecting all chromosomes with a grid size of 10,000, it turns out that there are some positions with CLR fixed at 1200. I can't find an explanation for this in your paper. Are these valid signals or possible outliers?

CLR.grid.10k.pdf

The 99.99% cutoff value is CLR = 4

alachins commented 3 years ago

Can you check whether there are SNPs in the regions where the CLR scores are fixed at 1200? We can look into why this is happening if you send the report and the parts of your VCF that correspond to those scores.

vlattko commented 3 years ago

Sorry, I accidentally plotted alpha values instead of CLR... The CLR scores seem fine to me now. Interestingly, three of the four sweep signals from the RAiSD analysis are also significant in SweeD, along with 12 new signals.

CLR_10kGrid

Thanks for your help!
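One way to avoid plotting the wrong column is to select it by header name instead of by index. The sketch below assumes a whitespace-delimited report in which each chromosome block starts with a `//` marker line followed by a header row naming the columns. That layout, and the column names `Position`, `Likelihood` and `Alpha` used in the usage note, are assumptions to verify against your own report file; `read_column` is a hypothetical helper.

```python
def read_column(lines, column):
    """Parse report lines into {block_name: [(position, value), ...]}.

    Assumed layout (verify against your file): a line starting with '//'
    opens a block, the next non-empty line is a header row, and the
    remaining lines of the block are numeric rows.
    """
    blocks = {}
    header = None
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("//"):
            current = line[2:].strip()
            blocks[current] = []
            header = None  # each block carries its own header row
            continue
        if current is None:
            continue  # ignore any preamble before the first block
        fields = line.split()
        if header is None:
            header = fields
            if "Position" not in header or column not in header:
                raise KeyError(f"expected 'Position' and {column!r} in {header}")
            continue
        row = dict(zip(header, fields))
        blocks[current].append((float(row["Position"]), float(row[column])))
    return blocks
```

With a file opened as `fh`, `read_column(fh, "Likelihood")` would return the scores to plot as CLR, and `read_column(fh, "Alpha")` the alpha values. A missing or misspelled column name raises an error instead of silently plotting something else.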

alachins commented 3 years ago

That's good news. It might also be useful to run the common-outlier analysis of RAiSD: it reports and plots common outliers between RAiSD and SweeD to help you identify significant locations.

https://github.com/alachins/raisd#common-outliers-between-raisd-and-sweedomegaplus
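To get a rough feel for what a common-outlier comparison involves, the sketch below pairs outlier positions from two tools when they lie within a chosen distance of each other. It is not RAiSD's built-in analysis and makes no claim about how that analysis is implemented. The function name `common_outliers` and the `max_distance` parameter are hypothetical, and the inputs are assumed to be positions on the same chromosome that already passed each tool's own threshold.

```python
from bisect import bisect_left


def common_outliers(positions_a, positions_b, max_distance):
    """Pair each position in A with its nearest position in B,
    keeping the pair only if the two are within max_distance."""
    sorted_b = sorted(positions_b)
    pairs = []
    for a in sorted(positions_a):
        i = bisect_left(sorted_b, a)
        # The nearest neighbour is either just below or just above a.
        candidates = sorted_b[max(i - 1, 0): i + 1]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda b: abs(b - a))
        if abs(nearest - a) <= max_distance:
            pairs.append((a, nearest))
    return pairs
```

The distance tolerance matters because the two tools need not evaluate the same grid positions, so the same sweep can be reported at slightly different coordinates. A suitable value depends on the grid size and on SNP density in the region.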