alachins / raisd

RAiSD: software to detect positive selection based on multiple signatures of a selective sweep and SNP vectors

can't find known selective sweeps #40

Open jdaron opened 2 years ago

jdaron commented 2 years ago

Dear

I have recently been using RAiSD, in addition to other descriptive statistics, to look for patterns of positive selection in one of my populations.

Using pi, Tajima's D and H12, I could detect a strong signal of selection near previously identified insecticide resistance genes such as cyp6p, Vgsc, Gaba and Gste. However, when I used the statistics computed by RAiSD, the patterns of selection were not as clear.

I was wondering whether this difference could be due to incorrect usage of the tool. As input I used a phased and polarized VCF file with the following command line: RAiSD -n mypop -I myVCF.vcf -M 0 -y 2 -f -D -R -S mypop.ids.lst -w 1000

Thanks for your answer, Josquin

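[image: LBVwil 1pop stats 10kbw] https://user-images.githubusercontent.com/9592106/184137397-7480df1d-d2e6-4ccb-9879-2068856c7d7c.png
[image: LBVwil raisd 100] https://user-images.githubusercontent.com/9592106/184146390-3a816493-d2b8-436c-a78e-94e90ab0ed04.png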

alachins commented 2 years ago

One possible reason is that you are using a very wide window (1000). You can try values in the range 12 to 50 for the window width (-w).
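To compare several window widths side by side, the runs could be scripted. The following is a minimal sketch, not part of RAiSD: it reuses the flags from the command line in the original post unchanged and varies only -w. The function name `raisd_commands` and the per-width run names (`mypop_w12`, and so on) are hypothetical choices made for illustration.

```python
import shlex

# Flags copied from the command line in the original post; only -w varies.
BASE_ARGS = ["-I", "myVCF.vcf", "-M", "0", "-y", "2", "-f", "-D", "-R",
             "-S", "mypop.ids.lst"]


def raisd_commands(widths, run_prefix="mypop"):
    """Return one RAiSD argument list per window width (-w)."""
    commands = []
    for w in widths:
        # Hypothetical naming scheme: a distinct run name per width,
        # so the outputs of different runs are easy to tell apart.
        run_name = f"{run_prefix}_w{w}"
        commands.append(["RAiSD", "-n", run_name, *BASE_ARGS, "-w", str(w)])
    return commands


if __name__ == "__main__":
    for cmd in raisd_commands([12, 20, 30, 50]):
        # Print each command for inspection before running anything.
        print(shlex.join(cmd))
        # To execute instead of printing (assumes RAiSD is on PATH):
        # import subprocess; subprocess.run(cmd, check=True)
```

Printing the commands first makes it easy to check them by eye before launching the runs.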

-- Nikolaos Alachiotis

jdaron commented 2 years ago

Hi Nikos, Thanks for your quick answer. Here is another look at the plot using -w 50. As you can see, reducing the window size improved detection of the sweep at the Gste locus and, to a lesser extent, at the Gaba locus. However, I couldn't spot a signal at the cyp6p locus, which is potentially a soft sweep.

One aspect of the plot below that worries me a bit is the size of the peak for the mu statistic at the Gste and Gaba loci, which is rather small, especially compared to the peak observed with the H12 statistic, which is also confirmed by the XP-EHH statistic.

Do you know whether I could make other adjustments to improve the performance of the program? For example, I was wondering whether heterogeneous SNP density would affect its performance. Thanks, Josquin

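[image: LBVwil raisd 50] https://user-images.githubusercontent.com/9592106/185153703-442bfe50-caf0-4771-b664-4e591cc24237.png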

alachins commented 1 year ago

Hi Josquin, RAiSD is designed to detect hard sweeps, so getting no signal in a region that potentially contains a soft sweep is expected. As far as I can tell from the plots, the SFS signal is weaker than the other two at the Gaba locus and maybe stronger at the Gste locus. Considering that the SFS statistic in RAiSD looks only at the singletons and the N-1 class (N is the sample size), you could either change the SNP classes included in the calculation (-c) or assign a lower/higher weight to SFS per chromosome (supported in version 3.1: https://github.com/pephco/raisd, see the help menu options -VAREXP, -SFSEXP, -LDEXP). Best regards, Nikos A.
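To make the two suggestions concrete, here is a simplified illustration in Python. It is not RAiSD's implementation. The first function counts, per sliding window of SNPs, the fraction of SNPs whose derived-allele count falls in a chosen set of classes (by default singletons and N-1). The second assumes that the three factors are combined as a product and that the -VAREXP, -SFSEXP and -LDEXP options act as exponents on them; that reading is inferred from the option names and should be checked against the help menu. All function and parameter names are hypothetical.

```python
def edge_class_fraction(derived_counts, n_samples, window, classes=None):
    """Fraction of SNPs in each sliding SNP window whose derived-allele
    count belongs to `classes` (default: singletons and the N-1 class)."""
    if classes is None:
        classes = {1, n_samples - 1}
    flags = [1 if count in classes else 0 for count in derived_counts]
    fractions = []
    # Slide a window of `window` consecutive SNPs, one SNP at a time.
    for start in range(len(flags) - window + 1):
        fractions.append(sum(flags[start:start + window]) / window)
    return fractions


def weighted_score(var, sfs, ld, var_exp=1.0, sfs_exp=1.0, ld_exp=1.0):
    """Combine three factors as a product, each raised to its own exponent.
    ASSUMPTION: the exponents mirror the role of -VAREXP/-SFSEXP/-LDEXP."""
    return (var ** var_exp) * (sfs ** sfs_exp) * (ld ** ld_exp)


if __name__ == "__main__":
    # Made-up derived-allele counts for 8 SNPs in a sample of size 10.
    counts = [1, 5, 9, 3, 1, 9, 9, 2]
    print(edge_class_fraction(counts, n_samples=10, window=4))
    # Widening the classes (in the spirit of -c) changes which SNPs count.
    print(edge_class_fraction(counts, n_samples=10, window=4,
                              classes={1, 2, 8, 9}))
    # An exponent of 0 removes the SFS factor from the product entirely.
    print(weighted_score(2.0, 0.5, 3.0, sfs_exp=0.0))
```

Under this reading, an exponent below 1 dampens a factor's influence on the combined score and an exponent above 1 amplifies it.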
