Closed twooldridge closed 4 years ago
Dear Brock, The two programs, RAiSD and SweeD treat windows in a different way.
SweeD: The reported window is determined dynamically while SweeD runs. In detail, the window that maximizes CLR is reported. Assume that you want to calculate CLR at a position X. SweeD starts from the smallest possible window around X and expands it till a maximum threshold which has the value (alpha d) = 12. alpha d is a scaled distance (d in base pairs) with the parameter alpha. The smaller alpha the stronger the selection coefficient is. Thus, in short, for very strong selection, the window can become very large, which is biologically correct: very strong selection affects very large genomic regions. This is exactly what I see in your case. For chr3, the alpha value is small and the window large.
RAiSD: The window size in RAiSD is constant in terms of SNP number. Thus, if a region is sparse (which is a signature of selection), then the constant number of SNPs will occupy a large genomic region in terms of bp.
When you use either tool (or other tools that search for selection), you should be careful with "weird" regions, i.e. centromers. We suggest that these regions should be masked later on, when you finish the analysis. What I'm doing is to exclude centromers as well as a small region around them.
Please contact me if you have further questions pavlos & nikos
Hi Pavlos & Nikos,
Thank you for the very detailed response, I see the distinction between RAiSD and SweeD now! One follow up question occurs to me: for the chr3 example I gave above, why would directly adjacent positions not also report a relatively high Likelihood and large-ish window? If selection was strong enough to reduce variation such that SweeD extends out ~75000bp to each side of position 84,262,618, presumably selection has acted over a large enough region that, when starting next door at 84,263,618, SweeD would similarly extend boundaries a good ways to meet (alpha*d)=12.
It is true that for some genomic regions in my data with very high Likelihood values, I do see a gradual "peak" in the Likelihood landscape rather than a single point. This is typically over ~MB sized regions. Perhaps, the effect just needs to be very strong for this to happen? I should also qualify that LD in my populations is VERY low, so it would not be unexpected for signals to disappear quickly at nearby sites.
Hopefully this makes sense, and thanks again for your help.
Best, Brock
Hi Brock,
"Thank you for the very detailed response, I see the distinction between RAiSD and SweeD now! One follow up question occurs to me: for the chr3 example I gave above, why would directly adjacent positions not also report a relatively high Likelihood and large-ish window? If selection was strong enough to reduce variation such that SweeD extends out ~75000bp to each side of position 84,262,618, presumably selection has acted over a large enough region that, when starting next door at 84,263,618, SweeD would similarly extend boundaries a good ways to meet (alpha*d)=12."
Not necessarily because the distance between the hypothetical sweep position and the SNP plays an important role. So for example, it is possible that there are some doupletons near 84,263,618 (or one tripleton etc) and many n-1's near 84,262,618. The likelihood for the 84,262,618 position is high, however the likelihood for the 84,263,618 position is low.
Τhe reported positions are a bit strange. We will check it, if there is some bug there.
best pavlos
Dear Brock, could you please send me the dataset that you generated these results? We'd like to check it usng some printouts.
best pavlos
Hi Pavlos,
I've attached the gzipped Sweepfinder-formatted input file for chr3 and that population. You'll see that I keep monomorphic sites in the file if the ancestral state is unknown, per recommendations for using SweepFinder2 (which I've also played around with). I exclude these anyway when I run SweeD, using the default parameters. Please let me know if the file doesn't work, or if I can provide more information. Thank you!
Best, Brock
Hi Brock, Can you also provide the command line you used? Best regards, Nikos
Hi Nikos,
Of course, I should have included that earlier. The command was:
/n/home11/twooldridge/Software/sweed/SweeD-P -threads 16 -grid 161151 -input sweepfinder_files/NUB.chr3.SF -name NUB.chr3.SweeD
With the grid size chosen to approximate 1kb windows.
Thanks, Brock
On May 2, 2019, at 3:28 AM, alachins notifications@github.com wrote:
Hi Brock, Can you also provide the command line you used? Best regards, Nikos
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/alachins/raisd/issues/7#issuecomment-488576359, or mute the thread https://github.com/notifications/unsubscribe-auth/AEACLKCFZYEGHVRLR24LWKDPTKJYRANCNFSM4HELUSTA.
Hello,
I was playing around with some results from both RAiSD and SweeD, and I became confused by the start and end positions output by both programs. How exactly are these points determined? It appears to me that the size of the window can seriously influence the computed statistics, for example having a large window with few SNPs will raise VAR in RAiSD (pointed out by me here https://github.com/alachins/raisd/issues/6). Looking at this extreme example from a SweeD run, you can see a major change in the window size (indicated by the final column)
And this data has already been masked for low coverage, low mappability, and generally low-quality regions that I might expect to produce this sort of result. I see similar effects with RAiSD, again following masking. Because of this I would like to understand how these boundaries are determined, but looking through the publications and source code for both programs I'm having trouble identifying where this is discussed. Please excuse me if I'm missing something obvious here.
As a side note, my input files for SweeD are in the SweepFinder format, and I exclude monomorphic sites. For RAiSD, my input file is simply a VCF.
Thank you for your help, and please let me know if I can provide any more information.
Best, Brock