Open DuttaAnik opened 2 years ago
In SweeD, the grid parameter defines the number of evaluation points, not evaluation windows. So "The contig length is 78070536 and I used the grid size of 7807" will give you one evaluation point every 10kb. SweeD is performing a MLE. The report file includes the window size for each evaluation point.
In RAiSD, the window width is not kb but the number of SNPs. To perform the comparison you want, both tools need to be executed with the same grid value to generate compatible reports. You can provide one large VCF with many chromosomes. Each chromosome will be processed independently of the others.
On Fri, Jul 1, 2022 at 1:44 PM Anik Dutta @.***> wrote:
Hi, I am currently doing a selection scan using SweeD and RAiSD. I would like to merge both outputs to find overlapping regions under selection. But I am having a confusion with the window-based output. For example, I ran SweeD using the following code: SweeD -input Scaffold_1__2_contigs__length_78070536.vcf -grid 7807 -folded -noSeparator -name 1963_Chr_1 -sampleList 1963_pop.txt The contig length is 78070536 and I used the grid size of 7807 to obtain a window of 10 kb.
However, if I want to compare the result with RAiSD, then do I also need to set a window size of 10 kb using -w 10 in RAiSD? Because the default is 50 kb if I am not mistaken.
Also, is it recommended to use individual VCF for each chromosome as an input in RAiSD or all the chromosomes in one VCF for a single population would be suitable? Because I am working with a non-model organism whose reference genome is not at a chromosome scale yet. It has ~100 scaffolds.
Thank you.
— Reply to this email directly, view it on GitHub https://github.com/alachins/raisd/issues/37, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALKWCXVZAK7HDDG2NBHKJ3VR3DY7ANCNFSM52MJYIRA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
-- Nikolaos Alachiotis
Thank you for the reply. But there is a problem. I am using the SweeD version 3.3.2, not the latest one where you get the start and stop position. The reason for not using the latest version is that I do not get any likelihood values in the output. With the same command using SweeD 3.3.2, I get results with different likelihood values. But with the latest SweeD version, all the likelihood values are Zero. Do not know what is causing this.
But coming back to the point, for example, I have an output like this from SweeD:
Position Likelihood Alpha
53744.0000 2.599466e-01 1.939265e+01
63731.3682 6.088593e-01 7.807052e-05
73718.7363 1.341805e+00 8.917006e-05
83706.1045 3.494308e-01 5.031090e-03
93693.4727 3.847188e+00 1.736058e-04
103680.8408 4.946577e-08 3.076922e-01
113668.2090 9.224968e-08 8.163262e-02
123655.5771 4.783268e-08 3.000000e+00
Here, you said that the report file includes the window size for each evaluation point which is I guess the "Position"? But how can I obtain the window size from such output? Should I add 5 kb up and down from each of the values under the "Position" column, which is above the significance threshold?
I think that adding 5kb on each side will not work. This way you will either overestimate (in the case of small grid sizes) or underestimate (in the case of large grid sizes) the window per evaluation position. You can obtain a much better estimation of the window per position by setting a very large grid size and then cluster together consecutive evaluation positions with the same scores. This should give you a much more accurate estimation of the window per position but it will take longer to execute and the clustering is not automatically done by SweeD.
On Sun, Jul 3, 2022 at 1:21 PM Anik Dutta @.***> wrote:
Thank you for the reply. But there is a problem. I am using the SweeD version 3.3.2, not the latest one where you get the start and stop position. The reason for not using the latest version is that I do not get any likelihood values in the output. With the same command using SweeD 3.3.2, I get results with different likelihood values. But with the latest SweeD version, all the likelihood values are Zero. Do not know what is causing this.
But coming back to the point, for example, I have an output like this from SweeD:
Position Likelihood Alpha 53744.0000 2.599466e-01 1.939265e+01 63731.3682 6.088593e-01 7.807052e-05 73718.7363 1.341805e+00 8.917006e-05 83706.1045 3.494308e-01 5.031090e-03 93693.4727 3.847188e+00 1.736058e-04 103680.8408 4.946577e-08 3.076922e-01 113668.2090 9.224968e-08 8.163262e-02 123655.5771 4.783268e-08 3.000000e+00
Here, you said that the report file includes the window size for each evaluation point which is I guess the "Position"? But how can I obtain the window size from such output? Should I add 5 kb up and down from each of the values under the "Position" column, which is above the significance threshold?
— Reply to this email directly, view it on GitHub https://github.com/alachins/raisd/issues/37#issuecomment-1173053969, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALKWCU2F26PI6OLIMISVKTVSFSSNANCNFSM52MJYIRA . You are receiving this because you commented.Message ID: @.***>
-- Nikolaos Alachiotis
Sorry, this is not a SweeD platform, but still, I want to further ask a question since I am new to this kind of analysis.
Ok, but then how can I conclude for example a region that is under selection? or what is the length of that selection sweep region from those points under the "Position" column? Because as you see those numbers are in Decimal and cannot be an SNP for sure. Let's say I have the following positions on a chromosome for one population which are above the cutoff threshold at 99.5%
Contig Position Likelihood Alpha
Chr3_1963 39556821.3389 906.004 9.723529e-06
Chr3_1963 39566819.457 1083.519 1.166873e-05
Chr3_1963 39576817.5752 1448.674 1.534434e-05
Chr3_1963 39586815.6934 1717.467 1.709699e-05
Chr3_1963 39596813.8115 1633.023 1.516612e-05
Chr3_1963 39606811.9297 1628.539 1.39571e-05
Chr3_1963 39616810.0479 1672.212 1.335288e-05
Chr3_1963 39626808.166 1687.752 1.244344e-05
Chr3_1963 39636806.2842 979.9007 1.267864e-05
Chr3_1963 39646804.4023 758.3056 1.301793e-05
Chr3_1963 39656802.5205 489.5013 1.211109e-05
And for another population, I have only one CLR value above the cutoff like:
Contig Position Likelihood Alpha
Chr3_1958 104724555.5322 328.3317 7.988635e-05
For the first case, should I round up the Position 39556821.3389
to a discrete number and then combine the subsequent positions until 39656802.5205
to define a region under selection?
If that is the case, then how can I determine the region of selective sweep for the second case where I have only one position?
It would really great if you can help me to define the selective sweep region here. Thanks.
Hi, I am currently doing a selection scan using SweeD and RAiSD. I would like to merge both outputs to find overlapping regions under selection. But I am having a confusion with the window-based output. For example, I ran SweeD using the following code:
SweeD -input Scaffold_1__2_contigs__length_78070536.vcf -grid 7807 -folded -noSeparator -name 1963_Chr_1 -sampleList 1963_pop.txt
The contig length is78070536
and I used the grid size of7807
to obtain a window of10 kb
.However, if I want to compare the result with RAiSD, then do I also need to set a window size of
10 kb
using-w 10
in RAiSD? Because the default is50 kb
if I am not mistaken.Also, is it recommended to use individual VCF for each chromosome as an input in RAiSD or all the chromosomes in one VCF for a single population would be suitable? Because I am working with a non-model organism whose reference genome is not at a chromosome scale yet. It has ~100 scaffolds.
Thank you.