SunPengChuan / wgdi

WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes
https://wgdi.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License

Strange Ks distribution analysis results #36

Closed Wenwen012345 closed 1 year ago

Wenwen012345 commented 1 year ago

Dear @SunPengChuan

I am studying a rhododendron species (Rhododendron bailiense) and use your software in my research. While analysing the Ks distribution of my species (figure below), I ran into a question and would appreciate your explanation and guidance.

image

Do you have any explanation or advice regarding the large number of small Ks values? Other species of the same genus (e.g., R. simsii, figure below) do not seem to show a similar distribution. This confuses me, as I would expect species from the same genus to share a broadly similar Ks pattern. Could it be caused by a large number of recent tandem duplications in my species?

image

The Ks model I use is YN80.

I use the following process reference: https://blog.csdn.net/u012110870/article/details/115511709?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522168735417916800222864610%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&request_id=168735417916800222864610&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~rank_v31_ecpm-6-115511709-null-null.268^v1^koosearch&utm_term=%E5%A6%82%E4%BD%95%E7%94%A8WGDI%E8%BF%9B%E8%A1%8C%E5%85%B1%E7%BA%BF%E6%80%A7%E5%88%86%E6%9E%90&spm=1018.2226.3001.4450

confs: Rb1.conf.txt peak1.conf.txt peak2.conf.txt

ks.csv: all_ks.csv

block information: Rb.block.information4YN80.csv

.ks: Rb.ks3.txt

Thank you very much for your valuable time and expertise, and I look forward to your reply and guidance. Best regards!

SunPengChuan commented 1 year ago

Apologies for the delayed response. After reviewing the data, I can confirm that your hypothesis was correct: the first (low-Ks) peak is attributable to tandem repeats. As for the missing peak at 0.5, I am not sure how the blockinfo.csv file that you sent me was filtered.

I have a minor suggestion:

[collinearity]
mg = 40,40
repeat_number = 20

Wenwen012345 commented 1 year ago

Thank you so much for your reply, @SunPengChuan. My collinearity filtering parameters for running WGDI can be found in my next reply. Your settings may well be better than mine, and I will try them later.

As for the peak at 0.5 that you mentioned, I am not sure whether it shows up in the previous step (wgdi -kp Rb1.conf). Here is the figure (using the YN80 model).

image

Note that I only fitted two peaks in Step 3 (following the order of the reference), and I am not sure whether this is why the peak at 0.5 is missing. Perhaps I should have fitted three peaks.

I should also point out that I made a mistake in my earlier peak1.conf and peak2.conf settings: the "area" ranges were not contiguous ("0,0.5" and "1.25,2.5"). Changing them to contiguous ranges ("0,0.5" and "0.5,3.5") gives the following figure.

image
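To make the gap between the two original ranges concrete, here is a minimal sketch that lists which values are covered by neither window. It does not reproduce how WGDI applies "area"; the helper name, the inclusive bounds, and the Ks values are hypothetical and chosen only for illustration.

```python
def uncovered(values, windows):
    """Return the values that fall inside none of the (low, high) windows."""
    return [v for v in values
            if not any(low <= v <= high for low, high in windows)]

# Hypothetical Ks values, not taken from the attached files.
ks_values = [0.1, 0.4, 0.6, 0.9, 1.2, 1.3, 2.0, 3.0]

# Non-contiguous windows leave everything between 0.5 and 1.25 unfitted.
print(uncovered(ks_values, [(0.0, 0.5), (1.25, 2.5)]))

# Contiguous windows cover the whole range up to 3.5.
print(uncovered(ks_values, [(0.0, 0.5), (0.5, 3.5)]))
```

With the first pair of windows, any Ks values between the ranges (such as a possible peak near 0.5 to 1.25) never take part in a fit.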

In my next reply, I will include a reference (in Chinese) to the complete process I used with WGDI, in three major steps.

@SunPengChuan

Wenwen012345 commented 1 year ago

Appendix: The procedure I referenced for using WGDI.

How to perform collinearity analysis with WGDI (I) (如何用WGDI进行共线性分析(一)) https://blog.csdn.net/u012110870/article/details/115511706?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522168724294516800188547211%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&request_id=168724294516800188547211&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~rank_v31_ecpm-3-115511706-null-null.268^v1^koosearch&utm_term=WGDI&spm=1018.2226.3001.4450

How to perform collinearity analysis with WGDI (II) (如何用WGDI进行共线性分析(二)) https://blog.csdn.net/u012110870/article/details/115511708

How to perform collinearity analysis with WGDI (III) (如何用WGDI进行共线性分析(三)) https://blog.csdn.net/u012110870/article/details/115511709?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522168735417916800222864610%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&request_id=168735417916800222864610&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~rank_v31_ecpm-6-115511709-null-null.268^v1^koosearch&utm_term=%E5%A6%82%E4%BD%95%E7%94%A8WGDI%E8%BF%9B%E8%A1%8C%E5%85%B1%E7%BA%BF%E6%80%A7%E5%88%86%E6%9E%90&spm=1018.2226.3001.4450

SunPengChuan commented 1 year ago

You need to analyze the results with '-bk'. I still don't understand why you used "0,0.5" and "1.25,2.5".

When clear peaks are observed, peak values can be extracted by dividing the data into segments. However, this should be done after filtering the blocks with the homo and multiple parameters.

Here are two different parameter sets to help extract different peaks; they need to be adjusted based on the results from '-d'.

[kspeaks]
pvalue = 0.2
tandem = False
ks_area = 0,10
multiple = 1
homo = 0.3,1

[kspeaks]
pvalue = 0.2
tandem = False
ks_area = 0,10
multiple = 1
homo = -1,0.3
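The two [kspeaks] settings above differ only in the homo range. As a rough illustration of what such a range selects, the sketch below filters a block table by p-value and by one homo column, then collects the median Ks of the surviving blocks. It is not WGDI's implementation: the column names (pvalue, homo1, ks_median), the inclusive range test, the direction of the p-value cutoff, and the sample rows are all assumptions made for this example.

```python
import csv
import io

# Hypothetical block table; column names and values are assumptions
# made for this example, not output from WGDI.
SAMPLE = """id,pvalue,homo1,ks_median
1,0.01,0.85,0.21
2,0.05,0.40,0.25
3,0.10,-0.60,1.48
4,0.15,0.10,1.52
5,0.50,0.90,0.20
"""

def select_blocks(rows, pvalue_max, homo_col, homo_range):
    """Keep blocks under the p-value cutoff whose homo value lies in homo_range."""
    low, high = homo_range
    kept = []
    for row in rows:
        if float(row["pvalue"]) >= pvalue_max:
            continue  # skip blocks that fail the (assumed) p-value cutoff
        if low <= float(row[homo_col]) <= high:
            kept.append(float(row["ks_median"]))
    return kept

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
high_homo = select_blocks(rows, 0.2, "homo1", (0.3, 1.0))   # like homo = 0.3,1
low_homo = select_blocks(rows, 0.2, "homo1", (-1.0, 0.3))   # like homo = -1,0.3
print(high_homo)
print(low_homo)
```

In this toy table the two homo ranges pick out two separate groups of blocks with clearly different median Ks, which is the kind of separation one would check for before fitting each peak.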

Wenwen012345 commented 1 year ago

Thank you so much for your prompt and thoughtful response. I will try it. By the way, I set "area" to "0,0.5" and "1.25,2.5" because I am new to the software (just one day in) and not yet familiar with it. Based on the reference ("How to perform collinearity analysis with WGDI (III)"), I thought there were two peaks, at 0-0.5 and 1.25-2.5, so I set the two parameters as in the reference's example (figure below). Anyway, I will try your suggestion carefully. I really appreciate the time and effort you put into providing these valuable insights and suggestions.

Screenshot 2023-06-24 015755

SunPengChuan commented 1 year ago

There is an error with the modification of your parameters. You should modify "area" instead of "ks_area".

Wenwen012345 commented 1 year ago

> There is an error with the modification of your parameters. You should modify "area" instead of "ks_area".

Okay. Thank you very much for your guidance, and apologies for replying a few days late. My previous process was based entirely on XuZhouGeng's "How to perform collinearity analysis with WGDI (III)", in which he directly modified the "ks_area" parameter to get the fit parameters for the two peaks. I will use your recommended parameters next, thanks!

I have just watched your video on bilibili and now have a rough idea of the overall process. By the way, I would suggest uploading a version with louder audio; I almost had to put my ear to the stereo to hear it clearly (my stereo is connected to the graphics card), sorry!

Wenwen012345 commented 1 year ago

Hello @SunPengChuan

This is an old issue, but I never fully resolved it because other things got in the way.

For my species, I improved the annotation file and reran the analysis, which gave the following kspeaks (-kp) figure (a within-species comparison of paralogous duplicates). My question now is: if I only want the Ks values of the two peaks (to estimate when the WGDs occurred), can I simply fit the two peaks by adjusting the "ks_area" parameter and read off the Ks values at the peaks, as in the method described above? I did watch your video, and I understand that adjusting homo and multiple to filter blocks is one way (I tried it once, but it may not work well for my species). I would just like a simpler way to obtain the Ks values of the two peaks. Wouldn't it be better to fit the two peaks directly with "ks_area"?

image

SunPengChuan commented 1 year ago

You can do it that way, and the result may not deviate too much. However, this approach is not suitable for some special cases. For example, in Spirodela two WGDs occurred consecutively and were very close in timing, so fitting directly would give only one Ks peak.
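The effect of two closely spaced events can be illustrated with a toy model. The sketch below treats the Ks distribution as an equal-weight mixture of two Gaussians with the same width and counts the local maxima of the combined curve on a grid. The means and width are made-up numbers, not Spirodela data, and real Ks distributions are not exactly Gaussian.

```python
import math

def gaussian(x, mu, sigma):
    """Normal probability density at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def count_modes(mu1, mu2, sigma, lo=0.0, hi=3.0, steps=3000):
    """Count local maxima of an equal-weight two-Gaussian mixture on a grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [0.5 * gaussian(x, mu1, sigma) + 0.5 * gaussian(x, mu2, sigma)
          for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])

# Hypothetical, well-separated events: two visible peaks.
print(count_modes(0.5, 1.5, 0.15))

# Hypothetical, closely spaced events: the curve has a single peak.
print(count_modes(0.9, 1.0, 0.15))
```

In this simplified model the two components merge into one peak once their means are closer than about twice the shared standard deviation, so a direct fit over the whole range sees only a single peak.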

Wenwen012345 commented 1 year ago

Okay, thanks for the explanation! Sorry for the late thanks; it was delayed by network lag.