SunPengChuan / wgdi

WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes
https://wgdi.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License

Strange collinearity results compared to MCscanX. #37

Closed: Wenwen012345 closed this issue 1 year ago.

Wenwen012345 commented 1 year ago

Dear @SunPengChuan

I have recently been using WGDI, but something confuses me, mainly regarding collinearity detection (-icl).

The number of collinearity blocks found by MCscanX is 583, whereas WGDI finds 1642 with the parameters below. This puzzled me, so I checked the collinearity block file generated by WGDI and found some possible problems. Could you comment on them?

[collinearity]
gff1 = Rb.gff
gff2 = Rb.gff
lens1 = Rb1.len
lens2 = Rb1.len
blast = Rb.blast
blast_reverse = false
multiple  = 1
process = 30
evalue = 1e-10
score = 100
grading = 50,40,25
mg = 40,40
pvalue = 0.2
repeat_number = 20
position = order
savefile = Rb.wgdi.collinearity1

For example, some of the collinearity blocks seem to be false positives. Some blocks simply pair neighbouring genes with each other in reverse order:

# Alignment 1638: score=191 pvalue=0.0283 N=5 9&9 minus
Rb.9.5708 3051 Rb.9.5714 3056 1
Rb.9.5709 3052 Rb.9.5713 3055 1
Rb.9.5710 3053 Rb.9.5711 3054 1
Rb.9.5711 3054 Rb.9.5709 3052 1
Rb.9.5713 3055 Rb.9.5708 3051 1
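To make this first pattern concrete, the (pos1, pos2) columns of Alignment 1638 can be checked in a few lines of Python. This is a minimal sketch, not WGDI code: `span` and `self_overlap` are hypothetical helpers. It shows that the query and subject sides of the block cover the same stretch of chromosome 9, with every anchor pairing genes at most five positions apart.

```python
# Gene-order positions (pos1, pos2) of the five anchors in Alignment 1638 (9&9 minus).
anchors = [(3051, 3056), (3052, 3055), (3053, 3054), (3054, 3052), (3055, 3051)]

def span(positions):
    """Return the (start, end) interval covered by a list of gene-order positions."""
    return min(positions), max(positions)

def self_overlap(anchors):
    """Fraction of the query span that also lies inside the subject span (hypothetical helper)."""
    q_start, q_end = span([p1 for p1, _ in anchors])
    s_start, s_end = span([p2 for _, p2 in anchors])
    overlap = min(q_end, s_end) - max(q_start, s_start) + 1
    return max(overlap, 0) / (q_end - q_start + 1)

print(span([p1 for p1, _ in anchors]))          # (3051, 3055)
print(span([p2 for _, p2 in anchors]))          # (3051, 3056)
print(self_overlap(anchors))                    # 1.0
print(max(abs(p1 - p2) for p1, p2 in anchors))  # 5
```

An overlap of 1.0 on the same chromosome means the block is essentially a short gene cluster aligned against itself in reverse.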

Other blocks seem simply to "slide" by a gene, pairing each gene with a nearby one, as follows:

# Alignment 1589: score=438 pvalue=0.0841 N=11 9&9 plus
Rb.9.1324 888 Rb.9.1323 887 1
Rb.9.1326 889 Rb.9.1327 890 1
Rb.9.1328 891 Rb.9.1330 893 1
Rb.9.1331 894 Rb.9.1333 895 1
Rb.9.1346 904 Rb.9.1337 897 -1
Rb.9.1348 905 Rb.9.1349 906 1
Rb.9.1349 906 Rb.9.1352 909 1
Rb.9.1353 910 Rb.9.1362 913 1
Rb.9.1381 927 Rb.9.1382 928 1
Rb.9.1389 932 Rb.9.1390 933 1
Rb.9.1390 933 Rb.9.1392 934 1
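The "sliding" can be quantified by the offset between paired positions. This is a minimal sketch using only the (pos1, pos2) columns of Alignment 1589 copied from above; it is an illustration, not part of WGDI.

```python
from statistics import median

# Gene-order positions (pos1, pos2) of the 11 anchors in Alignment 1589 (9&9 plus).
anchors = [
    (888, 887), (889, 890), (891, 893), (894, 895), (904, 897), (905, 906),
    (906, 909), (910, 913), (927, 928), (932, 933), (933, 934),
]

# Offset between paired positions; small offsets mean a gene is paired
# with one of its immediate neighbours on the same chromosome.
offsets = [p2 - p1 for p1, p2 in anchors]
print(offsets)                             # [-1, 1, 2, 1, -7, 1, 3, 3, 1, 1, 1]
print(median([abs(o) for o in offsets]))   # 1
```

A median offset of one gene is what a cluster of tandem copies looks like when a genome is compared with itself: each copy is matched to its neighbour.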

These problems do not seem to occur in the MCscanX results, and they may explain why WGDI -icl reports more blocks than MCscanX.
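To compare block counts, or to pull out the intra-chromosomal blocks where these patterns occur, the WGDI output can be split into blocks. The sketch below assumes the layout quoted in this thread (a `# Alignment` header followed by `gene1 pos1 gene2 pos2 strand` rows); `parse_collinearity` and `Block` are hypothetical names, and MCscanX output uses a different layout that this pattern does not handle.

```python
import re
from dataclasses import dataclass, field

# Header layout as it appears in the WGDI output quoted in this thread (assumption).
HEADER = re.compile(
    r"^# Alignment (\d+): score=(\S+) pvalue=(\S+) N=(\d+) ([^&\s]+)&([^&\s]+) (plus|minus)"
)

@dataclass
class Block:
    block_id: int
    score: float
    pvalue: float
    n: int
    chr1: str
    chr2: str
    orientation: str
    # Each anchor: (gene1, pos1, gene2, pos2, strand)
    anchors: list = field(default_factory=list)

def parse_collinearity(text):
    """Split a WGDI-style collinearity text into Block objects (sketch)."""
    blocks = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            bid, score, pvalue, n, chr1, chr2, orientation = header.groups()
            blocks.append(Block(int(bid), float(score), float(pvalue), int(n),
                                chr1, chr2, orientation))
            continue
        fields = line.split()
        # Anchor rows follow their header; blank lines and other comments are skipped.
        if blocks and len(fields) >= 5 and not line.startswith("#"):
            gene1, pos1, gene2, pos2, strand = fields[:5]
            blocks[-1].anchors.append((gene1, int(pos1), gene2, int(pos2), int(strand)))
    return blocks

sample = """\
# Alignment 1638: score=191 pvalue=0.0283 N=5 9&9 minus
Rb.9.5708 3051 Rb.9.5714 3056 1
Rb.9.5709 3052 Rb.9.5713 3055 1
Rb.9.5710 3053 Rb.9.5711 3054 1
Rb.9.5711 3054 Rb.9.5709 3052 1
Rb.9.5713 3055 Rb.9.5708 3051 1
"""

blocks = parse_collinearity(sample)
print(len(blocks))                          # 1
print(blocks[0].n, len(blocks[0].anchors))  # 5 5
print(sum(b.chr1 == b.chr2 for b in blocks))  # 1
```

On a full WGDI file, `len(blocks)` gives the block count, and the `chr1 == chr2` filter isolates the same-chromosome blocks where tandem artefacts can arise.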

Best regards, Wen

Attached: Rb.collinearity.txt (MCscanX results), Rb.wgdi.collinearity.txt, Rb2.conf.txt

SunPengChuan commented 1 year ago

I'm glad you took the time to test WGDI's collinearity detection. My approach is to first extract all synteny blocks that appear collinear in the homologous dotplot generated with the '-d' parameter, and then to filter out unwanted synteny blocks by running WGDI again with the '-c' parameter. The two scenarios you mentioned arise from tandem duplications within the same genome; they would not occur in interspecific comparisons between different genomes. Running WGDI with '-c' and 'tandem = false' removes these tandem-derived synteny blocks, and the '-bk' parameter lets you inspect the blocks that remain after filtering.
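The exact rule WGDI applies for 'tandem = false' is not described in this thread, so the following is only a hypothetical heuristic illustrating the idea: flag a block as tandem-derived when both sides lie on the same chromosome and the median distance between paired genes is at most `max_distance` genes. `is_tandem_block`, `drop_tandem_blocks` and `max_distance` are illustrative names, not WGDI functions or parameters.

```python
from statistics import median

def is_tandem_block(chr1, chr2, anchors, max_distance=10):
    """Hypothetical heuristic, not WGDI's rule.

    anchors: list of (pos1, pos2) gene-order positions.
    max_distance: illustrative threshold in genes.
    """
    if chr1 != chr2 or not anchors:
        return False
    return median([abs(p1 - p2) for p1, p2 in anchors]) <= max_distance

def drop_tandem_blocks(blocks, max_distance=10):
    """Keep only blocks that the heuristic does not flag as tandem-derived."""
    return [
        b for b in blocks
        if not is_tandem_block(b["chr1"], b["chr2"], b["anchors"], max_distance)
    ]

# The two blocks quoted in this thread, reduced to their (pos1, pos2) columns.
blocks = [
    {"id": 1638, "chr1": "9", "chr2": "9",
     "anchors": [(3051, 3056), (3052, 3055), (3053, 3054), (3054, 3052), (3055, 3051)]},
    {"id": 1589, "chr1": "9", "chr2": "9",
     "anchors": [(888, 887), (889, 890), (891, 893), (894, 895), (904, 897), (905, 906),
                 (906, 909), (910, 913), (927, 928), (932, 933), (933, 934)]},
]

kept = drop_tandem_blocks(blocks)
print([b["id"] for b in kept])  # []
```

Both quoted blocks are flagged (median distances of 3 and 1 genes). A larger `max_distance` would remove more same-chromosome blocks but could also discard genuine nearby segmental duplications, so WGDI's own '-c' filtering remains the tool to use in practice.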

Wenwen012345 commented 1 year ago

OK, I will consider that. Thanks!