arzwa / wgd

Python package and CLI for whole-genome duplication related analyses. This package is deprecated in favor of https://github.com/heche-psb/wgd.
http://wgd.readthedocs.io/en/latest/
GNU General Public License v3.0

raise the `max_pairwise` parameter #68

Closed ardy20 closed 2 years ago

ardy20 commented 2 years ago

Dear Developer

Could you please explain how to raise the max_pairwise parameter?

I got the message below and would like to know whether raising the parameter is necessary:

There were 1949 warnings during translation
2021-09-30 11:26:35: INFO Started whole paranome Ks analysis
2021-09-30 11:26:35: WARNING Filtered out the 15 largest gene families because n*(n-1)/2 > max_pairwise
2021-09-30 11:26:35: WARNING If you want to analyse these large families anyhow, please raise the max_pairwise parameter.

Also, is there a command that can be used to plot the whole-genome duplication events?

Regards

arzwa commented 2 years ago

Please check the help message (`wgd ksd -h`), where you will find:

[...]
  -mp, --max_pairwise INTEGER     maximum number of pairwise combinations a
                                  family may have  [default: 10000]

So you can raise it with `-mp 20000`, for instance. However, I would not recommend that: the largest families (>100 genes) take a long time to analyse and are usually not informative for WGDs (they are usually TEs or similar, but you can check that).
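To get a feel for what a given `max_pairwise` value means in terms of family size, the filter condition quoted in the warning (`n*(n-1)/2 > max_pairwise`) can be inverted. The following is a minimal sketch, not code from `wgd` itself; the function names are made up for illustration, and it assumes the filter is exactly the inequality shown in the log message.

```python
import math


def pairwise_combinations(n: int) -> int:
    """Number of gene pairs in a family of n genes: n*(n-1)/2."""
    return n * (n - 1) // 2


def largest_family_kept(max_pairwise: int) -> int:
    """Largest family size n for which n*(n-1)/2 <= max_pairwise.

    Solves n^2 - n - 2*max_pairwise <= 0 for n using integer arithmetic.
    Hypothetical helper, assuming the filter matches the logged condition.
    """
    return (1 + math.isqrt(1 + 8 * max_pairwise)) // 2


if __name__ == "__main__":
    for mp in (10000, 20000):
        n = largest_family_kept(mp)
        print(f"max_pairwise={mp}: families of up to {n} genes are kept "
              f"({pairwise_combinations(n)} pairs)")
```

Under that assumption, the default of 10000 keeps families of up to 141 genes, and `-mp 20000` raises the limit to 200 genes.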

To plot, you get the default plots from a `wgd ksd` run, and you can also use `wgd viz`. Alternatively, you can work with the `ksd` output file in your preferred computing environment, such as R (see e.g. #24) or Python.
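As a rough illustration of the Python route, the sketch below bins Ks values from a tab-separated file using only the standard library. It assumes the output is a TSV with a column named `Ks`; the column name, the function name `ks_histogram`, and the cut-off and bin width are assumptions for illustration, so check the header of your own output file first.

```python
import csv


def ks_histogram(handle, column="Ks", max_ks=5.0, bin_width=0.1):
    """Count Ks values per bin from a tab-separated file handle.

    The column name "Ks" is an assumption about the output layout.
    Rows with missing, non-numeric, NaN, negative, or >= max_ks values
    are skipped. Returns a list of counts, one per bin.
    """
    n_bins = round(max_ks / bin_width)
    counts = [0] * n_bins
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            ks = float(row[column])
        except (KeyError, ValueError, TypeError):
            continue
        if 0 <= ks < max_ks:  # NaN fails this comparison and is skipped
            idx = min(int(ks / bin_width), n_bins - 1)
            counts[idx] += 1
    return counts
```

The resulting counts can be handed to any plotting library, for example as bar heights in matplotlib, with the path to your own output file opened via `open(path, newline="")`.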