arzwa / wgd

Python package and CLI for whole-genome duplication related analyses. This package is deprecated in favor of https://github.com/heche-psb/wgd.
http://wgd.readthedocs.io/en/latest/
GNU General Public License v3.0

Ks distribution construction error #5

Closed: yuntwang closed this issue 5 years ago

yuntwang commented 5 years ago

Hi, I used the wgd mcl subcommand to get the GENE_FAMILIES file and then ran the ksd analysis like this:

./software/wgd_venv/bin/wgd ksd --n_threads 2 --wm phyml -p ALATA.PEP.fa ALATA.PEP.fa.blast.tsv.mcl

and got this error:

2018-11-29 15:59:57: INFO
2018-11-29 15:59:57: INFO codeml found
2018-11-29 15:59:57: INFO MUSCLE v3.8.31 by Robert C. Edgar
2018-11-29 15:59:57: INFO . command-line: phyml --version . This is PhyML version 20160207.
2018-11-29 15:59:57: ERROR No gene families or no sequences provided.

Why?

arzwa commented 5 years ago

Hi,

You need CDS (coding DNA sequences) to estimate Ks values, since you cannot translate protein sequences back to their codons unambiguously. The -p option is there to provide a custom translation (e.g. when your organism does not use the standard genetic code). The wgd ksd command always needs to be run as wgd ksd [-options] GENE_FAMILIES CDS_SEQUENCES, so please use the wgd commands with CDS sequences. Either redo the analysis with CDS fasta files or, if you have a CDS fasta file with the same gene names as in ALATA.PEP.fa, say ALATA.cds.fa, you can do:

./software/wgd_venv/bin/wgd ksd --n_threads 2 --wm phyml ALATA.PEP.fa.blast.tsv.mcl ALATA.cds.fa
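
Before rerunning, it may help to confirm that the sequence file really contains nucleotide sequences and that its gene names match those in the gene families file. The following is a minimal sketch of such a check, not part of wgd. It assumes the gene families file has one family per line with tab-separated gene IDs, and that the FASTA ID is the first word of each header. The function names (read_fasta, looks_like_cds, read_families, check_inputs) and the 0.9 threshold are made up for illustration.

```python
from pathlib import Path

# Letters accepted as nucleotides by this heuristic (assumption: no other IUPAC codes).
NUCLEOTIDES = set("ACGTUN")


def read_fasta(path):
    """Return {sequence_id: sequence}; the ID is the first word of the header."""
    records, current = {}, None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def looks_like_cds(seq, threshold=0.9):
    """Heuristic: at least `threshold` of the characters are nucleotide letters."""
    seq = seq.upper().replace("-", "").replace("*", "")
    if not seq:
        return False
    return sum(c in NUCLEOTIDES for c in seq) / len(seq) >= threshold


def read_families(path):
    """Assumed format: one gene family per line, gene IDs separated by tabs."""
    lines = Path(path).read_text().splitlines()
    return [line.strip().split("\t") for line in lines if line.strip()]


def check_inputs(families_path, fasta_path):
    """Summarise how well a families file and a sequence file fit together."""
    seqs = read_fasta(fasta_path)
    family_ids = {gene for family in read_families(families_path) for gene in family}
    return {
        "n_sequences": len(seqs),
        "n_not_nucleotide": sum(not looks_like_cds(s) for s in seqs.values()),
        "n_family_ids": len(family_ids),
        "n_missing_from_fasta": len(family_ids - set(seqs)),
    }
```

Calling check_inputs("ALATA.PEP.fa.blast.tsv.mcl", "ALATA.cds.fa") returns a small dictionary of counts. A nonzero n_not_nucleotide suggests a protein file was passed by mistake, and a nonzero n_missing_from_fasta suggests the gene names in the two files differ.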

Hope this helps,

Best,
Arthur

yuntwang commented 5 years ago

Thank you very much.