arzwa / wgd

Python package and CLI for whole-genome duplication related analyses. This package is deprecated in favor of https://github.com/heche-psb/wgd.
http://wgd.readthedocs.io/en/latest/
GNU General Public License v3.0

Dev #61

Open lizhencmb opened 3 years ago

lizhencmb commented 3 years ago

Hi Arthur,

I am going through the wgd code as a way to learn a bit more Python. It is fun actually :-) I have made some changes so that V2 produces output files similar to those of V1. I've seen that you've put the weighting in the visualization part, but I think it would still make sense to include the weights in the ksd Ks table, so that others can draw the distributions on their own (if they want to).
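To illustrate what per-pair weights in a Ks table could look like, here is a minimal sketch. It assumes one simple scheme in which every gene family contributes a total weight of 1, split evenly over its pairs; this is not necessarily the weighting wgd uses, and the function name `add_family_weights` and the row keys (`family`, `pair`, `ks`) are hypothetical.

```python
from collections import Counter


def add_family_weights(rows):
    """Return copies of the rows with a 'weight' key added.

    Assumed scheme (illustration only): each pair in a family with n pairs
    gets weight 1/n, so the weights within a family sum to 1.
    """
    pairs_per_family = Counter(row["family"] for row in rows)
    return [dict(row, weight=1.0 / pairs_per_family[row["family"]]) for row in rows]


# Hypothetical Ks table rows: three pairs in family GF1, one pair in GF2.
rows = [
    {"family": "GF1", "pair": "a__b", "ks": 0.8},
    {"family": "GF1", "pair": "a__c", "ks": 0.9},
    {"family": "GF1", "pair": "b__c", "ks": 0.3},
    {"family": "GF2", "pair": "d__e", "ks": 1.2},
]
weighted = add_family_weights(rows)
```

With the weights stored next to each Ks value, a weighted histogram can be drawn by anyone from the table alone, without rerunning the visualization step.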

You can see I've also added a function to strip the alignment, with a parameter to leave some gaps. I was thinking that codeml could deal with some gaps in its pairwise mode with cleandata=0. However, after some tests, that does not really seem to be the case, so the function is currently only used to remove all the gaps.
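A rough sketch of such a stripping function is given here for illustration; it is not the code in this PR. The name `strip_gap_columns` and its parameters `max_gap_fraction` and `unit` are assumptions. It works on a codon alignment and drops every codon column in which more than the allowed fraction of sequences contains a gap character.

```python
def strip_gap_columns(alignment, max_gap_fraction=0.0, unit=3):
    """Drop columns (in blocks of `unit` sites) that contain too many gaps.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    max_gap_fraction: a block is kept if the fraction of sequences with a gap
        in that block is <= this value; 0.0 removes every gapped block.
    unit: block size, 3 for codon alignments.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")
    if length % unit != 0:
        raise ValueError("alignment length must be a multiple of `unit`")

    kept_starts = []
    for start in range(0, length, unit):
        blocks = [seq[start:start + unit] for seq in seqs]
        n_gapped = sum("-" in block for block in blocks)
        if n_gapped / len(seqs) <= max_gap_fraction:
            kept_starts.append(start)

    return {
        name: "".join(seq[start:start + unit] for start in kept_starts)
        for name, seq in zip(names, seqs)
    }


# Toy codon alignment: codons 2 and 3 each have a gap in one of three sequences.
aln = {
    "g1": "ATGAAA---TTT",
    "g2": "ATG---CCCTTT",
    "g3": "ATGAAACCCTTT",
}
strict = strip_gap_columns(aln)                         # remove all gapped codons
relaxed = strip_gap_columns(aln, max_gap_fraction=0.5)  # tolerate some gaps
```

With `max_gap_fraction=0.0` only the two fully ungapped codons survive, which matches the "remove all the gaps" behaviour described above; a larger value leaves some gapped columns in place.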

Best, Zhen

arzwa commented 3 years ago

Thanks Zhen, nice to see someone helping out! Concerning the cleandata thing in PAML, see this reported bug. I had changed some output file formats indeed, but I agree it may be better to keep them compatible with earlier versions.

It seems the tests are failing because

  1. the alignment length has changed due to the gap trimming you introduced, so I think we should just update the tests to cover both cases, with and without trimming.
  2. something related to your last commit concerning the diamond output, which I don't see immediately.

So if we update the tests, I can merge this in.

lizhencmb commented 3 years ago

Hi Arthur, the failed tests were due to replacing gene ids in the multi-species diamond search and to alignment trimming. I see that we trim sequence alignments twice (a bit redundant) and the tests only consider the first one. I did not change the tests for now, but will modify them a bit later, e.g. maybe for gene tree inference we can tolerate some gaps.

For the commit about diamond, I just added a diamond output file in the output folder (in wgd_dmd by default).