Closed igigkuo closed 2 years ago
I had a look, and it seems that when using getSE=1
, sometimes codeml stops without crashing before all Ks values are estimated. It looks like this when you run codeml
directly:
$ codeml GF_000002.ctrl
CODONML in paml version 4.9h, March 2018
var(dS,dN) not calculable.
var(dS,dN) not calculable.
var(dS,dN) not calculable.
var(dS,dN) not calculable.
var(dS,dN) not calculable.
var(dS,dN) not calculable.
xmax = 0.0000e+00 close to zero at 1!
I modified the code to first check whether all pairwise estimates are in the codeml output file, raising a non-crashing error when this is not the case. I also implemented for your convenience a function to pass options for codeml from the command line in wgd ksd
, so no need to hack into the codeml.py
source code anymore. So now you can do:
wgd ksd <gene families> <CDS fasta> -tmp tmpdir --preserve --codeml_options getSE=1
Instead of breaking when codeml fails it should now look like this:
2020-05-13 09:46:39: INFO Started analysis in parallel (n_threads = 4)
2020-05-13 09:46:39: INFO Performing analysis on gene family GF_000001
2020-05-13 09:46:39: INFO Performing analysis on gene family GF_000002
2020-05-13 09:46:39: INFO Performing analysis on gene family GF_000003
2020-05-13 09:46:39: INFO Performing analysis on gene family GF_000004
2020-05-13 09:47:02: ERROR Not all gene pairs present in /home/arzwa/dev/wgd/_test/tmpdir3/GF_000001.codeml
2020-05-13 09:47:02: WARNING No codeml results for GF_000001!
2020-05-13 09:47:02: INFO Performing analysis on gene family GF_000005
I haven't implemented parsing out the standard errors from the codeml output (I'm working on a new version of wgd
and I'll try to remember implementing it there), so you'll still need to use --preserve
and obtain the SEs yourself after wgd ksd
.
Please install this updated version (latest commit on the master branch). This has not been thoroughly tested, so please let me know if you still encounter issues.
Hi,
Thanks for your quick response! It's great to have the --codeml_options to use, really appreciate for that! I have installed the new version and see the new --codeml_options, but there is another error occurred as below:
2020-05-14 02:36:36: INFO AAML in paml version 4.9, March 2015
2020-05-14 02:36:36: INFO codeml found
Traceback (most recent call last):
File "/home/kuokuo/anaconda3/envs/python36/bin/wgd", line 11, in <module>
load_entry_point('wgd==1.2', 'console_scripts', 'wgd')()
File "/home/kuokuo/anaconda3/envs/python36/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/kuokuo/anaconda3/envs/python36/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/kuokuo/anaconda3/envs/python36/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/kuokuo/anaconda3/envs/python36/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/kuokuo/anaconda3/envs/python36/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/kuokuo/anaconda3/envs/python36/lib/python3.6/site-packages/wgd-1.2-py3.6.egg/wgd_cli.py", line 639, in ksd
min_msa_length=100, ignore_prefixes=False, one_v_one=False,
File "/home/kuokuo/anaconda3/envs/python36/lib/python3.6/site-packages/wgd-1.2-py3.6.egg/wgd_cli.py", line 690, in ksd_
logging.error('No gene families or no sequences provided.')
File "/home/kuokuo/anaconda3/envs/python36/lib/python3.6/site-packages/wgd-1.2-py3.6.egg/wgd/utils.py", line 75, in can_i_run_software
FileNotFoundError: [Errno 2] No such file or directory: 'codeml.ctl'
The same error occurred regardless of using this option or not. Thank you again for your kind help!
Sincerely,
Hi, I tried again with a fresh installation, and I cannot reproduce this error on python3.7. What is your exact command?
I run tests on both python3.7 and python3.6 (in different conda env) with the commands:
wgd ksd -n 10 <gene families> <CDS fasta> --preserve
wgd ksd -n 10 <gene families> <CDS fasta>
wgd ksd <gene families> <CDS fasta>
All show the same report. When I re-install the old version, it works as previous did.
Hi,
I tried to uninstall and reinstall wgd v1.2, but still got the same error. The good news is, I find out that when the gene families contains only two sequences, the Ks calculation by codeml went well, and the whole process with wgd v1.1 completed prefectly! Although I still don't know why wgd v1.2 doesn't work in my system, but at least the original idea is feasible.
Sincerely,
Hi, I don't understand. From the error message, the program seems to crash when trying to run codeml
but that part of the code is exactly the same in the two versions of wgd
. Is the codeml executable the same in both conda enviroments? It seems from the message above you are using an outdated PAML version BTW, I'm running PAML 4.9h.
Hi,
I think this is caused by my system environment, but I have no idea right now.
My system is CentOS Linux release 7.8.2003. I used base
environment and python36
environment for wgd analysis, and used python setup.py install
to install wgd. The conda list
for the two env are attached. I saw in the two env the paml version are both 4.9.
I noticed in wgd v1.2, it showed
2020-05-14 02:36:36: INFO AAML in paml version 4.9, March 2015
before
2020-05-14 02:36:36: INFO codeml found
while in wgd v1.1 only the second line was shown.
Is this normal in the new version?
I'll try some other test for the new version and will update here if I find out something new.
Sincerely,
I tried to run MCL clustring. I installed Paml package. but it still says "ERROR codeml executable not found!". can someone hel me please?
Hello,
I really enjoy using this tool for my Ks analysis. For some reason I have to get the standard error of Ks, and it would be perfect for me to get SE during the Ks analysis with wgd. I have tried a stupid way to change the getSE = 1 in the codeml.py and have --preserve to get the codeml result, and then wgd ksd failed before generating Ks result for a major part of the gene families, although there are still a few gene families succeeded in producing Ks result. I attached the error report here. This error is triggered by getSE = 1 in codeml.py, and the process goes well when getSE = 0. Do you have any advise?
Thank you very much for your help!
nohup.txt