esud1 closed this issue 4 years ago
Not sure, but I remember encountering a similar stalling issue once, caused by the codeml program (which is used under the hood to estimate Ks by maximum likelihood) hanging on some gene family without giving an error. I would suggest two things to figure this out: (1) run the program with the -v debug flag (this will give you a lot more information), and (2) try to determine whether a specific gene family is causing the freeze. If you could obtain a small data set that reproduces the issue, that would be helpful for me (I strongly doubt it has anything to do with the size of your data set).
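One way to narrow down suggestion (2) is to run each suspect family on its own under a time limit and record the ones that never finish. The sketch below is only an illustration: `find_stalling_inputs` and `make_command` are hypothetical names, and the actual command to run per family (for example a wgd ksd call on a single-family file) has to be supplied by you.

```python
import subprocess

def find_stalling_inputs(inputs, make_command, timeout_s):
    """Run one command per input; return the inputs whose command timed out.

    `make_command` is a caller-supplied function that turns one input
    (e.g. the path of a single-family file) into an argument list.
    """
    stalled = []
    for item in inputs:
        cmd = make_command(item)
        try:
            # subprocess.run kills the child process once the timeout expires
            # and then raises TimeoutExpired.
            subprocess.run(
                cmd,
                timeout=timeout_s,
                check=False,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            stalled.append(item)
    return stalled
```

The timeout should be generous compared with the normal run time of a single family, otherwise slow but healthy families will be flagged too.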
After a few more runs, I noticed that the program always freezes while analyzing the same gene families (the last processed file). But when I extract those gene families and run them on their own, wgd ksd seems to work fine.
Should I remove these gene families and try to re-run the program?
Thanks
That sounds strange. Could you provide me with a test data set (CDS sequences and families) so that I can try to reproduce the problem? Preferably a relatively small subset for which you observe this issue.
Hi, I tried to perform it on a smaller subset (~5k sequences) and encountered no problem.
However, when I scaled it up to ~10k sequences, the program stalled.
I had a look at the latest temp files, and it seems that there were no problems with the codeml program; the .Ks files were generated.
I think the problem arises when the program tries to merge all of these files to create the plot and .tsv files.
I also tried running wgd on the Arabidopsis whole-CDS data (downloaded from PLAZA) and encountered the same problem: the program finishes the codeml part but does not move forward from there.
By the way, I am using phyml (v 3.3.20190321) instead of FastTree. Will that affect the run?
Hi, I'll try to figure this out. I usually don't use phyml for the trees, so it could have something to do with that. Maybe some of the largest families take up an inordinate amount of time? (You could check the active processes using top or htop on Linux; perhaps you will see phyml still running. Or just try running phyml on the largest gene family.) (BTW: if you do not plan to use the trees afterwards, I'd recommend using FastTree or the clustering approach, as an occasional tree error will barely affect the distribution.)
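To find the largest gene family for such a test, a small script can rank the families by size. This is a minimal sketch, assuming the families file has the usual MCL layout of one family per line with whitespace-separated gene IDs; the function name `largest_families` is made up for this example.

```python
def largest_families(mcl_path, top_n=5):
    """Return (line_number, family_size) pairs for the top_n largest families.

    Assumes one family per line, gene IDs separated by tabs or spaces.
    """
    sizes = []
    with open(mcl_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            genes = line.split()
            if genes:  # skip blank lines
                sizes.append((line_number, len(genes)))
    # Largest families first.
    sizes.sort(key=lambda pair: pair[1], reverse=True)
    return sizes[:top_n]
```

The returned line numbers point back into the families file, so the corresponding lines can be copied out and run on their own.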
Hi Arthur,
I tried running wgd with FastTree and the problem is fixed!
And yes, the problem lies with phyml: I could see that phyml was still running in the background when I checked using top.
Many thanks for your help! :)
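For anyone who wants to script the same check instead of watching top, here is a rough sketch that lists processes by name together with their elapsed time. It assumes a system where `ps -eo pid,etime,comm` is available (typical on Linux); `parse_ps` and `running_processes` are hypothetical helper names.

```python
import subprocess

def parse_ps(ps_output, name):
    """Extract (pid, elapsed, command) for commands containing `name`."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # first line is the header
        fields = line.split(None, 2)
        if len(fields) == 3 and name.lower() in fields[2].lower():
            matches.append((int(fields[0]), fields[1], fields[2].strip()))
    return matches

def running_processes(name):
    """Ask ps for all processes and keep those whose command matches `name`."""
    result = subprocess.run(
        ["ps", "-eo", "pid,etime,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ps(result.stdout, name)
```

Calling `running_processes("phyml")` while wgd appears frozen would show whether a phyml process is still alive and how long it has been running.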
Hi,
I am having a bit of trouble with the wgd ksd step.
The program runs smoothly and produces a number of files in the tmp folder (incl. FASTA, MSA, and Ks files), but at some point it just freezes and nothing happens after that. No information is given on the command line. I have let the current run go for 4 days, but there is still no change. Here is the screenshot of the last INFO given by the program
I tried running a smaller subset of the data (1000 random ones, similar to the supplemental info in the paper), and the program had no problem producing the output.
Do you know what went wrong? Could it be due to the size of the data or other problems? FYI, the CDS FASTA file is ~30 MB, and the mcl file is ~369 kB.
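A random subset of families like the one mentioned above can be drawn with a few lines of Python, which also makes it easy to grow the subset step by step until the stall reappears. This is a sketch under the assumption that the mcl file holds one family per line; `sample_families` is a hypothetical helper, not part of wgd.

```python
import random

def sample_families(mcl_path, out_path, n, seed=0):
    """Write n randomly chosen families (lines) to out_path.

    Returns the number of families written, which is smaller than n
    if the input file has fewer families.
    """
    with open(mcl_path) as handle:
        families = [line.rstrip("\n") for line in handle if line.strip()]
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = rng.sample(families, min(n, len(families)))
    with open(out_path, "w") as out:
        for family in chosen:
            out.write(family + "\n")
    return len(chosen)
```

Using a fixed seed means the same subset can be regenerated later, which helps when sharing a test case that reproduces the problem.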