Closed: trongnhanuit closed this issue 2 years ago
Hello Nhan,
Thank you for your detailed description of the issue and for providing an input file. I wish all issues had that much information.
I believe we have found and fixed the bug responsible for this behavior. Please pull the latest changes and let me know if you still encounter any issues.
Best, Juan
Dear @jgarciamesa, @reedacartwright,
Thank you very much for promptly resolving this issue.
Best wishes,
Nhan
I'm using Dawg (version 2.0.1) to simulate large MSAs with indels, but the output contains only gaps (no nucleotides at all), and the output sequences are much shorter than the MSA simulated by INDELible with the same settings. The details are as follows.
Input: (input file: input_shortbranches_100000_30000.dawg)
Execution command: dawg input_shortbranches_100000_30000.dawg -o dawgOutput.phy
Output:
To make sure the issue was not due to a high deletion rate, I changed the indel rates to 0.15 and 0.05 for insertions and deletions, respectively. Dawg still returned sequences with 37235 sites containing only gaps.
The issue also occurred when I replicated this simulation on a larger tree (with 1,000,000 tips). The output sequences from Dawg contain 35207 sites with only gaps. Meanwhile, for the same simulation, INDELible outputs sequences with more than 150,000 sites containing both gaps and nucleotides.
However, when I tested smaller trees (e.g. with 10,000 or 1,000 tips), Dawg output sequences containing both gaps and nucleotides, with a sequence length close to that of the MSAs simulated by INDELible.
Therefore, I think there is a bug in Dawg when simulating very large MSAs with indels. Could you please take a look? Many thanks,
Cheers, Nhan
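For anyone who wants to check an alignment for the symptom described above (sites present, but no nucleotides), here is a minimal Python sketch. It is not part of Dawg or INDELible. The function names `summarize_alignment` and `read_sequential_phylip` are hypothetical, the set of gap characters is an assumption, and the reader assumes a simple sequential PHYLIP layout (a header line followed by one `name sequence` line per taxon), which may not match the exact layout Dawg writes.

```python
from typing import Dict

# Assumption: these characters count as gaps or missing data.
GAP_CHARS = set("-.?")


def summarize_alignment(seqs: Dict[str, str]) -> dict:
    """Report alignment size and whether it consists of gaps only."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("empty input or sequences of different lengths")
    n_sites = lengths.pop()
    # Count every character that is not treated as a gap.
    n_residues = sum(1 for s in seqs.values() for c in s if c not in GAP_CHARS)
    return {
        "n_seqs": len(seqs),
        "n_sites": n_sites,
        "n_residues": n_residues,
        "gaps_only": n_residues == 0,
    }


def read_sequential_phylip(path: str) -> Dict[str, str]:
    """Read a simple sequential PHYLIP file (assumed layout, see above)."""
    seqs: Dict[str, str] = {}
    with open(path) as fh:
        next(fh, None)  # skip the "ntaxa nsites" header line
        for line in fh:
            parts = line.split()
            if len(parts) >= 2:
                # First token is the name; the rest is sequence data.
                seqs[parts[0]] = "".join(parts[1:])
    return seqs
```

With these assumptions, `summarize_alignment(read_sequential_phylip("dawgOutput.phy"))` would report the number of sites and set `gaps_only` to `True` for an output like the one reported here.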