birc-aeh / coalhmm

CoalHMM

Warnings? #6

Closed by MaoYafei 7 years ago

MaoYafei commented 7 years ago

Dear All,

Thanks for building this useful tool for investigating questions about molecular speciation. I have installed it on my cluster and run it on both the test data and my own data. I got some warnings, though. Could you help me understand them?

Warning 1. "I didn't understand the following symbols form the input sequence: KMSRWY"

Question: do the ambiguous symbols affect the modelling results? (I guess the tool treats them as 'N', right?)

Warning 2. "Loading forwarders... done
Constructing model... done
Warning: Maximum number of function evaluations has been exceeded."

Question: it seems that either more computation is needed or the optimisation cannot converge. Will this warning affect the results?

One more question:

I have a scaffold-level assembly of a small genome, so my pairwise alignments are not as long as those in your data (10 Mb). Which of the following datasets would be better for the model: (a) 300 alignments, each over 500 kb, or (b) 100 alignments, each over 1 Mb?

Thanks a lot and looking forward to your reply.

Best, Yafei

mailund commented 7 years ago

Hi Yafei,

Don't worry about the first warning too much. The processing of the input alignment just translates unknown symbols into N as you guessed. This should not cause any problems as long as there are only a few of these relative to the full alignment.
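To check how many symbols would be affected in your own alignment, something along these lines should do. This is only a sketch and not the code CoalHMM actually runs: the helper name `mask_unknown` is made up for illustration, and I am assuming that A, C, G, T, N and the gap character `-` are the symbols that count as known.

```python
# Hypothetical helper, not part of CoalHMM. Assumption: A, C, G, T, N and
# the gap character '-' are "known"; everything else is masked as N.
KNOWN = set("ACGTN-")


def mask_unknown(seq):
    """Return (masked sequence, fraction of symbols replaced by N)."""
    seq = seq.upper()
    masked = "".join(c if c in KNOWN else "N" for c in seq)
    replaced = sum(1 for c in seq if c not in KNOWN)
    fraction = replaced / len(seq) if seq else 0.0
    return masked, fraction


# The six symbols from the warning (K, M, S, R, W, Y) all end up as N.
masked, fraction = mask_unknown("ACGTKMACGTSRACGTWY")
print(masked, f"{fraction:.1%} masked")
```

If the reported fraction is small for every sequence in the alignment, the masking is unlikely to matter.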

The second warning is more of an issue. It means that the likelihood optimisation stopped before it found a maximum. A way around this is to use the parameters you got from such an initial run as the starting values for another run. If you start near the maximum, the optimiser should have an easier time finding it.
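The restart idea can be sketched with a toy example. Everything here is an assumption made for illustration: `minimise` is a simple coordinate search written for this sketch (not the optimiser CoalHMM uses), `neg_log_likelihood` is a stand-in objective, and the budget of 100 evaluations is arbitrary. The point is only the outer loop, which feeds the estimate from one budget-limited run into the next.

```python
def minimise(f, start, max_evals, step=1.0, tol=1e-3):
    """Toy coordinate search. Returns (params, value, converged)."""
    x = list(start)
    best = f(x)
    evals = 1
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                if evals >= max_evals:
                    # Budget exhausted: the analogue of the warning.
                    return x, best, False
                trial = x[:]
                trial[i] += delta
                value = f(trial)
                evals += 1
                if value < best:
                    x, best, improved = trial, value, True
        if not improved:
            step /= 2  # refine the search around the current point
    return x, best, True


def neg_log_likelihood(p):
    # Stand-in objective with its minimum at (3.3, -1.7).
    return (p[0] - 3.3) ** 2 + (p[1] + 1.7) ** 2


params = [50.0, -40.0]  # deliberately poor starting values
runs = 0
converged = False
while not converged and runs < 10:
    # Each run starts from the estimate the previous run ended with.
    params, value, converged = minimise(neg_log_likelihood, params, max_evals=100)
    runs += 1
    print(runs, params, value, converged)
```

The early runs stop on the evaluation budget, but each one ends closer to the optimum, so a later run started from that estimate finishes within the same budget.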

For the last question, I don't think it will matter at all if you have your alignments in 500K or 1M chunks. Both should be long enough to contain all the spatial information the model needs. But in total, you will want at least 10M to get reliable results.
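As a quick sanity check on the totals, here is the arithmetic for the two options, assuming 1 kb = 1,000 bp and using the stated minimum length of each alignment, so the totals are lower bounds. The names are made up for this sketch.

```python
MIN_TOTAL_BP = 10_000_000  # the "at least 10M" rule of thumb above


def total_bp(n_alignments, bp_each):
    """Lower bound on the total aligned length in base pairs."""
    return n_alignments * bp_each


options = {
    "a": total_bp(300, 500_000),    # 300 alignments of at least 500 kb
    "b": total_bp(100, 1_000_000),  # 100 alignments of at least 1 Mb
}
for name, total in options.items():
    print(name, total, total >= MIN_TOTAL_BP)
```

Both options come out well above the 10M threshold, so either would give the model enough data in total.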

MaoYafei commented 7 years ago

Hi Mailund,

Thanks for your reply. Everything is clear now. Thanks again.

Best, Yafei