gchen98 / macs

Automatically exported from code.google.com/p/macs
16 stars 6 forks source link

growth increases runtime? #36

Open agladstein opened 7 years ago

agladstein commented 7 years ago

Hi Gary,

When I added exponential growth to my model, I'm getting a massive increases in runtime. Naively, I would not expect that to happen. larger growth means, back in time, smaller Ne, which means less coalescent events... smaller Ne means shorter sim time. The expected number of generations until two samples share a common ancestor is 2N...

Am I thinking about this wrong?

Thanks, Ariella

gchen98 commented 7 years ago

Just a naive question, do you see similar behavior with the same parameters in ms?

agladstein commented 7 years ago

Not answering your question yet... but,

It looks like I'm getting the expected behavior (run time reduced as growth rate increases) happens up to a certain growth rate (between 580 and 606). image

I wonder if it is getting stuck because all the lineages coalesce before the next time event, and instead of giving an error, macs just get's stuck at that point.

my code is:

macs 202 2000000 -t 0.0002508 -s 1278 -r 0.00010032 -h 1e5 -I 6 21 13 36 28 28 76 -n 1 8.35566188198 -n 2 4.25119617225 -n 3 0.55701754386 -n 4 8.54385964912 -n 5 39.961722488 -n 6 617.22169059 -eg 0 6 200.64 -eg 0.000001 4 200.64 -eg 0.000002 5 200.64 -em 0.00209330143541 6 3 4671.32533404 -em 0.00209430143541 6 3 0 -ej 0.00239234449761 6 4 -ej 0.054625199362 5 4 -ej 0.0631977671451 4 3 -en 0.240829346093 1 1.0 -ej 0.241726475279 3 2 -ej 0.302830940989 2 1

Attached are screen shots of the macs output when it gets stuck.

screen shot 2017-05-22 at 6 14 16 pm screen shot 2017-05-22 at 6 15 30 pm screen shot 2017-05-22 at 6 17 02 pm screen shot 2017-05-22 at 6 18 08 pm
gchen98 commented 7 years ago

Yeah I've noticed this kind of behavior on occasion, generally when the number of chromosomes being sampled is too small. One workaround I have suggested to others is to try scaling up the number of chromosomes by a certain factor and then subsample the output. This does not affect the growth or coalescent rates, as you are still assuming the same Ne. Just sampling more haploids from that Ne.

agladstein commented 7 years ago

I see. Do you have any idea what the relationship is between sample size and run time?

gchen98 commented 7 years ago

macs doesn't scale as nicely wrt to sample size as ms. but as a rule of thumb, use ms for larger samples, smaller regions, and macs vice versa.

On 05/25/2017 02:38 PM, agladstein wrote:

I see. Do you have any idea what the relationship is between sample size and run time?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/gchen98/macs/issues/36#issuecomment-304131976, or mute the thread https://github.com/notifications/unsubscribe-auth/AKfCumSPlwP_2qk9XXB3BUulCRvQyGj6ks5r9fTxgaJpZM4Ni9co.

agladstein commented 7 years ago

That's what I thought.

I tried to fix the growth rate problem by setting my maximum growth rate relative to current effective population size and time over which growth occurs, and minimum effective population size before growth (10):

NA = float(round(10 ** random.uniform(4.0, 6.7)))

T_High = 36
T_Low = 20
T = float(randint(TA_Low, TA_High))

r_High = -(1/TA) * math.log(10/NA)
r_Low = 0.0
r = random.uniform(r_Low, r_High)

And scaled parameters are used in macs input:

scaled_NA=float(NA/NANC)
scaled_T=float(T/(4*NANC))
scaled_r=float(r*4*NANC)

Theoretically, the effective population size before growth should not get less than 10. I've been running these simulations for 2 days now.... I think maybe it's working. I'm not exactly sure.

gchen98 commented 7 years ago

Ariella,

Did you try scaling up the sampled chromosomes? If you are watching the output of the trees on the console, and it is taking long than a second to move to the next tree, the program is probably stuck in an infinite loop looking for a branch to coalesce to. Gary

agladstein commented 7 years ago

I haven't tried increasing the sample size because I don't think it's feasible considering the run time increase. I agree, that's probably what's happening. I also found that some simulations produce 0 segregating sites in the population size exponential growth. I tried increasing the founder size to 100 and it improved the number of segregating sites... I decided for this project I will switch to instantaneous growth and come back to exponential growth later.

N10

n10

N100

n100