Problem when running runfixedpop with different qcrit values

michaelzevin commented 5 years ago

I've been doing some runs where I've changed the critical mass ratio at which unstable mass transfer is initiated for naked and evolved naked helium stars (kstar=7,8,9). The runs have been failing part of the way through with the following error:

    Traceback (most recent call last):
      File "/home/mjz672/.conda/envs/cosmic-qcrit/bin/runFixedPop", line 258, in <module>
        bpp, bcm, initCond = Evolve.evolve(initialbinarytable=IBT, BSEDict=BSEDict, nproc=args.nproc, idx=idx, dtp=dtp)
      File "/home/mjz672/.conda/envs/cosmic-qcrit/lib/python3.6/site-packages/cosmic/evolve.py", line 205, in evolve
        bpp_arrays = np.vstack(output[:, 1])
    IndexError: too many indices for array
    Closing remaining open files:dat_DeltaBurst_13_13.h5...done

The run got through ~11000 binaries before this happened. You can find the working directory at /projects/b1011/mzevin/cosmic/CE_tests/v265_lamPols_qcrit/

michaelzevin commented 5 years ago

FYI, this seemed to happen in another run where I changed qcrit as well. The working directory for that run is: /projects/b1011/mzevin/cosmic/CE_tests/v265_lamPols_qcrit_endCE

katiebreivik commented 5 years ago

I have run into this before as well. I'm not sure exactly what happens, but I suspect that the evolv2 subroutine exits before anything is recorded? It might be worth digging in to see where the goto statements are that send you to the end of the evolution since they could give us a clue as to what is happening to cause the binary evolution to freak out.

We should also probably add in an error catcher that notes the bin_num/IBT for the binary that causes this.
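A minimal sketch of what such an error catcher could look like, assuming each binary is evolved through a single per-binary call. The names `evolve_all`, `evolve_one`, and `toy_evolve` are hypothetical and not part of COSMIC; the toy failure condition exists only to exercise the logging path.

```python
def evolve_all(binaries, evolve_one):
    """Evolve each binary, collecting failures instead of aborting the run.

    `binaries` maps a bin_num to its initial conditions. `evolve_one` is a
    hypothetical stand-in for the per-binary call into BSE.
    """
    results, failures = {}, []
    for bin_num, initial_conditions in binaries.items():
        try:
            results[bin_num] = evolve_one(initial_conditions)
        except Exception as err:  # broad on purpose: log the binary and move on
            failures.append((bin_num, initial_conditions, repr(err)))
    return results, failures


def toy_evolve(initial_conditions):
    # Toy stand-in (assumption): pretend short-period binaries break the evolution.
    if initial_conditions["porb"] < 2.0:
        raise IndexError("too many indices for array")
    return {"final_porb": initial_conditions["porb"]}


binaries = {
    0: {"m1": 1.0, "m2": 0.5, "porb": 10.0},
    1: {"m1": 0.88, "m2": 0.25, "porb": 1.82},
}
results, failures = evolve_all(binaries, toy_evolve)
for bin_num, initial_conditions, message in failures:
    print(f"bin_num {bin_num} failed: {message}; initial conditions: {initial_conditions}")
```

Keeping the initial conditions next to the bin_num means a failing binary can be re-run on its own afterwards, without repeating the whole population.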

katiebreivik commented 5 years ago

I'm running a couple of runs right now to try to reproduce this problem and figure out why BSE is borking...

katiebreivik commented 5 years ago

Ok; so I've found a case where this happens:

    single_binary = InitialBinaryTable.SingleBinary(
        m1=0.8810539028092682, m2=0.24953201992853882,
        porb=1.8193761088893858, ecc=0.4731890565069729,
        tphysf=9956.942879351827, kstar1=1, kstar2=0,
        metallicity=0.0001)

What's happening with this particular binary is that late in the evolution (~9500 Myr) it settles into a constant state of going into and out of Roche-lobe overflow every 2 million years. This would lead to a huge bpp array, so the Fortran code stops the evolution after 80 timesteps. The problem is that the final timestep is then no longer marked with -1, so we run into trouble when evolve.py requires:

            bpp = bpp[:np.argwhere(bpp[:,0] == -1)[0][0]]

Two ideas to get around this so far:

1 - the bcm array has 50,000 slots to fill; maybe we just arbitrarily increase the size of the bpp array so this is less likely to happen? ---> Increasing the size of bpp to 1000 should fix this.

2 - We try to catch the bad binary in evolve.py by building off the exception that is already built in?

2a - both 1 and 2?
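As a rough illustration of idea 2, the sketch below trims a bpp-like array at its -1 row and flags the case where no such row exists, instead of indexing blindly as the slice from evolve.py quoted above does. `trim_bpp`, `SENTINEL`, and the toy arrays are assumptions made for this example, not COSMIC code.

```python
import numpy as np

SENTINEL = -1.0  # value the thread says marks the end of the recorded bpp rows


def trim_bpp(bpp):
    """Return (rows before the sentinel, truncated flag).

    If no row has the sentinel in column 0, the array was filled completely,
    so it is returned unchanged and flagged as truncated.
    """
    hits = np.flatnonzero(bpp[:, 0] == SENTINEL)
    if hits.size == 0:
        return bpp, True
    return bpp[:hits[0]], False


# Toy arrays (assumption): column 0 is time, column 1 is a made-up payload.
normal = np.array([[0.0, 1.0], [5.0, 2.0], [-1.0, 0.0], [0.0, 0.0]])
full = np.array([[0.0, 1.0], [5.0, 2.0], [7.0, 3.0], [9.0, 4.0]])

trimmed, truncated = trim_bpp(normal)
print(trimmed.shape, truncated)

trimmed_full, truncated_full = trim_bpp(full)
print(trimmed_full.shape, truncated_full)
```

The truncated flag could then be used to record the offending bin_num, which would combine naturally with enlarging the bpp array as in idea 1.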

katiebreivik commented 5 years ago

@michaelzevin Can you try some more runs with the updated develop branch?

I think it should be solved, but want to make sure before we close this issue.

michaelzevin commented 5 years ago

@katiebreivik I'll give some new runs a go. I just ran a couple before the updated branch that had different qcrit values and they converged properly, so hopefully the isolated events that this happens in will be taken care of with your updates!

COSMIC-PopSynth / COSMIC, issue #130