daw538 / y4project

Fourth-year master's project at the University of Birmingham, investigating the helium glitch with asteroseismology to measure the helium abundance of stars.
MIT License

Negative Parameters #21

Closed: daw538 closed this issue 5 years ago

daw538 commented 5 years ago

Every once in a while, Stan returns a fit in which a star has a negative value for one of the parameters (always A, ε or α). An obvious example shows up in the latest fit (Dan_notebooks/hydra/fullsumm_tau.txt), where star[3] has ε = -1.45. This in turn affects the other parameters for that star (see the correlation plots in Dan_notebooks/fitanalysis.ipynb), making it a complete outlier in the results.

I have tried to apply hard limits of `<lower=0>` to the relevant parameters, which are declared in the transformed parameters block of the Stan script. However, any attempt to do so, on either ε or A, results in an instant initialisation failure for both the no_tau and tau models. I don't quite see why. Any suggestions for getting around it? As far as I understand, ε has to be positive, given that it is defined as an adjustment to the boundary condition at the surface.

Or perhaps I'm being stupid and missing something?
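A likely mechanism, sketched under assumptions: in Stan, a `<lower=0>` bound on a declared parameter is enforced by sampling on an unconstrained scale and transforming back, whereas the same bound on a transformed parameter is only checked after the block runs, and a violation rejects the current point. Stan's default random initialisation draws each unconstrained value uniformly from (-2, 2) (the range quoted in the error later in this thread), so each star's transformed value has some chance of landing below zero, and a vector of many stars must pass for every star at once. The Python below mimics this with a hypothetical transform `epsilon = 1 + u` and hypothetical star counts; neither is taken from the project's model.

```python
import math
import random


def enforce_lower(u, lower=0.0):
    """Bound on a parameter: Stan maps an unconstrained u into (lower, inf)."""
    return lower + math.exp(u)


def passes_check(value, lower=0.0):
    """Bound on a transformed parameter: the value is only tested, never repaired."""
    return value >= lower


def init_succeeds(n_stars, rng):
    """One random initialisation attempt for a vector of per-star values."""
    for _ in range(n_stars):
        u = rng.uniform(-2.0, 2.0)        # default random init on the unconstrained scale
        assert enforce_lower(u) > 0.0     # a bounded parameter can never fail its bound
        epsilon = 1.0 + u                 # hypothetical transformed parameter
        if not passes_check(epsilon):
            return False                  # a single bad star rejects the whole point
    return True


rng = random.Random(42)
for n_stars in (1, 10, 50):
    accepted = sum(init_succeeds(n_stars, rng) for _ in range(100))
    print(f"{n_stars:3d} stars: {accepted}/100 random inits accepted")
```

With one star most attempts pass, but the chance that every star passes shrinks roughly geometrically with the number of stars, which is one way a per-star bound could exhaust 100 attempts. Whether this is what happens in full_hbm.py depends on the actual transform and on which parameters receive user-supplied starting values.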

grd349 commented 5 years ago

You could place a uniform prior on the transformed parameters.

Something like

    epsilon ~ uniform(0.5, 1.5);

or similar.

daw538 commented 5 years ago

Unfortunately, that hasn't helped. I tried uniform priors instead, along with starting values that fall sensibly within them, but the outcome is unchanged:

    Rejecting initial value:
      Log probability evaluates to log(0), i.e. negative infinity.
      Stan can't start sampling from this initial value.

**[the above repeats for quite a while]**

    Initialization between (-2, 2) failed after 100 attempts.
     Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
    Traceback (most recent call last):
      File "full_hbm.py", line 130, in <module>
        fit1 = sm1.sampling(data=stan_data, iter=iters_notau, chains=nchains, init=[start for n in range(nchains)])
      File "/usr/local/anaconda2/lib/python2.7/site-packages/pystan/model.py", line 776, in sampling
        ret_and_samples = _map_parallel(call_sampler_star, call_sampler_args, n_jobs)
      File "/usr/local/anaconda2/lib/python2.7/site-packages/pystan/model.py", line 86, in _map_parallel
        map_result = pool.map(function, args)
      File "/usr/local/anaconda2/lib/python2.7/multiprocessing/pool.py", line 253, in map
        return self.map_async(func, iterable, chunksize).get()
      File "/usr/local/anaconda2/lib/python2.7/multiprocessing/pool.py", line 572, in get
        raise self._value
    RuntimeError: Initialization failed.
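The repeated "Log probability evaluates to log(0)" lines mean the total log density is negative infinity at the proposed starting point. A uniform prior has zero density outside its bounds, so if the transformed parameter of even one star falls outside them, the whole point is rejected, however sensible the starting values of the sampled parameters are. A minimal Python illustration, reusing the `uniform(0.5, 1.5)` bounds suggested above for ε with made-up per-star values:

```python
import math


def uniform_lpdf(x, lower, upper):
    """Log density of Uniform(lower, upper); -inf outside the support."""
    if lower <= x <= upper:
        return -math.log(upper - lower)
    return -math.inf


def log_target(epsilons, lower=0.5, upper=1.5):
    """Sum of per-star uniform log priors (Stan's `~` may drop the constant term)."""
    return sum(uniform_lpdf(e, lower, upper) for e in epsilons)


good = [0.9, 1.1, 1.3]
bad = [0.9, 1.1, -1.45]   # one star outside the support, like star[3] above
print(log_target(good))   # 0.0, finite because upper - lower = 1
print(log_target(bad))    # -inf, i.e. "Log probability evaluates to log(0)"
```

Good starting values for the sampled parameters are therefore not enough on their own: the transformed values they imply must also lie inside every uniform range.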
grd349 commented 5 years ago

You'll have to start debugging. Start by inspecting the start values: are they all within the acceptable bounds? Then start commenting out the prior constraints one by one until you isolate the problem. Does that make sense?

G

On Thu, 7 Mar 2019, 14:56 daw538 wrote:

> That hasn't helped unfortunately. [...]

--

Dr Guy R. Davies Lecturer in Astrophysics School of Physics and Astronomy The University of Birmingham Edgbaston Birmingham B15 2TT

Tel +44 (0) 121 414 4597

g.r.davies@bham.ac.uk grd349@gmail.com davies@bison.ph.bham.ac.uk
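The first debugging step suggested above, checking that every start value sits inside its declared bounds, can be sketched in Python as follows. The bounds dictionary and the start values are hypothetical; `start` stands in for the dictionary presumably passed via `init=[start for n in range(nchains)]` in full_hbm.py. For a transformed parameter such as ε, the same check would have to be applied to the value computed from the start values, because Stan computes transformed parameters rather than initialising them.

```python
import math

# Hypothetical bounds: replace with the limits actually declared in the Stan model.
BOUNDS = {
    "epsilon": (0.5, 1.5),
    "A": (0.001, 0.1),
    "alpha": (0.0, math.inf),
}


def out_of_bounds(start, bounds=BOUNDS):
    """Return (name, index, value) for every start value outside its bounds."""
    problems = []
    for name, (lower, upper) in bounds.items():
        values = start.get(name)
        if values is None:
            continue  # not set explicitly, so Stan would draw a random init
        values = list(values) if hasattr(values, "__iter__") else [values]
        for i, v in enumerate(values):
            if not (lower <= v <= upper):
                problems.append((name, i, v))
    return problems


# Made-up start values for three stars.
start = {"epsilon": [1.1, 0.9, -1.45], "A": [0.02, 0.05, 0.03], "alpha": 0.5}
for name, i, value in out_of_bounds(start):
    print(f"{name}[{i}] = {value} is outside {BOUNDS[name]}")
```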

daw538 commented 5 years ago

From some digging around, the initialisation failure appears to arise from applying a distribution to a transformed parameter in general. The parser reports:

    DIAGNOSTIC(S) FROM PARSER:
    Warning (non-fatal):
    Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
    If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
    Left-hand-side of sampling statement:
        A ~ uniform(...)

The critical point I noticed in the online documentation is the following:

https://mc-stan.org/misc/warnings.html

> If you fail to heed this warning, the posterior distribution Stan will sample from is not necessarily the posterior distribution that you have in mind. The only situation in which you can ignore this warning is when you are sure that the determinant of the Jacobian matrix of the transformation depends only on constants.

I don't fully understand what it's getting at, other than that some extra code may be needed to account for the Jacobian.
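What the warning refers to is the standard change-of-variables rule: if a prior is placed on A = f(a) while Stan samples a, the density actually targeted over a is p_A(f(a)) * |f'(a)|, so log|f'(a)| has to be added to `target` (summed over elements for an elementwise vector transform). When that term is constant it does not change the shape of the posterior, which is the case the documentation says can be ignored. The Python sketch below estimates log|f'(a)| numerically for two hypothetical transforms: a linear one with fixed location and scale, where the term is constant, and an exponential one, where it is not.

```python
import math


def log_abs_jacobian(f, a, h=1e-6):
    """Central-difference estimate of log|f'(a)| for a scalar transform."""
    derivative = (f(a + h) - f(a - h)) / (2.0 * h)
    return math.log(abs(derivative))


# Hypothetical fixed location and scale (data, not sampled parameters).
MU, SIGMA = 0.05, 0.01


def linear(a):
    return MU + SIGMA * a   # A = mu + sigma * a: derivative is SIGMA everywhere


def exponential(a):
    return math.exp(a)      # A = exp(a): derivative is exp(a), so it varies with a


for a in (-1.0, 0.0, 1.0):
    print(f"a = {a:+.1f}  linear: {log_abs_jacobian(linear, a):.4f}  "
          f"exp: {log_abs_jacobian(exponential, a):.4f}")
```

If the scale were itself a sampled hyperparameter, log(sigma) would change from draw to draw and would need to be included, so whether the term can be dropped depends on how A is actually built from A_std in the model.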

I then tried to follow through some previous threads and docs online:

- https://discourse.mc-stan.org/t/putting-priors-on-transformed-parameters/2488/3
- https://mc-stan.org/docs/2_18/stan-users-guide/changes-of-variables.html
- https://mc-stan.org/docs/2_18/reference-manual/change-of-variables-section.html

Based on these, I changed the code to include:

    A ~ uniform(0.001, 0.1);
    target += log(A_std);

The argument on the right-hand side of the second line was a complete guess, simply to test whether it would work. It does appear to resolve the initialisation failure, but convergence becomes poor, so it clearly needs replacing with something more suitable. It is at least a potential starting point.
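Why the exact term matters, sketched under an assumption: the correct adjustment depends entirely on how `A` is computed from `A_std`, which is not shown in this thread. Taking a hypothetical transform `A = exp(A_std)`, log|dA/dA_std| = A_std, so the matching Stan line would be `target += A_std;`. The Python below integrates numerically over `A_std` to show which prior on A is actually implied by `A ~ uniform(0.001, 0.1)` with and without that term.

```python
import math

LOWER, UPPER = 0.001, 0.1   # bounds from `A ~ uniform(0.001, 0.1)`


def log_density(a_std, with_jacobian):
    """Unnormalised log density over a_std, assuming the hypothetical A = exp(a_std)."""
    A = math.exp(a_std)
    if not (LOWER <= A <= UPPER):
        return -math.inf
    lp = -math.log(UPPER - LOWER)     # uniform log density of A
    if with_jacobian:
        lp += a_std                   # log|dA/da_std| = log(exp(a_std)) = a_std
    return lp


def prob_A_below(threshold, with_jacobian, n=20_000):
    """Midpoint-rule estimate of P(A < threshold) implied by the density over a_std."""
    lo, hi = math.log(LOWER), math.log(UPPER)
    width = (hi - lo) / n
    total = below = 0.0
    for i in range(n):
        a_std = lo + (i + 0.5) * width
        weight = math.exp(log_density(a_std, with_jacobian)) * width
        total += weight
        if math.exp(a_std) < threshold:
            below += weight
    return below / total


print(prob_A_below(0.01, with_jacobian=True))    # about 0.0909 = 0.009 / 0.099: uniform in A
print(prob_A_below(0.01, with_jacobian=False))   # about 0.5: uniform in log A instead
```

Without the term, half of the prior mass sits below A = 0.01 instead of about 9%, which is the kind of mismatch the Stan warning describes. For a different transform the term changes accordingly.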

I will have another look this evening, but would like to discuss this in the meeting tomorrow regardless.

Cheers.

daw538 commented 5 years ago

I've managed to circumvent this issue by preventing stars with unreasonable starting values from being passed into the second Stan model, which incorporates the decay term (commit 122d0d2). For example, any star for which the initial model estimates an amplitude A < 0 is removed from consideration by the second model. Since this only affects a small number of stars, it does not significantly reduce the total number of modelled stars, and it avoids the need to derive the Jacobian determinant that would have been required had we kept a sampling statement on a transformed parameter (per the documentation linked in my previous post).
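The filtering step described above can be sketched in Python as a simple pass over per-star summaries from the first fit. The record layout and values are made up for illustration (only ε = -1.45 for star[3] comes from this thread), and the sketch checks A, ε and α because the opening post says the negative values always appear in one of these; the actual commit may test only A.

```python
# Hypothetical per-star summaries from the first (no-tau) fit; field names
# and values are illustrative, not read from fullsumm_tau.txt.
first_fit = [
    {"star": 0, "A": 0.031, "epsilon": 1.12, "alpha": 0.004},
    {"star": 1, "A": 0.027, "epsilon": 0.98, "alpha": 0.003},
    {"star": 2, "A": -0.002, "epsilon": 1.05, "alpha": 0.005},
    {"star": 3, "A": 0.019, "epsilon": -1.45, "alpha": 0.006},
]

POSITIVE = ("A", "epsilon", "alpha")


def usable_starts(summaries, positive=POSITIVE):
    """Keep only stars whose first-fit estimates are positive for every listed parameter."""
    return [s for s in summaries if all(s[name] > 0 for name in positive)]


kept = usable_starts(first_fit)
print([s["star"] for s in kept])   # [0, 1]: stars passed on to the tau model
```

Dropping a star this way sidesteps the bound rather than modelling it, so it is worth recording how many stars are excluded in each run.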