Closed HinLeung622 closed 4 years ago
I have this problem with my gpu and jupyter notebooks. It might be different for you but I would just restart the computer. See if that works!
@grd349 Thanks for the suggestion, that was one of the things along many that I had to do/change to get my code working since last night. the bridging between pymc3 and keras is just so finicky.
Is it working now?
G
@grd349 Not really. Jupyter no longer freezes and it doesn't crash either, but I am now facing one of two "bad" situations:
1. The sampler reports that the chains failed. Code:
```python
m1.getWeights()
weights = m1.model.get_weights()

def manualPredict(inputs):
    # input: 1D array with length equal to the number of NN inputs
    no_hidden_layers = len(weights) // 2 - 1  # weights layout: [W0, b0, W1, b1, ...]
    xx = inputs
    for i in np.arange(0, no_hidden_layers) * 2:
        i = int(i)
        xx = T.nnet.relu(pm.math.dot(xx, weights[i]) + weights[i + 1], 0)
    xx = pm.math.dot(xx, weights[-2]) + weights[-1]
    return xx.T
```
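For what it's worth, the forward pass can be sanity-checked outside Theano with a numpy-only mirror of `manualPredict`, compared against `keras model.predict` on a few grid points. A minimal sketch (my own, not from the thread), assuming the `[W0, b0, W1, b1, ...]` weight layout that `get_weights()` returns:

```python
import numpy as np

def manual_predict_np(inputs, weights):
    """Pure-numpy mirror of manualPredict, for checking the Theano
    translation against keras model.predict on the same inputs."""
    no_hidden_layers = len(weights) // 2 - 1
    xx = np.asarray(inputs)
    for i in range(0, no_hidden_layers * 2, 2):
        xx = np.maximum(xx @ weights[i] + weights[i + 1], 0)  # ReLU hidden layer
    return (xx @ weights[-2] + weights[-1]).T                  # linear output layer

# tiny synthetic network: 4 inputs -> 5 hidden units -> 3 outputs
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 5)), np.zeros(5),
           rng.normal(size=(5, 3)), np.zeros(3)]
out = manual_predict_np([0.0, np.log10(4.5), 0.0, 1.9], weights)
print(out.shape)  # (3,)
```

If this disagrees with `model.predict` on real weights, the bug is in the Theano translation rather than in pymc3.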
```python
model = pm.Model()
with model:
    log_m = pm.Normal('log_mass', np.log10(1), 0.1)
    log_a = pm.Normal('log_age', np.log10(4.5), 0.1)
    feh = pm.Normal('feh', 0, 0.1)
    MLT = pm.Normal('MLT', 1.9, 0.1)
    obs = pm.Deterministic('obs', manualPredict([log_m, log_a, feh, MLT]))
    obs_L = pm.Normal('obs_L', 10**obs[0], 0.1, observed=1.0)
    obs_Teff = pm.Normal('obs_Teff', 10**obs[1], 70, observed=5777)
    obs_delnu = pm.Normal('obs_delnu', 10**obs[2], 15, observed=135)

with model:
    trace = pm.sample()
```
2. If I comment out feh and MLT (the logic being that they were never varied in the small grid, so the NN has no grasp at all of how feh or MLT cause the observables to vary) and just put in the fixed values from the small grid (namely 1 and 1.9), I get a very horrible fit. Code:
```python
m1.getWeights()
weights = m1.model.get_weights()

def manualPredict(inputs):
    # input: 1D array with length equal to the number of NN inputs
    no_hidden_layers = len(weights) // 2 - 1
    xx = inputs
    for i in np.arange(0, no_hidden_layers) * 2:
        i = int(i)
        xx = T.nnet.relu(pm.math.dot(xx, weights[i]) + weights[i + 1], 0)
    xx = pm.math.dot(xx, weights[-2]) + weights[-1]
    return xx.T

model = pm.Model()
with model:
    log_m = pm.Normal('log_mass', np.log10(1), 0.1)
    log_a = pm.Normal('log_age', np.log10(4.5), 0.1)
    #feh = pm.Normal('feh',0,0.1)
    #MLT = pm.Normal('MLT',1.9,0.1)
    obs = pm.Deterministic('obs', manualPredict([log_m, log_a, 1, 1.9]))
    obs_L = pm.Normal('obs_L', 10**obs[0], 0.1, observed=1.0)
    obs_Teff = pm.Normal('obs_Teff', 10**obs[1], 70, observed=5777)
    obs_delnu = pm.Normal('obs_delnu', 10**obs[2], 15, observed=135)

with model:
    trace = pm.sample()
```
fit results:
![image](https://user-images.githubusercontent.com/56083013/70065085-6f296f00-15e2-11ea-97dd-3332f09427f4.png)
![image](https://user-images.githubusercontent.com/56083013/70065121-7a7c9a80-15e2-11ea-8f44-974d3288d6d8.png)
All that being said, I did have one successful instance of a good fit on all 4 parameters last night. This corner plot is the only thing I have saved from that fit:
![image](https://user-images.githubusercontent.com/56083013/70065233-a26bfe00-15e2-11ea-8306-9aad3ad0c909.png)
The code and numbers used to generate this were identical to the first version of the problem (when it says the chain failed), so I really do not know what is causing the problem here.
You probably want to give the sample step some starting parameters. The default start will be something funny like parameters spread between [-2, 2]; these would probably be really bad parameters to start with, as the Neural Net will hate them as inputs. Much better would be [1.0, 4.5, 0.0, 1.95] ... does that make sense?
That said ... the one that worked, really worked! :)
@grd349 So I do something like this in your example code?
```python
inits = ['advi_map', 'adapt_diag']
start = {'lg_mass': np.log10(1.01), 'lg_age': np.log10(4.5)}
with model:
    trace = pm.sample(tune=5000, init=inits[1], start=start, target_accept=0.99)
```
Also, why did you use "adapt_diag" for the initialization for the NUTS sampler?
Yes - like that but for all parameters.
I used 'adapt_diag' because the jitter that is normally added can send parameters to bad places. If you are nervous about starting at the true values then you should change them slightly, maybe [1.1, 3.1, -0.3, 2.1].
@grd349 Thanks! I tried giving it starting points and the chains no longer fail. Something peculiar happened, though: I wanted the summary table of the HBM to print the actual values of M and Age rather than their logged versions, so I moved the log10 step on the mass and age inputs to right before they enter the NN. The code looks like this:
```python
model = pm.Model()
with model:
    M = pm.Lognormal('mass', np.log10(1), 0.1)
    Age = pm.Lognormal('age', np.log10(4.5), 0.1)
    feh = pm.Normal('feh', 0, 0.1)
    MLT = pm.Normal('MLT', 1.9, 0.1)
    obs = pm.Deterministic('obs', manualPredict([np.log10(M), np.log10(Age), feh, MLT]))
    obs_L = pm.Normal('obs_L', 10**obs[0], 0.1, observed=1.0)
    obs_Teff = pm.Normal('obs_Teff', 10**obs[1], 70, observed=5777)
    obs_delnu = pm.Normal('obs_delnu', 10**obs[2], 15, observed=135)
```
I was expecting a similar result to last night's, but the mean estimated age now comes out at 1.93 Gyr, with a 1-sigma of 0.18 Gyr. Corner plot:
That is rather strange, so I reverted some changes and went back to the original "log_m and log_a" version of the code:
```python
model = pm.Model()
with model:
    log_m = pm.Normal('log_mass', np.log10(1), 0.1)
    log_a = pm.Normal('log_age', np.log10(4.5), 0.1)
    feh = pm.Normal('feh', 0, 0.1)
    MLT = pm.Normal('MLT', 1.9, 0.1)
    obs = pm.Deterministic('obs', manualPredict([log_m, log_a, feh, MLT]))
    obs_L = pm.Normal('obs_L', 10**obs[0], 0.1, observed=1.0)
    obs_Teff = pm.Normal('obs_Teff', 10**obs[1], 70, observed=5777)
    obs_delnu = pm.Normal('obs_delnu', 10**obs[2], 15, observed=135)
```
And this gives a mean estimate of age of 4.7Gyr! Corner plot also looks nice:
What could be causing this? (I know this is all Bayesian inference and the posterior depends on the priors we put in. The priors in the two situations are quite different (a normal distribution on the log value versus a lognormal distribution on the linear value), but I feel that is not the full story here.)
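One candidate explanation (my reading, not confirmed in the thread): `pm.Lognormal` takes the mean of the *natural* log as its first argument, so passing `np.log10(4.5)` shifts the prior median from 4.5 down to exp(log10(4.5)) ≈ 1.92, which matches the ~1.93 Gyr posterior mean above. A quick numeric check:

```python
import numpy as np
# pm.Lognormal('age', mu, sd) means log(Age) ~ Normal(mu, sd), with median exp(mu)
mu_passed = np.log10(4.5)    # ≈ 0.653, what the code above passes
mu_expected = np.log(4.5)    # ≈ 1.504, what Lognormal actually expects
print(np.exp(mu_passed))     # ≈ 1.92, close to the ~1.93 Gyr mean seen above
print(np.exp(mu_expected))   # 4.5
```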
@grd349 I ditched the lognormal idea and just used normal distributions instead, and now it's consistent with the old corner plot! (and also with the real Sun)
code:
```python
with model:
    M = pm.Normal('mass', 1, 0.1)
    Age = pm.Normal('age', 4.5, 0.1)
    feh = pm.Normal('feh', 0, 0.1)
    MLT = pm.Normal('MLT', 1.9, 0.1)
    obs = pm.Deterministic('obs', manualPredict([np.log10(M), np.log10(Age), feh, MLT]))
    obs_L = pm.Normal('obs_L', 10**obs[0], 0.1, observed=1.0)
    obs_Teff = pm.Normal('obs_Teff', 10**obs[1], 70, observed=5777)
    obs_delnu = pm.Normal('obs_delnu', 10**obs[2], 15, observed=135)
```
This makes me very happy! :)
@grd349 Thanks. Well, I guess I will just stick with normal distributions for now; it shouldn't matter much if our NNs are properly trained (even though, theoretically, we should be using lognormals for quantities like M and age, since they cannot go below 0).
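As a rough justification for sticking with Normals here (my own aside, assuming the prior widths used above): with priors this tight, the probability mass below zero is negligible anyway.

```python
from scipy.stats import norm
# Mass prior N(1, 0.1): zero is 10 sigma away, so P(M < 0) is vanishingly small
print(norm.cdf(0, loc=1.0, scale=0.1))   # ~1e-23
# Age prior N(4.5, 0.1): zero is 45 sigma away, ~0 to double precision
print(norm.cdf(0, loc=4.5, scale=0.1))
```

The positivity argument for a Lognormal only starts to bite once the priors are widened, as suggested below.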
Your uncertainty on Dnu is too large; a more realistic value might be 135 ± 0.1.
Also, in the grid the solar Temperature is higher (~5900) and the luminosity is higher (~1.09) and this is because of the physics used in the models. You might want to use the observables from a point (read model) in the grid.
Finally before I leave you in peace, you might want to change your priors to be a lot less tight...
```python
M = pm.Lognormal('mass', np.log(1.2), 0.5)
Age = pm.Lognormal('age', np.log(4.5), 1.0)
feh = pm.Normal('feh', 0, 0.5)
MLT = pm.Lognormal('MLT', np.log(1.9), 0.3)
```
We might also want to consider bounds on the parameters, especially MLT.
@grd349 What does "use observables from a point in the grid" mean?
Oh - take the inputs to the Neural Net - the grid - and then pick a 'fake' star from that grid. This could be the grid point for solar mass, age, ... and just look up the observables for that point in the grid. Does that make sense now?
@grd349 Oh OK, but doesn't that defeat the purpose of relating the whole grid-NN thing back to real-world observed data (i.e. the Sun's luminosity, Teff and delta nu)?
Yes it does - but this is just a test to check what we are doing is sensible before we move on to real data.
@grd349 Oh, so you mean this is just a check of whether the NN is actually approximating the grid correctly, and that's why we are using the grid as the "real data", to reduce the degree of complexity?
Yes :) . Exactly!
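For concreteness, a hedged sketch of the "fake star" lookup, assuming the training grid lives in a pandas DataFrame (the column names are my invention; the numbers are the row-7992 values quoted later in the thread):

```python
import pandas as pd

# stand-in for the real grid file; in practice this would be read from disk
grid = pd.DataFrame({'L': [1.10587977656106],
                     'Teff': [5915.32055829505],
                     'delnu': [134.4937686]},
                    index=[7992])

fake_star = grid.loc[7992]  # pick one grid model to act as the 'observed' star
obs_L, obs_Teff, obs_delnu = fake_star[['L', 'Teff', 'delnu']]
```

The three values then go straight into the `observed=` arguments of the likelihood terms.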
@grd349 I have taken your suggestions: I widened the prior uncertainties, took the observed data from the grid, and enforced lower boundaries on mass, age and MLT:
```python
m1.getWeights()
weights = m1.model.get_weights()

def manualPredict(inputs):
    # input shape = 1D array with length as number of NN inputs
    no_hidden_layers = len(weights) // 2 - 1
    xx = inputs
    for i in np.arange(0, no_hidden_layers) * 2:
        i = int(i)
        xx = T.nnet.relu(pm.math.dot(xx, weights[i]) + weights[i + 1], 0)
    xx = pm.math.dot(xx, weights[-2]) + weights[-1]
    return xx.T

model = pm.Model()
with model:
    BoundM = pm.Bound(pm.Lognormal, lower=0)
    M = BoundM('mass', np.log(1.2), 0.5)
    BoundAge = pm.Bound(pm.Lognormal, lower=0)
    Age = BoundAge('age', np.log(4.5), 1.0)
    feh = pm.Normal('feh', 0, 0.5)
    BoundMLT = pm.Bound(pm.Lognormal, lower=0)
    MLT = BoundMLT('MLT', np.log(1.9), 0.3)  # np.log added to match the prior suggested above
    obs = pm.Deterministic('obs', manualPredict([np.log10(M), np.log10(Age), feh, MLT]))
    # obs data from small grid row 7992
    obs_L = pm.Normal('obs_L', 10**obs[0], 0.1, observed=1.10587977656106)
    obs_Teff = pm.Normal('obs_Teff', 10**obs[1], 70, observed=5915.32055829505)
    obs_delnu = pm.Normal('obs_delnu', 10**obs[2], 0.1, observed=134.4937686)

start = {'mass': 0.8, 'age': 2.5, 'feh': 0.1, 'MLT': 1.8}  # 'MLT' key quoted to be a string
with model:
    trace = pm.sample(tune=5000, target_accept=0.9, start=start)
```
But the results are unsatisfactory. Perhaps this is all due to the small grid not varying in feh and alpha_MLT.
This will need some thought but I would start by changing the constraint on luminosity to ...
```python
obs_L = pm.Normal('obs_L', 10**obs[0], 0.01, observed=1.10)
```
which would be closer to some of my expectations in terms of uncertainty.
@grd349 Changed that; it seems like not much changed in the results:
I guess I would say that yes, it is very weird that the HBM is not able to find the global posterior maximum at the exact grid point the observables were taken from. But since the NN was not trained on varying feh and MLT data, we have no way of knowing how the posterior slopes in those dimensions, so it is possible that those two dimensions are misleading the sampler into false maxima.
@grd349 Making feh and alpha_MLT fixed variables produces this nice estimate: error (dex) on mass = 0.0013, error (dex) on age = 0.0166, compared to the mass and age of the datum the observed values were taken from.
@grd349 I am trying to do what you did in "model sunlike star.ipynb": use pymc3 to model our star given solar observables, but I am running into problems:
Problem: jupyter just freezes without doing any HBM work whenever I run it, after printing only this in a red box:
I checked task manager and python is doing nothing, despite jupyter saying the code is running. My "NN" function is set up in much the same way as yours, and I have tested it on some mock-up NNs, where it returns the same result as keras model.predict, so there should be no computational problem with the "NN" function itself. As for the pymc3 model building, there should be no problem there either, as it is largely similar to your version. So my theory is that theano/pymc3 only accepts certain types of functions inside the pymc3 model, but I do not know which ones. I also checked the two warnings; they are just cache issues and should not affect the running of the code. What are your thoughts? Have you come across this problem before?