Harry-Westwood / Y4-Project-InterNeuralStellar

Bayesian Hierarchical Modelling and Machine Learning of Stellar Populations

Giving pymc3 NN weights freezes the kernel #23

Closed HinLeung622 closed 4 years ago

HinLeung622 commented 4 years ago

@grd349 I am trying to do what you did with "model sunlike star.ipynb", to use pymc3 to model our star given solar observables, but I am running into problems:

import numpy as np
import theano.tensor as T
import pymc3 as pm

def NN(inputs, model):
    # input shape = 1D array with length as number of NN inputs
    weights = model.get_weights()
    no_hidden_layers = len(weights)/2 - 1
    xx = inputs
    for i in np.arange(0, no_hidden_layers)*2:
        i = int(i)
        xx = T.nnet.relu(np.dot(xx, weights[i]) + weights[i+1], 0)
    return np.dot(xx, weights[-2]) + weights[-1]

model = pm.Model()
with model:
    log_m = pm.Normal('log_mass',np.log10(1),np.log10(0.1))
    log_a = pm.Normal('log_age',np.log10(4.5),np.log10(0.5))
    feh = pm.Normal('feh',0,0.1)
    MLT = pm.Normal('MLT',1.9,0.1)

    inputs=np.array([[log_m, log_a, feh, MLT]])
    #print(inputs)
    #inputs=np.array([[1,1,1,1]])
    obs = pm.Deterministic('obs',NN(inputs, m1.model)[0])
    #print(obs)

    obs_L = pm.Normal('obs_L',10**obs[0],0.1, observed=1.0)
    #print(obs_L)
    obs_Teff = pm.Normal('obs_Teff',10**obs[1],70, observed=5777)
    obs_delnu = pm.Normal('obs_delnu',10**obs[2],15, observed=135)

with model:
    trace = pm.sample(tune=5000, target_accept=0.99)

Problem: whenever I run this, jupyter just freezes without doing any HBM work, after printing only this in a red box:

WARNING (theano.gof.compilelock): Overriding existing lock by dead process '8932' (I am process '10168')
WARNING (theano.gof.cmodule): Deleting (broken cache directory [EOF]): C:\Users\User\AppData\Local\Theano\compiledir_Windows-10-10.0.18362-SP0-Intel64_Family_6_Model_78_Stepping_3_GenuineIntel-3.7.3-64\tmpce2xuy2e

I checked task manager and python is doing nothing despite jupyter saying the code is running. My "NN" function is set up in about the same way as yours, and I have tested it on some mock-up NNs, where it returns the same result as keras.model.predict. This should mean there is no computational problem with the "NN" function itself. As for the pymc3 model building, there should be no computational problems there either, as it is largely similar to your version. So my theory is that theano/pymc3 only accepts specific types of functions inside the pymc3 model, but I do not know which ones. I also checked the two warnings; they are just cache problems and should not affect the running of the code. What are your thoughts? Have you come across this problem before?
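
(For what it's worth, the direction that usually helps here is keeping the whole forward pass symbolic: inside a pm.Model the random variables are theano tensors, so the network has to be built from theano/pymc3 ops such as pm.math.dot and T.nnet.relu rather than np.dot. A minimal sketch of that idea, which is roughly where the later comments in this thread end up; NN_symbolic is just an illustrative name, and a plain dense ReLU network is assumed.)

import numpy as np
import theano.tensor as T
import pymc3 as pm

def NN_symbolic(inputs, keras_model):
    # inputs: list of pymc3 random variables; keras_model: trained keras model
    weights = keras_model.get_weights()   # [W0, b0, W1, b1, ...] as numpy arrays
    n_hidden = len(weights)//2 - 1        # number of hidden layers
    xx = T.stack(inputs)                  # symbolic input vector
    for i in range(0, 2*n_hidden, 2):
        # pm.math.dot builds a symbolic matrix product instead of asking
        # numpy to evaluate the theano tensors immediately
        xx = T.nnet.relu(pm.math.dot(xx, weights[i]) + weights[i+1])
    return pm.math.dot(xx, weights[-2]) + weights[-1]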

grd349 commented 4 years ago

I have this problem with my gpu and jupyter notebooks. It might be different for you but I would just restart the computer. See if that works!

HinLeung622 commented 4 years ago

@grd349 Thanks for the suggestion, that was one of many things I had to do/change since last night to get my code working. The bridging between pymc3 and keras is just so finicky.

grd349 commented 4 years ago

Is it working now?

G

HinLeung622 commented 4 years ago

@grd349 Not really. Jupyter no longer freezes and it doesn't crash either, but I am now facing one of two "bad" situations:

  1. If I use the full 4-long input to the NN, that is mass, age, feh and MLT, with (I think) sensible uncertainties, I get an error message that says "Chain 0 failed." (sometimes chain 1) in the middle of sampling. Code:

m1.getWeights()
weights = m1.model.get_weights()

def manualPredict(inputs):
    # input shape = 1D array with length as number of NN inputs
    no_hidden_layers = len(weights)/2 - 1
    xx = inputs
    for i in np.arange(0, no_hidden_layers)*2:
        i = int(i)
        xx = T.nnet.relu(pm.math.dot(xx, weights[i]) + weights[i+1], 0)
    xx = pm.math.dot(xx, weights[-2]) + weights[-1]
    return xx.T

model = pm.Model()
with model:
    log_m = pm.Normal('log_mass', np.log10(1), 0.1)
    log_a = pm.Normal('log_age', np.log10(4.5), 0.1)
    feh = pm.Normal('feh', 0, 0.1)
    MLT = pm.Normal('MLT', 1.9, 0.1)

    obs = pm.Deterministic('obs', manualPredict([log_m, log_a, feh, MLT]))

    obs_L = pm.Normal('obs_L', 10**obs[0], 0.1, observed=1.0)
    obs_Teff = pm.Normal('obs_Teff', 10**obs[1], 70, observed=5777)
    obs_delnu = pm.Normal('obs_delnu', 10**obs[2], 15, observed=135)

with model:
    trace = pm.sample()

  2. If I comment out feh and MLT (the logic being that they were never varied in the small grid, so the NN has no grasp at all of how feh or MLT cause the observables to vary), and just put in the fixed values from the small grid (namely 1 and 1.9), I get a very horrible fit. Code:

m1.getWeights()
weights = m1.model.get_weights()

def manualPredict(inputs):
    # input shape = 1D array with length as number of NN inputs
    no_hidden_layers = len(weights)/2 - 1
    xx = inputs
    for i in np.arange(0, no_hidden_layers)*2:
        i = int(i)
        xx = T.nnet.relu(pm.math.dot(xx, weights[i]) + weights[i+1], 0)
    xx = pm.math.dot(xx, weights[-2]) + weights[-1]
    return xx.T

model = pm.Model()
with model:
    log_m = pm.Normal('log_mass', np.log10(1), 0.1)
    log_a = pm.Normal('log_age', np.log10(4.5), 0.1)
    # feh = pm.Normal('feh', 0, 0.1)
    # MLT = pm.Normal('MLT', 1.9, 0.1)

    obs = pm.Deterministic('obs', manualPredict([log_m, log_a, 1, 1.9]))

    obs_L = pm.Normal('obs_L', 10**obs[0], 0.1, observed=1.0)
    obs_Teff = pm.Normal('obs_Teff', 10**obs[1], 70, observed=5777)
    obs_delnu = pm.Normal('obs_delnu', 10**obs[2], 15, observed=135)

with model:
    trace = pm.sample()


fit results:
![image](https://user-images.githubusercontent.com/56083013/70065085-6f296f00-15e2-11ea-97dd-3332f09427f4.png)

![image](https://user-images.githubusercontent.com/56083013/70065121-7a7c9a80-15e2-11ea-8f44-974d3288d6d8.png)

All being said, I did have one successful instance of a good fit on all 4 parameters last night. This corner plot is the only thing I have saved of that fit:
![image](https://user-images.githubusercontent.com/56083013/70065233-a26bfe00-15e2-11ea-8306-9aad3ad0c909.png)

The code and numbers used to generate this were identical to the first version of the problem (when it says the chain failed), so I really do not know what is causing the problem here.

grd349 commented 4 years ago

You probably want to give the sample step some starting parameters. The default start will be something funny like parameters spread between [-2, 2] - these would probably be really bad parameters to start with, as the Neural Net will hate these as inputs. Much better would be [1.0, 4.5, 0.0, 1.95] ... does that make sense?

grd349 commented 4 years ago

That said ... the one that worked, really worked! :)

HinLeung622 commented 4 years ago

@grd349 So I do something like this in your example code?

inits = ['advi_map', 'adapt_diag']
start = {'lg_mass': np.log10(1.01), 'lg_age': np.log10(4.5)}

with model:
    trace = pm.sample(tune=5000, init=inits[1], start=start, target_accept=0.99)

Also, why did you use "adapt_diag" for the initialization for the NUTS sampler?

grd349 commented 4 years ago

Yes - like that but for all parameters.

I used 'adapt_diag' because the jitter that is normally added can send parameters to bad places. If you are nervous about starting at the true values then you should change them slightly, maybe [1.1, 3.1, -0.3, 2.1].
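
(A sketch of what "like that but for all parameters" could look like, using the variable names from the model code above; the exact starting values are only illustrative.)

inits = ['advi_map', 'adapt_diag']
# one starting value per free parameter, keyed by the names given to the priors
start = {'log_mass': np.log10(1.0),
         'log_age': np.log10(4.5),
         'feh': 0.0,
         'MLT': 1.9}

with model:
    trace = pm.sample(tune=5000, init=inits[1], start=start, target_accept=0.99)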

HinLeung622 commented 4 years ago

@grd349 Thanks! I tried giving it starting points and the chains no longer break. Something peculiar happened though: I wanted the summary table of the HBM to print the actual values of M and Age rather than their logged versions, so I moved the log10 step on the mass and age inputs to right before they enter the NN. The code looks like this:

model = pm.Model()
with model:

    M = pm.Lognormal('mass',np.log10(1),0.1)
    Age = pm.Lognormal('age',np.log10(4.5),0.1)
    feh = pm.Normal('feh',0,0.1)
    MLT = pm.Normal('MLT',1.9,0.1)

    obs = pm.Deterministic('obs',manualPredict([np.log10(M), np.log10(Age), feh, MLT]))

    obs_L = pm.Normal('obs_L',10**obs[0],0.1, observed=1.0)
    obs_Teff = pm.Normal('obs_Teff',10**obs[1],70, observed=5777)
    obs_delnu = pm.Normal('obs_delnu',10**obs[2],15, observed=135)

I was expecting a similar result to what I got last night, but the mean estimated age now comes out as 1.93 Gyr, with a 1-sigma of 0.18 Gyr. Corner plot: [image]

That is rather strange, so I reverted some changes and went back to the original "log_m and log_a" version of the code:

model = pm.Model()
with model:

    log_m = pm.Normal('log_mass',np.log10(1),0.1)
    log_a = pm.Normal('log_age',np.log10(4.5),0.1)
    feh = pm.Normal('feh',0,0.1)
    MLT = pm.Normal('MLT',1.9,0.1)

    obs = pm.Deterministic('obs',manualPredict([log_m, log_a, feh, MLT]))

    obs_L = pm.Normal('obs_L',10**obs[0],0.1, observed=1.0)
    obs_Teff = pm.Normal('obs_Teff',10**obs[1],70, observed=5777)
    obs_delnu = pm.Normal('obs_delnu',10**obs[2],15, observed=135)

And this gives a mean age estimate of 4.7 Gyr! The corner plot also looks nice: [image]

What could be causing this? (I know that this is all Bayesian inference and the posterior depends on the prior we put in. The priors I set up in the two situations are quite different (a normal distribution on the logged value vs a lognormal distribution on the value itself), but I feel like that is not the full story here.)
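
(One likely contributor, for what it's worth: pm.Lognormal's first argument is the mean of the natural log of the variable, not the log10 value, so passing np.log10(4.5) puts the prior median at exp(np.log10(4.5)) ≈ 1.92 Gyr, very close to the 1.93 Gyr estimate above. A quick sketch of the arithmetic:)

import numpy as np

# the median of a Lognormal(mu, sigma) is exp(mu)
mu_used = np.log10(4.5)      # ≈ 0.653, what the code above passes in
mu_natural = np.log(4.5)     # ≈ 1.504, the natural-log value

print(np.exp(mu_used))       # ≈ 1.92, matching the ~1.93 Gyr estimate
print(np.exp(mu_natural))    # 4.5, the intended prior median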

HinLeung622 commented 4 years ago

@grd349 I ditched the lognormal idea and just used normal distributions instead, and now it's consistent with the old corner plot (and also the real sun)! [image]

code:

model = pm.Model()
with model:

    M = pm.Normal('mass',1,0.1)
    Age = pm.Normal('age',4.5,0.1)
    feh = pm.Normal('feh',0,0.1)
    MLT = pm.Normal('MLT',1.9,0.1)

    obs = pm.Deterministic('obs',manualPredict([np.log10(M), np.log10(Age), feh, MLT]))

    obs_L = pm.Normal('obs_L',10**obs[0],0.1, observed=1.0)
    obs_Teff = pm.Normal('obs_Teff',10**obs[1],70, observed=5777)
    obs_delnu = pm.Normal('obs_delnu',10**obs[2],15, observed=135)
grd349 commented 4 years ago

This makes me very happy! :)

HinLeung622 commented 4 years ago

@grd349 Thanks. Well, I guess I will just stick with normal distributions for now; it shouldn't matter much if our NNs are properly trained (even though theoretically we should be using lognormals for quantities like M and age, as they cannot go below 0).

grd349 commented 4 years ago

Your uncertainty on Dnu is too large - a more realistic value might be 135 \pm 0.1.

Also, in the grid the solar temperature is higher (~5900) and the luminosity is higher (~1.09); this is because of the physics used in the models. You might want to use the observables from a point (read: model) in the grid.

Finally before I leave you in peace, you might want to change your priors to be a lot less tight...

M = pm.Lognormal('mass',np.log(1.2),0.5)
Age = pm.Lognormal('age',np.log(4.5),1.0)
feh = pm.Normal('feh',0,0.5)
MLT = pm.Lognormal('MLT',np.log(1.9),0.3)

We might also want to consider bounds on the parameters, especially MLT.

HinLeung622 commented 4 years ago

@grd349 What does "use observables from a point in the grid" mean?

grd349 commented 4 years ago

Oh - take the inputs to the Neural Net - the grid - and then pick a 'fake' star from that grid. This could be the grid point for solar mass, age, ... and just look up the observables for that point in the grid. Does that make sense now?
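
(A minimal sketch of that idea, assuming the grid is held in a pandas DataFrame; the file name, column names and grid_df are placeholders, and row 7992 is the one quoted in the code further down.)

import pandas as pd

grid_df = pd.read_csv('small_grid.csv')   # hypothetical file holding the small grid

# pick one grid point to act as the 'fake' observed star
fake_star = grid_df.iloc[7992]
obs_L_true = fake_star['L']           # luminosity of that model
obs_Teff_true = fake_star['Teff']     # effective temperature
obs_delnu_true = fake_star['delnu']   # large frequency separation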

HinLeung622 commented 4 years ago

@grd349 Oh OK, but doesn't that defeat the purpose of relating the whole grid-NN thing back to real-world observed data? (as in the sun's luminosity, Teff and delta nu)

grd349 commented 4 years ago

Yes it does - but this is just a test to check what we are doing is sensible before we move on to real data.

HinLeung622 commented 4 years ago

@grd349 Oh, so you mean this is just a check of whether the NN is actually approximating the grid correctly, and that's why we are using the grid as "real data", to reduce the degree of complexity?

grd349 commented 4 years ago

Yes :) . Exactly!

HinLeung622 commented 4 years ago

@grd349 I have taken your suggestions: I changed the priors to have wider uncertainties, made the grid provide the observed data, and enforced lower boundaries on mass, age and MLT:

m1.getWeights()
weights=m1.model.get_weights()

def manualPredict(inputs):
    #input shape = 1D array with length as number of NN inputs
    no_hidden_layers = len(weights)/2-1
    xx=inputs
    for i in np.arange(0,no_hidden_layers)*2:
        i=int(i)
        xx=T.nnet.relu(pm.math.dot(xx,weights[i])+weights[i+1],0)
    xx=pm.math.dot(xx,weights[-2])+weights[-1]
    return xx.T

model = pm.Model()
with model:
    BoundM = pm.Bound(pm.Lognormal,lower=0)
    M = BoundM('mass',np.log(1.2),0.5)
    BoundAge = pm.Bound(pm.Lognormal,lower=0)
    Age = BoundAge('age',np.log(4.5),1.0)
    feh = pm.Normal('feh',0,0.5)
    BoundMLT = pm.Bound(pm.Lognormal,lower=0)
    MLT = BoundMLT('MLT',1.9,0.3)

    obs = pm.Deterministic('obs',manualPredict([np.log10(M), np.log10(Age), feh, MLT]))

    #obs data from small grid row 7992
    obs_L = pm.Normal('obs_L',10**obs[0],0.1, observed=1.10587977656106)
    obs_Teff = pm.Normal('obs_Teff',10**obs[1],70, observed=5915.32055829505)
    obs_delnu = pm.Normal('obs_delnu',10**obs[2],0.1, observed=134.4937686)

start = {'mass': 0.8, 'age': 2.5, 'feh': 0.1, 'MLT': 1.8}
with model:
    trace = pm.sample(tune=5000, target_accept = 0.9, start=start)

But the results are unsatisfactory: [images]. Perhaps this is all due to the small grid not varying in feh and alpha_MLT.
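
(One detail that may be worth double-checking against the earlier suggestion: pm.Lognormal takes the mean of the natural log as its first argument, so the MLT line above centres the prior on exp(1.9) ≈ 6.7 rather than on 1.9. A sketch of that prior as suggested a few comments up:)

import numpy as np
import pymc3 as pm

with pm.Model() as model:
    BoundMLT = pm.Bound(pm.Lognormal, lower=0)
    # np.log(1.9) rather than 1.9, so the prior median sits at MLT = 1.9
    MLT = BoundMLT('MLT', np.log(1.9), 0.3)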

grd349 commented 4 years ago

This will need some thought, but I would start by changing the constraint on luminosity to ...

obs_L = pm.Normal('obs_L',10**obs[0],0.01, observed=1.10)

which would be closer to my expectations in terms of uncertainty.

HinLeung622 commented 4 years ago

@grd349 Changed that; it seems not much has changed in the results: [images]

I guess I would say that yes, it is very weird that the HBM is not able to find the global posterior maximum at the exact grid point those observables were taken from. But since the NN was not trained on data that varies in feh and MLT, we have no way of knowing how the posterior slopes along those dimensions. So it is possible that those two dimensions are "misleading" the HBM sampler into false maxima.

HinLeung622 commented 4 years ago

@grd349 Making feh and alpha_MLT fixed variables produces this nice estimate: [images] Error (dex) on mass = 0.0013, error (dex) on age = 0.0166, compared to the mass and age of the datum the observed values were taken from.
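
(For reference, a short sketch of how such dex errors could be computed from the trace; the true values below are placeholders standing in for the mass and age of the grid row used as the fake star.)

import numpy as np

true_mass, true_age = 1.0, 4.5    # placeholder truth values from the grid row

# posterior means from the pymc3 trace (variable names as in the model above)
est_mass = np.mean(trace['mass'])
est_age = np.mean(trace['age'])

# a dex error is the absolute difference of the base-10 logarithms
err_mass_dex = abs(np.log10(est_mass) - np.log10(true_mass))
err_age_dex = abs(np.log10(est_age) - np.log10(true_age))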