Harry-Westwood / Y4-Project-InterNeuralStellar

Bayesian Hierarchical Modelling and Machine Learning of Stellar Populations

Helping HBM sample better (cont.) #47

Closed HinLeung622 closed 3 years ago

HinLeung622 commented 4 years ago

@grd349 This is a continuation of issue #45. A new thread is made since the last one is getting ridiculously long and it takes ages to load in all the images.

State of the HBM: the data are M67 low-mass dwarfs with binaries/double stars included (354 stars in total). [image]

I tried making a detailed hand-drawn PGM of the current state of the HBM (pencil = NN conversions). [image: hand_drawn_pgm]

full model:

model = pm.Model()
with model:

    #cluster-wide fundamentals
    Age_mu = pm.Deterministic('mean_age',pm.Beta('a',10,10)*2+2.5)
    feh_mu = pm.Deterministic('mean_feh',pm.Beta('e',10,10)*0.4-0.2)
    Y_mu = pm.Deterministic('mean_Y',pm.Beta('f',10,10)*0.04+0.24)
    MLT_mu = pm.Deterministic('mean_MLT',pm.Beta('g',10,10)*0.6+1.7)
    MLT_sigma = pm.Lognormal('spread_MLT',T.log(0.1),0.5)

    #per star fundamentals
    M = pm.Deterministic('mass', pm.Beta('d',10,10,shape=N)*(1.2-0.8)+0.8)
    Age = pm.Deterministic('age',T.ones(N)*Age_mu)
    feh = pm.Deterministic('feh',T.ones(N)*feh_mu)
    Y = pm.Deterministic('Y',T.ones(N)*Y_mu)
    MLT = pm.Deterministic('MLT',pm.Normal('MLT_normal', 0, 1, shape=N)*MLT_sigma+MLT_mu)

    #NN calculation
    obs = pm.Deterministic('obs',m1.manualPredict(T.log10([M, Age, 10**feh, Y, MLT])))

    #intermediate observables
    radius = pm.Deterministic('radius', 10**obs[0])
    Teff = pm.Deterministic('Teff', (10**obs[1])*5000)
    L = pm.Deterministic('L', (radius**2)*((Teff/Teff_sun)**4))
    logg = pm.Deterministic('logg', T.log10(100*constants.G.value*(M/radius**2)*(constants.M_sun.value/constants.R_sun.value**2)))

    #extinction prior
    A_V_mu = pm.Lognormal('mean_A_V', T.log(0.11), 1.0) # X & Y are 'sensible' values
    A_V_std = pm.Lognormal('spread_A_V', T.log(0.1), 1.0) # Different X & Y but still sensible
    A_V_ = pm.Normal('A_V_', 0, 1, shape=N) # an N(0, 1) dist to be scaled
    A_V = pm.Deterministic('A_V', A_V_ * A_V_std + A_V_mu)

    #second NN calculation
    BCs = pm.Deterministic('BCs', t1.manualPredict(T.as_tensor_variable([T.log10(Teff), logg, feh, A_V])))

    #BCs
    BCg = pm.Deterministic('BCg', BCs[5,:])
    BCbp = pm.Deterministic('BCbp', BCs[7,:])
    BCrp = pm.Deterministic('BCrp', BCs[8,:])

    #distance and parallax
    dist_mod_ = pm.Normal('dist_mod', dist_mod, 1.0)
    true_parallax = pm.Deterministic('true_parallax', 10**(-dist_mod_/5-1)*1000)
    obs_parallax = pm.Normal('obs_parallax', true_parallax, M67_parallax[1], observed=M67_parallax[0])

    #true observables
    true_mG = pm.Deterministic('true_mG', -2.5*T.log10(L)+Mbol-BCg+dist_mod_)
    true_Bp = pm.Deterministic('true_Bp', -2.5*T.log10(L)+Mbol-BCbp+dist_mod_)
    true_Rp = pm.Deterministic('true_Rp', -2.5*T.log10(L)+Mbol-BCrp+dist_mod_)

    #dealing with multiples
    q = pm.Dirichlet('q', a=np.ones(3), testval=np.array([0.8,0.1,0.1]))
    mu_delta_mG1 = pm.Lognormal('mu_delta_mG1', np.log(0.75), 0.3)
    mu_delta_mG2 = pm.Lognormal('mu_delta_mG2', np.log(0.375), 0.3)
    sig = pm.Lognormal('sig', np.log(0.1), 0.2, shape=2)
    dist_singular_G = pm.Normal.dist(0, 10*M67['g_mag_err'])
    dist_singular_Bp = pm.Normal.dist(0, 10*M67['bp_mag_err'])
    dist_singular_Rp = pm.Normal.dist(0, 10*M67['rp_mag_err'])
    dist_high = pm.Normal.dist(-mu_delta_mG1, sig[0])
    dist_low = pm.Normal.dist(-mu_delta_mG2, sig[1])

    #obs observables
    obs_mG = pm.Mixture('obs_mG', w=q, comp_dists = [dist_singular_G, dist_high, dist_low], \
                        observed=M67['g_mag'].values-true_mG)
    obs_Bp = pm.Mixture('obs_Bp', w=q, comp_dists = [dist_singular_Bp, dist_high, dist_low], \
                        observed=M67['bp_mag'].values-true_Bp)
    obs_Rp = pm.Mixture('obs_Rp', w=q, comp_dists = [dist_singular_Rp, dist_high, dist_low], \
                        observed=M67['rp_mag'].values-true_Rp)

We are currently taking 10x the original Gaia magnitude errors as our observational errors.

grd349 commented 4 years ago

That age range looks a bit narrow now. Should it be beta(X, X) * 3 + 2 ?
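
If it helps, that suggestion written in the notation of the model above would read something like the line below (a sketch only; the Beta shape parameters are just carried over from the current prior, and the rescaling widens the range to 2-5 Gyr):

    Age_mu = pm.Deterministic('mean_age', pm.Beta('a', 10, 10)*3 + 2)  #2-5 Gyr instead of 2.5-4.5 Gyr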

HinLeung622 commented 4 years ago

@grd349 Latest result (with everything above implemented, tuning = 2000, samples = 500): [images] As expected, shrinking the multiple on the observed G-mag error makes sampling harder for the HBM, and the samplers have not converged. Should I increase the tuning steps, or should we loosen the multiples on the observational errors a bit? (Note the much larger observational errors (black bars) on Bp-Rp compared to G mag; the bars shown are already multiplied by the current multiple of 10.)

HinLeung622 commented 4 years ago

@grd349 You think the age prior should be wider? Why?

Oh, also: the above result shows that directly giving the singular distributions an array of sigmas works.

grd349 commented 4 years ago

As best as my eyes can tell, everything else looks good.

Great work! :)

That hand drawn PGM is epic. I love it! It's virtually art :)

HinLeung622 commented 4 years ago

@grd349 Thanks. Could you take a look at the latest result above?

HinLeung622 commented 4 years ago

@grd349 I am happy to say that I believe I found a sensible solution to the mess.

The last time we talked, we were constraining the three bands individually against the Gaia data. This created a general problem: our key observable, Bp-Rp, was given much more wiggle room than before because of the enlarged observational errors on Bp and Rp. The problem persisted even after I removed the "multiples of obs error" setting and gave each band a free-varying observational error. My solution is to use two bands individually (G and Bp) plus Bp-Rp, where Bp-Rp is computed directly from the difference of the BCs and is not affected by the noise model at all. I believe this restores the stronger "focus" on matching the observed Bp-Rp values, while the added Bp band helps lift the helium and age estimates and press down the low-mass end of the HBM guess so it aligns with the single-star sequence (a sketch of the changed likelihood block is below).
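
A minimal sketch of how the likelihood block changes under this scheme, reusing the variable names from the model above and assuming the Gaia colour and its error are available as Bp_Rp / Bp_Rp_err columns (the mixture components for G and Bp stay as before):

    #Bp-Rp computed directly from the two BCs, untouched by the noise model
    true_Bp_Rp = pm.Deterministic('true_Bp_Rp', BCrp - BCbp)
    obs_Bp_Rp = pm.Normal('obs_Bp_Rp', mu=true_Bp_Rp, sigma=M67['Bp_Rp_err'], observed=M67['Bp_Rp'])

    #G and Bp still go through the mixture noise model
    obs_mG = pm.Mixture('obs_mG', w=q, comp_dists=[dist_singular_G, dist_high, dist_low],
                        observed=M67['g_mag'].values - true_mG)
    obs_Bp = pm.Mixture('obs_Bp', w=q, comp_dists=[dist_singular_Bp, dist_high, dist_low],
                        observed=M67['bp_mag'].values - true_Bp)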

Enough talking; here are the results, with all stars in the M67 sample, max pooling for all 4 fundamentals (I dropped the MLT and Av partial pools and fixed the distance modulus to simplify the model for debugging), and 2000 tuning steps: [images] The estimated age and initial Y are sensible, Rhats are not terribly high, and the q and delta_mG values are sensible. Note that mag_err[0] = estimated G-mag observational error and mag_err[1] = estimated Bp-mag observational error, i.e. the spread of the singles distribution in the noise model. These numbers should go down when I reintroduce partial pooling in MLT and Av. The estimated populations in the last two figures roughly overlap with the data's, so it looks promising.

I will proceed to reintroduce Av and dist mod, then partial pool MLT.

HinLeung622 commented 4 years ago

@grd349 An update on the situation, now that I have tried everything I had in mind to improve the sampling results.

  1. I reintroduced the Av partial pool and dist mod estimation, in conjunction with an observed parallax value. This resulted in much higher Rhats on basically all of the cluster-wide variables for the same number of tuning steps, and a drop back to 0.246 in the initial Y estimate. Increasing the number of tuning steps did not help.
  2. I pressed ahead and introduced the MLT partial pool as well, hoping it would help Av account for the spread along the single-star main sequence. This slightly lowered the Rhat values, but initial Y remained stuck at 0.246.
  3. To help sampling, I followed Alex's advice and removed from the data the 50 stars with the highest Rhat values on their star-specific variables in the previous run. Note that there is no clear concentration among these 50 (they come roughly equally from the singles, the high and low binaries, and all kinds of masses). This helped the sampler and lowered most Rhat values to around 1.10. However, initial Y is still 0.246.
  4. In the previous result, I observed that the HBM tends to underestimate the stars' luminosities at the high-mass end of the data. I thought this could be due to the lack of corresponding binary stars for those at the top left of the CMD, so that the sampler mistakenly treated some of the spread in the singles as binaries; this comes from the horizontal cut I used when building the data, which removed anything at a later evolutionary stage than the hook. I therefore made a vertical cut eliminating the high-mass stars, so that only stars with clear binaries above them remain. This did basically nothing to help the Rhats or raise the initial Y estimate.

So, thoughts?

P.S. I have yet to contact Oli, since the HBM results in their current state are probably not ready for yet another dimension to be introduced.

HinLeung622 commented 4 years ago

@grd349 Results for the model we discussed yesterday:

The model: fixed age at 4.0 Gyr, fixed initial feh at 0.05, fixed initial Y at 0.255, free max-pool MLT, fixed dist mod, partial-pool Av, fixed delta_mG values, fixed widths for the two binary tracks, using the results from your noise model. (I did not fix q since I am not sure how to give a Dirichlet a fixed list - see the sketch below.)
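
On that last point: a Dirichlet random variable cannot be pinned to a value directly, but the weights can simply be passed to pm.Mixture as a constant array instead of a free parameter. A minimal sketch (the 0.8/0.1/0.1 split is only illustrative):

    #fixed mixture weights: bypass the Dirichlet and hand pm.Mixture a constant vector
    q_fixed = np.array([0.8, 0.1, 0.1])   #singles, high binaries, low binaries (illustrative)
    obs_mG = pm.Mixture('obs_mG', w=q_fixed, comp_dists=[dist_singular_G, dist_high, dist_low],
                        observed=M67['g_mag'].values - true_mG)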

[images]

  1. Rhats are not perfect
  2. mean Av is exceptionally high (the Gaia DR2 paper gives an estimate of around 0.11, and the Green 2019 dustmap gives 0.16)
  3. spread Av is quite high (though other papers seem to suggest there could be a much higher spread in extinction in OCs)
  4. the sampler seems to suggest there are absolutely no stars in the low binary track; this is likely because the spread in Av has "accounted for" that low binary track
  5. the distributions plot at the bottom looks quite similar to the one I showed you yesterday
HinLeung622 commented 4 years ago

@grd349 Results for max pooling all fundamentals with fixed values, partial pooling Av with a fixed mean, and fixed dist_mod and noise model (meaning only mass and spread Av are allowed to vary): [images] This is an absolute disaster. I suspect it has something to do with the very small observational errors on the observed mags we gave the sampler (0.001 for all three bands). Note: the third plot (CMD with both data and HBM guess) uses 0.001 mag for the error bars on the blue dots, so they are not visible in the plot.

Another thing to note is that the estimated mean spread in Av came out at 0.07, which is quite far from the peak of the original prior, pm.Lognormal('spread_A_V', T.log(0.01), 0.1).

I will try rerunning the HBM with a higher observational error for the magnitudes, maybe something like 0.01, but I doubt it will help much, if at all.

In any case, I will still leave my NNs here and their structures:

  1. The fundamental to Teff/luminosity/delnu NN: https://github.com/Harry-Westwood/Y4-Project-InterNeuralStellar/blob/master/Hin's_files/b7_best_model.h5 Input cols = [log10 M, log10 star age, initial feh, log10 initial Y, log10 initial alpha_MLT] output cols = [log10 R, log10 (Teff/5000), log10 delnu, star feh]

My manual NN prediction code (this NN has a batch normalization layer right after the input layer, so I thought I would share how I handle that in theano code):

    def getWeights(self):
        """
        Store the NN's weights on self and count the number of hidden dense layers
        """
        weights = self.model.get_weights()
        # if the layer after the input is a batch normalization layer, its four
        # parameter arrays (gamma, beta, moving mean, moving variance) come first
        if 'batch_normalization' in self.model.layers[1].get_config()['name']:
            self.bn_weights = weights[:4]
            self.dense_weights = weights[4:]
        else:
            self.bn_weights = None
            self.dense_weights = weights
        # each dense layer contributes a kernel and a bias; exclude the output layer
        self.no_hidden_dense_layers = len(self.dense_weights)//2 - 1

    def manualPredict(self, inputs):
        """
        "Manual calculation" of the NN done with theano tensors, for pymc3 to use
        """
        xx = T.transpose(inputs)
        # apply the batch normalization layer in inference mode, if present
        if self.bn_weights is not None:
            xx = T.nnet.bn.batch_normalization_test(xx, *self.bn_weights, epsilon=0.001)
        # hidden dense layers with ELU activations
        for i in np.arange(self.no_hidden_dense_layers)*2:
            i = int(i)
            xx = T.nnet.elu(pm.math.dot(xx, self.dense_weights[i]) + self.dense_weights[i+1])
        # linear output layer
        xx = T.dot(xx, self.dense_weights[-2]) + self.dense_weights[-1]
        return xx.T
  2. The Teff/logg/star feh/Av to BCs NN: https://github.com/Harry-Westwood/Y4-Project-InterNeuralStellar/blob/master/Hin's_files/test6.h5 Input cols = [log10 Teff, logg, star feh, Av], output cols = ['Bessell_U','Bessell_B','Bessell_V','Bessell_R','Bessell_I','Gaia_G_MAW','Gaia_BP_MAWb','Gaia_BP_MAWf','Gaia_RP_MAW'], following the naming in the MIST grid. I have so far been choosing "Gaia_BP_MAWf" over "Gaia_BP_MAWb" as the BC for the BP band. Note that since the surface feh values output by the first NN are not accurate under this diffusion-turned-on grid, I am using initial feh in place of star feh as the input to this NN in the HBM. The theano NN prediction code is identical to that of the previous NN. (An index-to-band mapping sketch follows below.)
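
For bookkeeping, a small sketch of how the output rows of this NN map onto the band names listed above (indices follow the output column order given; this is only a lookup helper, not part of the model):

    #output row index -> MIST band name, in the order listed above
    BC_cols = ['Bessell_U','Bessell_B','Bessell_V','Bessell_R','Bessell_I',
               'Gaia_G_MAW','Gaia_BP_MAWb','Gaia_BP_MAWf','Gaia_RP_MAW']
    BC_index = {name: i for i, name in enumerate(BC_cols)}
    #hence BCs[5,:] = G, BCs[7,:] = BP (MAWf) and BCs[8,:] = RP, as sliced in the model above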

Do ask if anything about my notation is unclear.

grd349 commented 4 years ago

Oh boy. Can you try one last run ...

Can you set all the extinction values to 0.11 (fixed, no spread)? If it can do that then ... well, who knows.

I'll try and use your NNs next week to run some checks.


HinLeung622 commented 4 years ago

@grd349 Results for fixing Av as well (now the only thing that varies is mass): [images] The same problem as before occurs: the sampler does not give solutions that reach the higher-mass regions. I solved this previously by swapping one of the bands for Bp-Rp, with observed values given.

grd349 commented 4 years ago

Thanks @HinLeung622

It looks like there is a significant problem with the higher masses here (any chance this is the prior? is it still Beta(10, 10)?)

We need to fix the mass issue (or perhaps mass is just where the problem is obvious) before we can go back to unfixing all the parameters that are now hard coded.

To make things easier (perhaps) it might be worth letting two different runs fly: one with a cut of BP-RP < 0.85 and one with 0.95 < BP-RP < 1.05. Does that make sense?
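
For reference, a minimal sketch of the two proposed cuts (assuming the sample is a pandas DataFrame carrying the Bp_Rp column used elsewhere in the thread):

    #two separate runs on colour-cut subsets of the M67 sample
    M67_cut1 = M67[M67['Bp_Rp'] < 0.85]
    M67_cut2 = M67[(M67['Bp_Rp'] > 0.95) & (M67['Bp_Rp'] < 1.05)]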

HinLeung622 commented 4 years ago

@grd349 Yes, the prior is still beta(10,10).

So you are suggesting to cut the data into two sections, one medium mass and one high mass? What are you hoping to see from these runs though?

grd349 commented 4 years ago

Forget the cut for a second - we'll try that if the following doesn't work.

Can you set the mass prior to beta(1.1, 1.1) and run again? I think that might be the problem?

grd349 commented 4 years ago

I think that might be the problem?

Rather I think that might be a problem.

HinLeung622 commented 4 years ago

@grd349 Makes sense; I had not thought about the mass prior affecting the sampler's results before. I will redo it with beta(1.1,1.1) then.

HinLeung622 commented 4 years ago

@grd349 Results with the beta(1.1,1.1) prior: [images] So yes, the prior was part of the problem, though the red dots still don't seem to go all the way up to the high-mass limit.

HinLeung622 commented 4 years ago

@grd349 I tried relaxing the fundamentals' max-pool inputs into non-fixed values: [images] An absolute trainwreck.

grd349 commented 4 years ago

@HinLeung622 Yeah there is something wrong. I'm not sure what it is. I think the least tested section of the model is the 2nd neural network. Do you think you are able to code up the Gaia color corrections and extinctions using the functions given in the Bossini paper?

HinLeung622 commented 4 years ago

@grd349 Ok I will do that

HinLeung622 commented 4 years ago

@grd349 So the current model looks like this: I put 0 into the Av slot so that the output BCs are not affected by any extinction. I then use the coefficients in D. Bossini's paper to convert Av into Ag, Abp and Arp, which are subtracted in the magnitude calculation:

    #second NN calculation
    BCs = pm.Deterministic('BCs', t1.manualPredict(T.as_tensor_variable([T.log10(Teff), logg, feh, T.ones(N)*0])))

    #BCs
    BCg = pm.Deterministic('BCg', BCs[5,:])
    BCbp = pm.Deterministic('BCbp', BCs[7,:])
    BCrp = pm.Deterministic('BCrp', BCs[8,:])
    true_Bp_Rp = BCrp-BCbp

    #distance and parallax
    dist_mod_ = 9.726

    #true observables
    true_mG = pm.Deterministic('true_mG', -2.5*T.log10(L)+Mbol-BCg+dist_mod_-getExtinction(A_V, c_G, true_Bp_Rp))
    true_Bp = pm.Deterministic('true_Bp', -2.5*T.log10(L)+Mbol-BCbp+dist_mod_-getExtinction(A_V, c_Bp, true_Bp_Rp))
    true_Rp = pm.Deterministic('true_Rp', -2.5*T.log10(L)+Mbol-BCrp+dist_mod_-getExtinction(A_V, c_Rp, true_Bp_Rp))

Note that I am using the extinction-free Bp-Rp as the input to the extinction conversion. I am not sure whether this is correct, or whether I should first redden the colour for extinction as well.
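
For reference, a sketch of what a getExtinction helper of this form might look like, assuming the Gaia DR2 / Danielski et al. style parametrisation in which the band extinction is A_m = k_m * A_V and k_m is a low-order polynomial in colour and A_V; the coefficient arrays c_G, c_Bp and c_Rp are assumed to hold the seven published coefficients per band (values not reproduced here):

    def getExtinction(A_V, c, bp_rp):
        """
        Band extinction A_m = k_m * A_V, with
        k_m = c[0] + c[1]*(Bp-Rp) + c[2]*(Bp-Rp)**2 + c[3]*(Bp-Rp)**3
              + c[4]*A_V + c[5]*A_V**2 + c[6]*(Bp-Rp)*A_V
        c is the 7-element coefficient array for the band (e.g. c_G), taken from the paper.
        """
        k = (c[0] + c[1]*bp_rp + c[2]*bp_rp**2 + c[3]*bp_rp**3
             + c[4]*A_V + c[5]*A_V**2 + c[6]*bp_rp*A_V)
        return k*A_V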

Results: [images] I don't see a major difference between the results of this method and the NN-incorporated one. However, the way the HBM guesses bend counter-clockwise relative to the singles track does resemble the older comparison plot between the two extinction-calculation methods in issue #46 (figure 1: the blue track, which uses the same extinction calculation as this HBM run, looks rotated counter-clockwise compared to the red track, which calculated extinction entirely through the MIST NN). I'm not sure what to make of this.

HinLeung622 commented 3 years ago

@grd349 This is the pdf of a Student's t distribution with sigma = g_mag_err*100, mu = 0, nu = 1.0: [image] As you can see, even at x100 error and nu = 1.0 the distribution is still very narrow; it doesn't look much different from a normal distribution. (The target width of the tails should be roughly 0.7 mag, since that is how much higher in magnitude the upper binary track sits.) I could certainly use a higher multiple, e.g. x1000: [image] But x1000 starts to become unrealistic as an observational uncertainty for the Gaia bands. Any thoughts?
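
For reproducibility, this kind of comparison can be generated with scipy; a minimal sketch (the 0.01 mag G-band error is just a representative value):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import t, norm

    g_mag_err = 0.01                                   #representative Gaia G-band error
    x = np.linspace(-1.5, 1.5, 1000)
    plt.plot(x, t.pdf(x, df=1.0, scale=g_mag_err*100), label="Student's t, nu=1, sigma=100x err")
    plt.plot(x, norm.pdf(x, scale=g_mag_err*100), label='Normal, sigma=100x err')
    plt.xlabel('delta G mag')
    plt.legend()
    plt.show()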

grd349 commented 3 years ago

Nu = 0.5 or 0.1?


HinLeung622 commented 3 years ago

@grd349 Reading about the Student's t distribution, I believe nu represents the degrees of freedom, so in theory it cannot go below 1? Or does that interpretation only matter when the degrees of freedom have a physical meaning, which is not the case here?

grd349 commented 3 years ago

Nu > 0 is the only constraint! :)


HinLeung622 commented 3 years ago

@grd349 pdfs of Student's t distributions with various nu values: [image]

The same pdfs, but with the y axis normalised to y/max(y): [image] I still don't think this will help the sampler handle the binaries. Am I doing something wrong here?

HinLeung622 commented 3 years ago

@grd349 By the way, here are results for the binary-removed data, with everything fixed and max-pooled and only mass left to vary: [image] Rhats are basically perfect, and the sampler finished in a mere 37 minutes (compared to the 4+ hours of the runs with noise models). [image] This basically confirms that it was the noise model that was creating the problem.

grd349 commented 3 years ago

Why are the red posterior predictions offset from the data?


HinLeung622 commented 3 years ago

@grd349 Probably because of the fixed values I gave it; this is what a constructed cluster with those values (age = 4.0, feh = 0.05, Yinit = 0.255, MLT = 2.1, dist_mod = 9.72, Av = 0.11) looks like: [image]

HinLeung622 commented 3 years ago

@grd349 In any case, I went ahead with nu = 1 and x10 observational mag errors, fixing and max-pooling everything except mass: [images] Rhats look good and the result looks fairly sensible, though one would expect some of the red dots to appear closer to the binaries under the Student's t distribution, breaking away from the MS.

However, something interesting shows up when plotting the distributions: [image]

  1. This is the usual plot of data magnitudes minus predicted true magnitudes.
  2. Interestingly, the previously recognizable "3-peak" structure is not visible in any of the G, Bp and Rp bands. Instead, they all look more or less like a single-peaked normal distribution.
  3. The negative offset in the delta Rp distribution is probably what accounts for the leftward offset in the CMD.
  4. The delta Bp and delta Rp distributions are wider than the delta G one; maybe this is because the sampler is treating Bp-Rp as the axis along which the binaries are modelled (by setting a large spread in Bp-Rp instead of in G mag, as intended).
  5. This idea is supported by the delta Bp-Rp distribution plot, where the recognizable "3-peak" structure is seen.

Maybe this is happening because I gave all three bands Student's t distributions, and the (roughly 5x) wider Gaia observational errors given in the data for the Bp and Rp bands guide the sampler towards modelling the binaries along the Bp-Rp axis. Thoughts?

HinLeung622 commented 3 years ago

Also, this run took an astronomical amount of time to sample, about 23 hours, which is 40x the time needed without the binary stars and with a basically identical model.

grd349 commented 3 years ago

@HinLeung622 Excellent work! So now we know that the noise model is the problem. There are a number of interesting points that you make!

So let's think about what the minimum required performance of the noise model is ...

The noise model should be sensitive to the single stars. It should be insensitive to the binary stars. It should have some way of transitioning from the single stars to the binary stars, where Delta_G = 0.4 or so. How about the following ...

Let's only consider the binary effect in the G band! What if we Student-t the g_mag only: set nu to 0.1 or similar, multiply the Gaia G-band uncertainty by 10, then let mass and distance (distance modulus?) vary and see how you go? Does that sound sensible?
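
A minimal sketch of what that likelihood block could look like, reusing the variable and column names from earlier in the thread (pm.StudentT's nu/mu/sigma parametrisation is assumed):

    #heavy-tailed Student's t on G only, so the tails absorb the binaries; plain Normals on Bp and Rp
    obs_mG = pm.StudentT('obs_mG', nu=0.1, mu=true_mG, sigma=10*M67['g_mag_err'],
                         observed=M67['g_mag'])
    obs_Bp = pm.Normal('obs_Bp', mu=true_Bp, sigma=M67['bp_mag_err'], observed=M67['bp_mag'])
    obs_Rp = pm.Normal('obs_Rp', mu=true_Rp, sigma=M67['rp_mag_err'], observed=M67['rp_mag'])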

HinLeung622 commented 3 years ago

@grd349 Thanks. That makes sense. Currently I am doing the Bp and Rp bands separately. Should I revert to calculating true_Bp_Rp and constraining it with the data's Bp-Rp (so it would be 1 band and 1 colour)? Or should I keep going with the 3 bands, but use normal distributions for Bp and Rp and not enlarge their observational errors?

Also, if you look at the two plots of Student's t distributions with different nu values in my reply two days ago, the distribution only gets sharper as nu is decreased below 1, so I am slightly sceptical about the effect of lowering nu to 0.1.

grd349 commented 3 years ago

@HinLeung622 - What would be the extra overhead of including more bands in our analysis? I'm thinking particularly of the HJK bands from 2MASS. Did you already train your neural network on these bands?

HinLeung622 commented 3 years ago

@grd349 Not sure about the overhead; perhaps it would be linear at worst? I have not trained the NN on the 2MASS bands, but doing so would likely take only a day.

grd349 commented 3 years ago

Currently I am doing the Bp and Rp bands separately. Should I revert to calculating true_Bp_Rp and constraining it with the data's Bp-Rp?

My feeling is that we should be treating the bands separately and perhaps even adding more bands but maybe this is too much additional work!

grd349 commented 3 years ago

OK - don't worry about the additional bands - I'll think things through regarding additional bands. Maybe that's for when we have pymc4 working and the run time drops below an hour :)

HinLeung622 commented 3 years ago

@grd349

My feeling is that we should be treating the bands separately and perhaps even adding more bands but maybe this is too much additional work!

I am not concerned about the work per se, but rather about whether doing one band and one colour would help the sampler behave and model the binaries only in the G band.

grd349 commented 3 years ago

You're the boss @HinLeung622 - I trust you. You choose :)

HinLeung622 commented 3 years ago

@grd349 Ok, here is what I will do: max pool and fix everything except mass and dist mod; a Student's t distribution with sigma = data obs err x10 on the G band only, and normal distributions on the Bp and Rp bands, so it will be 3 bands rather than one band and one colour for now. If I don't get promising results from doing the 3 bands separately, I will return to one band and one colour, since as I recall that showed slightly more promise previously.

As for which nu value to use in the Student's t distribution, could you take a look at the two plots of Student's t distributions with different nu values that I sent two days ago? The distribution only gets sharper (suppressed tails, in the second plot) as nu is decreased below 1, so I am slightly sceptical about lowering nu to 0.1.

I hope all of these are sensible decisions.

grd349 commented 3 years ago

Agree - stick with nu=1, as I tried sampling with 0.1 and got very poor results.


HinLeung622 commented 3 years ago

@grd349 Ok, here is an update to the next run's plan:

- max pool and fix everything except mass and dist mod
- on the G band, a Student's t distribution, nu=1, sigma = 10*data_obs_g_mag_err
- on the Bp and Rp bands, normal distributions with no multiplication of the data errors
- use the last mass-only run as starting points for the masses in this run
- I have shifted the cluster-wide fundamentals a bit so the OC they predict agrees better with the data: feh 0.05->0.1, Av 0.11->0.15 (note: increasing helium to the solar level only pushes the predicted track left (lower Bp-Rp) and further away from the data)

HinLeung622 commented 3 years ago

@grd349 Random thought: since we are having trouble getting a Student's t distribution wide enough to include the binaries' higher G mags, would combining two normal distributions, one with a small sigma and one with a much wider sigma but a lower weight, be an option? Though now that I think about it, that sounds just like a simplified version of the noise model we had before.

grd349 commented 3 years ago

I think the mixture of two Gaussians would give you more flexibility - try both? You have to be careful when you set up the mixture of two Gaussians though.


HinLeung622 commented 3 years ago

@grd349 After a lot of testing over the weekend, I would say this looks pretty promising: [image]

The Student's-t-based model took more than 48 hours to sample 2000 tuning steps and ran out of time; I don't think that is a viable path to continue down.

However, I have achieved (so far limited) success with a model of two overlapping Gaussian distributions: one sharp and centred at 0 to account for the MS, and one much flatter to account for the binaries.

The model:

    #all fundamentals max pool and priors fixed, this only shows after the first NN
    #extinction prior
    A_V_mu = 0.15
    A_V_std = pm.Lognormal('spread_A_V', T.log(0.01), 0.1) # Different X & Y but still sensible
    A_V_ = pm.Normal('A_V_', 0, 1, shape=N) # an N(0, 1) dist to be scaled
    A_V = pm.Deterministic('A_V', A_V_ * A_V_std + A_V_mu)

    #second NN calculation
    BCs = pm.Deterministic('BCs', t1.manualPredict(T.as_tensor_variable([T.log10(Teff), logg, feh, A_V])))

    #BCs
    BCg = pm.Deterministic('BCg', BCs[5,:])
    BCbp = pm.Deterministic('BCbp', BCs[7,:])
    BCrp = pm.Deterministic('BCrp', BCs[8,:])

    #distance and parallax
    dist_mod_ = pm.Normal('dist_mod', dist_mod, 1.0)
    true_parallax = pm.Deterministic('true_parallax', 10**(-dist_mod_/5-1)*1000)
    obs_parallax = pm.Normal('obs_parallax', true_parallax, M67_parallax[1], observed=M67_parallax[0])

    #true observables
    true_mG = pm.Deterministic('true_mG', -2.5*T.log10(L)+Mbol-BCg+dist_mod_)
    true_Bp_Rp = pm.Deterministic('true_Bp_Rp', BCrp - BCbp)

    #dealing with multiples
    q = pm.Beta('q', 1.1,1.1, testval=0.8)
    delta_mG = pm.Lognormal('delta_mG', T.log(0.3), 0.4)
    f = pm.Lognormal('f', T.log(10),0.7)
    sigma_multiple = pm.Lognormal('sigma_multiple', T.log(0.3), 0.3)
    dist_singular = pm.Normal.dist(0, np.mean(M67['g_mag_err'])*f)
    dist_multiple = pm.Normal.dist(-delta_mG, sigma_multiple)

    #obs observables
    obs_mG = pm.Mixture('obs_mG', w=[q, 1-q], comp_dists = [dist_singular, dist_multiple], \
                    observed=M67['g_mag'].values - true_mG)
    obs_Bp_Rp = pm.Normal('obs_Bp_Rp', mu=true_Bp_Rp, sigma=M67['Bp_Rp_err'], observed=M67['Bp_Rp'])

q is the fraction of single stars, delta_mG is how far the flat distribution (dist_multiple) is shifted in G mag from being centred at 0, f is the multiplier on the observational G-mag error, and sigma_multiple is the width of the flat distribution. This current edition has a partial-pool Av, but the mean Av is still fixed. (See the sketch below for a visualisation of the mixture these parameters define.)
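
As a sanity check, the mixture density over (data G mag - predicted true G mag) implied by these parameters can be plotted outside the model; a minimal sketch with purely illustrative values close to the prior medians (and a representative mean G-mag error):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import norm

    q, delta_mG, f, sigma_multiple = 0.8, 0.3, 10, 0.3   #illustrative values only
    mean_g_err = 0.003                                    #representative mean Gaia G-mag error
    x = np.linspace(-1.5, 0.5, 1000)
    pdf = q*norm.pdf(x, 0, mean_g_err*f) + (1-q)*norm.pdf(x, -delta_mG, sigma_multiple)
    plt.plot(x, pdf)
    plt.xlabel('data G mag - predicted true G mag')
    plt.show()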

[images] The next step is to let mean Av vary as well; I hope it reaches similarly good results.

grd349 commented 3 years ago

@HinLeung622 - Perfect! Great work on the two Gaussians for the noise model. That looks much, much better and presumably works much better than the Student's t dist.

I think you have something there - what parameter are you going to free up next? I think age could be next, then metallicity, then helium. Does that sound sensible?

HinLeung622 commented 3 years ago

@grd349 I freed up the mean Av first; it is running right now. Also, I intuitively thought I would do it in the opposite order to yours: MLT -> helium -> feh -> age, i.e. ordered from the smallest to the largest effect each parameter has on the observables. Are you thinking of the opposite (from large to small)?

grd349 commented 3 years ago

I was thinking the sampler would prefer large to small, but I'm not convinced it matters too much. It's possible that the (differential) A_v will be confused with the single-star noise spread, which might make things harder - but I've been wrong about these things before :)


HinLeung622 commented 3 years ago

@grd349 If by "single star noise spread" you mean the observational errors of the individual stars, then that is exactly what I believe happened with the previous, more complex noise model.

I agree that the sampler would have an easier time sampling the parameters with large effects, but on the other hand it might be hard to tune the parameter with the smallest effect once everything else has already been allowed to vary. I don't know; it's just my intuition.

grd349 commented 3 years ago

See what works - you never really know with this stuff till you have tried it.
