Harry-Westwood / Y4-Project-InterNeuralStellar

Bayesian Hierarchical Modelling and Machine Learning of Stellar Populations

Helping HBM sample better #45

Closed HinLeung622 closed 4 years ago

HinLeung622 commented 4 years ago

@grd349 The last issue thread was getting very long, so I opted to start a new one, since we are (kind of) past that ravel-equals-zero error.

Last night I ran 2 HBM runs:

  1. Max pool model with 100 M67 dwarfs taken just under the hook: the large red dots represent a "max pool" generated cluster with masses from 0.8 to 1.2 Msun, which means this dataset includes stars with mass > 1.2 Msun. The prior on mass hence runs from 0.8 to 1.5. Settings: g mag obs error = x100, tuning steps = 7000, sampling steps = 2000. The sampler did not converge, and oddly, the mean values from this max pool model are somewhat different from the partial pool ones and from the ones we got for the report, especially for Y.

  2. Max pool model with 100 M67 dwarfs taken at a considerably lower luminosity under the hook: the red dots are of the exact same generated cluster setup, so this sample should contain only stars with mass 0.8-1.2 Msun. The HBM model for this run is identical to the last, except that the mass prior is shrunk to 0.8 to 1.2. Settings: g mag obs error = x100, tuning steps = 7000, sampling steps = 2000. Somehow the sampling results are rather worse, and very strangely, the posteriors on mass go above 1.2. Maybe I made some mistake that I have yet to locate, but I am not sure what happened there.

This test was meant to check whether discarding the >1.2 Msun dwarfs would result in a much worse estimate of the cluster-wide fundamentals, but the samplers are still having a hard time, so I don't think the test answers the question yet.

I have started an additional run identical to no. 2 except for using Beta(10,10) on all priors; I will report when that gives results.

HinLeung622 commented 4 years ago

@grd349 Results are out for the max pool + 3-distribution mixture noise model of M67: good sampling, but it seems the low Y problem is still not solved by this implementation. The lower delta_mG distribution seems to be confused about where its posterior should be, judging from its secondary peak at 0.6 mag.

Borrowing your plot: it shows the problem clearly; the primary singles population's peak is above where the data suggests it should be.

grd349 commented 4 years ago

OK - disappointing. It looks like it is not quite converged (Rhats are too high). Maybe tighten back up the priors on the noise model to what was in the .ipynb?

Did you run this with the values fixed to the result from the M67 noise model .ipynb?

One thing that is clear is that the initial metallicity is coming in much higher than the literature values. We might be at the point that we should be trying to use a prior on initial metallicity - I'll ask Tanda what he thinks.

HinLeung622 commented 4 years ago

@grd349 What do you mean by "with the values fixed to the results from the M67 noise model .ipynb"? Are you referring to the testval setting?

For reference on the feh bit, the current prior is a Beta(10,10) scaled to run from -0.2 to +0.2.
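A sketch of how such a scaled Beta prior can be written (placeholder names, not the exact code):

    import pymc3 as pm

    with pm.Model() as sketch:
        # Beta(10,10) lives on (0, 1); multiply by the width 0.4 and shift by -0.2 to span -0.2..+0.2
        mean_feh = pm.Deterministic('mean_feh', pm.Beta('mean_feh_raw', 10, 10)*0.4 - 0.2)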

grd349 commented 4 years ago

https://arxiv.org/pdf/1310.6297.pdf

The cluster has a solar-similar chemical composition with [Fe/H] in the range −0.04 to +0.03 (Hobbs & Thorburn 1991; Tautvaišienė et al. 2000; Yong et al. 2005; Randich et al. 2006; Pace et al. 2008; Pasquini et al. 2008)

How about we say that mean_feh = pm.Normal('mean_feh', 0.0, 0.05) to include this prior info.

grd349 commented 4 years ago

What do you mean by "with the values fixed to the results from the M67 noise model .ipynb"? You referring to the testval setting?

I think we loosened the priors this morning to be less informative. I'm suggesting undoing that loosening. Does that make sense?

HinLeung622 commented 4 years ago

@grd349 Oh, that was also referring to the sig values - yes, that makes sense. I will rerun with that then, and also include the tighter feh restriction.

HinLeung622 commented 4 years ago

@grd349 With tighter feh and sig priors: no luck, the results are basically identical. I have switched from a normal distribution to a Beta(10,10) distribution from -0.05 to +0.05 for feh and am now doing a rerun.

grd349 commented 4 years ago

Oh - What about fixing the noise model values to the mean of the results from the noise model .ipynb?

This is confusing.

It might also be worth checking (again) what result you get if you remove the outliers by hand.

HinLeung622 commented 4 years ago

@grd349 By "fixing the noise model values to the mean of the results from the noise model .ipynb", do you mean reducing the uncertainties given to the sig values to match the model in your noise model example? The previous run already uses that.

Here is an example of a max pool model run on binary-removed data: as you can see, the underestimates in helium and age are still present, albeit at less extreme values. There is a clear departure from symmetry in the distribution plot of true_mG - data.

grd349 commented 4 years ago

That is still showing the upturn at the bottom of the main sequence. I wonder if this is something to do with the mixing length?!?

Could you try the same model with a partially pooled mixing length? Just do a simple alpha_MLT drawn from a normal distribution where the mean and the variance are learnt. We want to look at the posterior for evidence of mass-dependent shifts in MLT. If we see them we will fit a partially pooled model.
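A minimal sketch of that (placeholder priors and names; N is the number of stars; not the actual implementation):

    import numpy as np
    import pymc3 as pm

    N = 100  # number of stars (placeholder)
    with pm.Model() as mlt_sketch:
        mean_MLT   = pm.Normal('mean_MLT', 2.0, 0.2)                  # cluster-level mean, learnt (placeholder prior)
        spread_MLT = pm.Lognormal('spread_MLT', np.log(0.1), 0.5)     # cluster-level spread, learnt (placeholder prior)
        MLT        = pm.Normal('MLT', mean_MLT, spread_MLT, shape=N)  # one alpha_MLT per star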

If the partially pooled MLT model is playing up then just remove all pooling on the MLT! That will still probably do the job.

HinLeung622 commented 4 years ago

@grd349 A partially pooled MLT model without the binary stars is now running.

I set the spread in MLT to be pm.Lognormal('spread_MLT',T.log(0.1),0.5)
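For reference, that lognormal has a median of 0.1, with roughly 68% of its mass between about 0.06 and 0.17, so it keeps the star-to-star spread in MLT fairly small a priori.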

HinLeung622 commented 4 years ago

@grd349 Results from the max pool + noise model, but with a feh prior of Beta(10,10) from -0.05 to +0.05: overall, the sampling seems better, and there is no double hump in mu_delta_mG2, meaning the two binary distributions are not confused about which one they should be. There is less curl-up at the low-mass end of the CMD, and the data distribution in the last plot is more similar to the HBM's guesses. However, all of this only improved the feh guess, and did nothing for age and Y.

The partially pooled MLT HBM without binaries is still running, and it will take an additional 3 hours to download the results from BlueBear afterwards.

HinLeung622 commented 4 years ago

@grd349 Still no luck with the MLT partial pool model on binary-removed data: I guess I will make MLT a free parameter (like mass) in the next run. However, WinSCP is not working for me right now, and PuTTY just crashes after logging in. I will restart my computer; hopefully it's not a problem with BlueBear.

HinLeung622 commented 4 years ago

yep, restarting didn't solve it.

HinLeung622 commented 4 years ago

@grd349 No pooling in MLT, binaries removed: well, the problem is still here... the MLT estimates range all the way from 1.7 to 2.3, the entire range of the training grid.

grd349 commented 4 years ago

OK - Good work @HinLeung622

I'm at a loss for a reason, although at least in the last example the fit looks sort of reasonable.

Could you try one more thing for me? Could you run again but fix age to 3.5 Gyr, [Fe/H] to 0.05, and Y_i to 0.255, with no spread on any of those values. Maybe also fix MLT to 2.1. If you could run that and see what you get, that would be great. This is basically a test to see if the model can find a solution that is comparable to literature values.
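Something like the sketch below (assumed structure; the NN emulator and the magnitude likelihood stay exactly as they are and are omitted here; the mass range and names are placeholders):

    import pymc3 as pm

    N = 100  # number of stars (placeholder)
    with pm.Model() as fixed_test:
        age, feh, Y_i, MLT = 3.5, 0.05, 0.255, 2.1  # fixed values, no spread
        # the masses are the only free parameters, e.g. a scaled Beta over the grid range
        mass = pm.Deterministic('mass', pm.Beta('mass_raw', 10, 10, shape=N)*0.4 + 0.8)
        # ... feed (mass, age, feh, Y_i, MLT) into the NN emulator and the likelihood as before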

Just to check - are you sure that the extinction and bolometric/color corrections, etc., are working as expected?

And one last question: is distance a free parameter with some uncertainty?

HinLeung622 commented 4 years ago

@grd349 So if I were to fix all of age, initial feh, initial helium, and MLT, the only thing the sampler would be able to vary is mass. Is that correct?

Pretty sure the extinction and bolometric calculations are correct, as shown by the OC guess (the red smudges on the CMD) overlapping with the data; here is an example without binaries.

The distance right now is a fixed parameter (done by giving the HBM a fixed distance modulus calculated when collecting data). How do you want me to change this part?

grd349 commented 4 years ago

So the distance would be something like

    distance = pm.Normal('distance', 880, 100, testval=880)  # use the correct numbers here but keep the prior weak
    parallax_obs = pm.Normal('parallax', 1.0/distance, parallax_unc, observed=12345678.987654)  # use the correct numbers

Watch out for the units of parallax when taking the inverse to transform parallax to distance.

Does that make sense?

grd349 commented 4 years ago

But I agree with you, the plot above makes me think that things like the BC are not wrong. I really don't understand this.

Maybe we need to start again from scratch to make sure there are no logic bugs that are hiding. It's so close yet so far (i.e., sub big-bang helium).

We should also think about a Gaia parallax zero-point offset but we can afford for this to be marginalised out.

HinLeung622 commented 4 years ago

@grd349 I guess we should discuss these during the meeting later today. I seriously hope the problem would not end up as a stupid careless piece of code on my end.

grd349 commented 4 years ago

@HinLeung622

. I seriously hope the problem would not end up as a stupid careless piece of code on my end.

Actually, this is probably a best case scenario :)

HinLeung622 commented 4 years ago

@grd349 I have done the NN Av vs calculated extinction comparison: it seems there is a difference between the two methods of calculating extinction, but the original NN Av matches the data better, so could you double-check whether my code makes sense?

# Method 1: NN bolometric corrections with the real Av as an input
TAv = np.ones(TN)*Av
TBCs1 = t1.model.predict(np.array([np.log10(TTeff),Tlogg,Tfeh,TAv]).T).T
TBCg1 = TBCs1[5]   # BC in Gaia G
TBCbp1 = TBCs1[7]  # BC in Gaia Bp
TBCrp1 = TBCs1[8]  # BC in Gaia Rp
TmG1 = -2.5*np.log10(TL)+Mbol-TBCg1+dist_mod
TBp_Rp1 = TBCrp1 - TBCbp1

# Method 2: NN bolometric corrections with Av = 0, extinction applied afterwards via the Bossini/Gaia relation
TAv = np.ones(TN)*0.0
TBCs2 = t1.model.predict(np.array([np.log10(TTeff),Tlogg,Tfeh,TAv]).T).T
TBCg2 = TBCs2[5]
TBCbp2 = TBCs2[7]
TBCrp2 = TBCs2[8]
TBp_Rp2 = TBCrp2 - TBCbp2

# Gaia DR2 extinction coefficient: R_G = A_G/A_V as a polynomial in intrinsic Bp-Rp and A_V
c_m = [0.9761, -0.1704, 0.0086, 0.0011, -0.0438, 0.0013, 0.0099]
R_G = c_m[0]+c_m[1]*TBp_Rp2 + c_m[2]*TBp_Rp2**2 + c_m[3]*TBp_Rp2**3 + c_m[4]*Av + c_m[5]*Av**2 + c_m[6]*TBp_Rp2*Av
TmG2 = -2.5*np.log10(TL)+Mbol-TBCg2+dist_mod+R_G*Av  # apply the extinction manually as A_G = R_G*Av (assumed step)

fig, ax=plt.subplots(1,1, figsize=(10,10))
#ax.plot(TBp_Rp,TmG,'k.',zorder=1,label='cluster guess')
ax.scatter(TBp_Rp1, TmG1, s=100, zorder=1, c='red', label='NN extinction', alpha=0.5)
ax.scatter(TBp_Rp2, TmG2, s=100, zorder=1, c='blue', label='calculated extinction', alpha=0.5)
ax.errorbar(M67['Bp_Rp'], M67['g_mag'], xerr=M67['Bp_Rp_err'], yerr=M67['g_mag_err']*10, fmt='.', zorder=2, c='black')
ax.scatter(M67['Bp_Rp'], M67['g_mag'], s=15, zorder=3, c='blue', label='members')
ax.set_xlabel(r'$G_{BP}-G_{RP}$ (mag)')
ax.set_ylabel('apparent G (mag)')
ax.set_ylim(ax.get_ylim()[::-1])
ax.legend()
plt.show()

Note: 1 is the NN Av version, 2 is the calculated Av version. A concern I have is whether I should also be altering the Bp-Rp value of the calculated extinction version, since reddening is not accounted for.

HinLeung622 commented 4 years ago

@grd349 Results for the max pool model without binaries, using all three Gaia bands individually as observed variables: it seems this method might be the solution? The estimated initial helium is still quite close to 0.24, but the age, MLT, and feh estimates are back at reasonable levels. Note: the original run with x100 error on both Bp and Rp gave a very bad result because the errors were far too large, allowing the sampled results to deviate completely from the data, so I lowered the multiple to 20. It will probably require further lowering since there is still a slight deviation from the data.

HinLeung622 commented 4 years ago

@grd349 Further lowering the multiple on the Bp and Rp mags from 20 to 5: with this, the HBM estimates on the CMD match up much better with the cluster data (better than both the previous trial and the earlier runs that used Bp-Rp instead of the two individual magnitudes). The estimates of the fundamentals also make slightly more sense, except perhaps initial helium.

The next task is probably to integrate the two magnitudes into the doubles mixture model as well. All three magnitudes will have to have the additional noise models attached to them, right?
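For reference, a minimal sketch of the kind of per-band mixture likelihood being discussed, shown for one band with stand-in data and placeholder offsets/priors (not the actual noise model); the same structure would be repeated for Bp and Rp:

    import numpy as np
    import pymc3 as pm
    import theano.tensor as T
    from pymc3.math import logsumexp

    N = 100
    obs_g = np.random.normal(14.0, 0.5, N)  # stand-in for the observed G magnitudes
    with pm.Model() as noise_sketch:
        true_mG = pm.Normal('true_mG', 14.0, 1.0, shape=N)  # stand-in for the model-predicted magnitudes
        w  = pm.Dirichlet('w', a=np.ones(3))                # singles / binaries / outliers weights
        d1 = pm.Normal('mu_delta_mG1', -0.375, 0.1)         # placeholder offsets of the extra components
        d2 = pm.Normal('mu_delta_mG2', -0.75, 0.1)
        sig = 0.05                                          # placeholder observational uncertainty
        # per-star log-likelihood under each of the three components, shape (N, 3)
        comp_logp = T.stack([pm.Normal.dist(true_mG,      sig).logp(obs_g),
                             pm.Normal.dist(true_mG + d1, sig).logp(obs_g),
                             pm.Normal.dist(true_mG + d2, sig).logp(obs_g)], axis=1)
        pm.Potential('mix_logp_G', T.sum(logsumexp(T.log(w) + comp_logp, axis=1)))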

grd349 commented 4 years ago

OK - I'm just not getting why this is all happening. I can't see the pattern here. I suggest we build a simpler model from the ground up and see what we get.

We can do this new model together or we can work separately. If I have a try by myself, I'll need your best NN for the correct grid.

If we work together we will need to put something in the diary :)

HinLeung622 commented 4 years ago

@grd349 Working together over Zoom is probably the better method here. Again, I am pretty much free all week, so the sooner you can, the better.

grd349 commented 4 years ago

@HinLeung622

How about Wednesday starting at 08:00 BST? (I'm not sure if that works for you)

G

HinLeung622 commented 4 years ago

@grd349 That works, see you then.

In the meantime, are there any more tests you think I could try with the current setup (both with and without the doubles)?

grd349 commented 4 years ago

Maybe we should be modelling A_V (or rather A_G, A_GRP, A_GBP). Here is a way we could do it ...

    A_V_mu = pm.Normal('A_V_mu', X, Y)              # X & Y are 'sensible' values
    A_V_std = pm.Lognormal('A_V_std', X, Y)         # different X & Y but still sensible
    A_V_ = pm.Normal('A_V_', 0, 1, shape=N_stars)   # an N(0, 1) dist to be scaled
    A_V = pm.Deterministic('A_V', A_V_ * A_V_std + A_V_mu)

This would pool together the A_V values but not take them from dust maps (we could use dust maps to set the priors).
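(The scaled N(0, 1) form is the usual non-centred parameterisation; it tends to sample better than drawing each A_V directly from Normal(A_V_mu, A_V_std) when the learnt spread is small.)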

We could then use the relation in Bossini (it actually comes from a Gaia paper) to transform to A_G, etc.

Make sense?

HinLeung622 commented 4 years ago

@grd349 So this would account for in-cluster extinction? Also, should the Bp-Rp value in the Bossini equation (Bp-Rp and Av are both terms in the equation) be reddening-corrected or not?

grd349 commented 4 years ago

@HinLeung622

Good question! It should be the Bp_Rp from the model, i.e., the one we are proposing. So I think in your code you need to take the L, apply the bolometric corrections, and then estimate Bp_Rp.

Probably in the long run this will not be a hugely significant factor.

HinLeung622 commented 4 years ago

@grd349 OK, because as you can see in a reply on this thread 3 days ago, the CMD predicted by the original NN method (trained on the MIST grid with varying Av as an input) and the one from the manual extinction calculation (converting Av into A_G with Bossini's equation, with the NN only given Av=0 inputs) do not match up, not even close. I am trying to figure out which one is wrong. I did use the direct model Bp-Rp output, meaning it is already reddening-corrected (true Bp-Rp without extinction effects), in the Av -> A_G equation, so at least it is not that problem.

But this raises a second question. I probably need to adjust the predicted Bp-Rp for reddening before plotting the CMD, which I did not do in that reply's plot. After also adjusting for reddening (using the Av -> A_Bp and Av -> A_Rp equations in Bossini as well), the two are even more different! Do note that this is quite a complex conversion and it is highly likely I made careless mistakes.

HinLeung622 commented 4 years ago

@grd349 You also mentioned modelling the dist mod and parallaxes. Here is my implementation:

    dist_mod_ = pm.Normal('dist_mod', dist_mod, 0.5)  # dist_mod is calculated from Bailer-Jones' distances
    true_parallax = pm.Deterministic('true_parallax', T.ones(N)*10**(-dist_mod_/5-1)*1000)  # times 1000 since Gaia parallaxes are in milliarcseconds
    obs_parallax = pm.Normal('obs_parallax', true_parallax, M67['parallax_err'], observed=M67['parallax'])
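For reference, the conversion comes from the distance modulus definition: dist_mod = 5 log10(d / 10 pc), so d = 10^(dist_mod/5 + 1) pc, and the parallax is 1/d arcsec = 1000 × 10^(-dist_mod/5 - 1) mas, which is the expression used above.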

Does it make sense?

grd349 commented 4 years ago

http://simbad.u-strasbg.fr/simbad/sim-id?Ident=M67

lists the M67 parallax as 1.1325 ± 0.0011 mas. You should follow the link to the paper, have a read of the details of this measurement, and check that using it is appropriate, but this would make life easier.

You could use all the individual measurements like you are doing above, but the values are correlated. It would be easier to use the single cluster-wide value.

HinLeung622 commented 4 years ago

@grd349 OK. I have given the paper some digging, but I am unable to find a direct description of how they arrived at that particular M67 mean parallax value. The paper is from the Gaia collaboration; the relevant section is section A.2 (page 22). The top paragraphs describe (what I think is) their method of determining whether a star is a member of an OC, through a combination of individual parallax and proper motion measurements. The table that gives the SIMBAD M67 mean parallax value is table A.4 (where the cluster is named NGC 2682). The only thing is that I cannot find how they arrived at the mean parallax values after doing the membership analysis. (But maybe, since the paper is from the Gaia group themselves, it is trustworthy enough given that we are also using Gaia data anyway?)

grd349 commented 4 years ago

In section A.2 they discuss the method (not 100% clear) but the main value is given in table A.4 and I believe the uncertainty corresponds to the 1% uncertainty which includes some contributions from systematics. I think this value is very safe to use :)

grd349 commented 4 years ago

    dist_mod_ = pm.Normal('dist_mod', dist_mod, 1.0)  # dist_mod is calculated from Bailer-Jones' distances
    true_parallax = pm.Deterministic('true_parallax', 10**(-dist_mod_/5-1)*1000)  # times by 1000
    obs_parallax = pm.Normal('obs_parallax', true_parallax, M67['parallax_err'], observed=M67['parallax'])

Should do the job. Note that the Gaia DR2 paper quotes a Dist mod if you want to remain consistent but I think with the quoted uncertainty it will be dominated by the observed parallax, so no need to worry here.

HinLeung622 commented 4 years ago

@grd349 So this means I should be using the single parallax value and error from the Gaia DR2 paper?

grd349 commented 4 years ago

Yes! :)

HinLeung622 commented 4 years ago

@grd349 Results for the max pool model using all three bands individually on binary-removed data, now with dist_mod and parallax as free variables: the sampler still needs a few more tuning steps, so there is some disagreement between the chains on age, helium, and MLT. However, after switching to the dist_mod value from Gaia DR2, the helium estimate seems to have increased to a reasonable level. The estimates on dist_mod and parallax are very tight (very low stds). So I guess the incorporation of distance into the HBM model is a success.

grd349 commented 4 years ago

@HinLeung622 Excellent! That is looking much more like it!

Still a slight difference at higher masses, but I can cope with that.

Are you thinking of implementing the partially pooled extinction (or have you done that already)?

So, summary: this result looks great. Not perfect, but it looks like a problem with the underlying models rather than a data problem.

The easiest next step (see differential extinction above) would be to partially pool the MLT. We could then see if MLT looks like it is a function of mass.
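One quick way to look for that (assuming a trace from pm.sample with per-star 'MLT' and 'mass' variables - hypothetical names) would be something like:

    import matplotlib.pyplot as plt

    # posterior mean and spread of alpha_MLT per star, plotted against posterior-mean mass
    mlt_mean, mlt_std = trace['MLT'].mean(axis=0), trace['MLT'].std(axis=0)
    mass_mean = trace['mass'].mean(axis=0)
    plt.errorbar(mass_mean, mlt_mean, yerr=mlt_std, fmt='.')
    plt.xlabel('mass (Msun)')
    plt.ylabel('alpha_MLT')
    plt.show()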

Those two results together would be great for a paper! We could also see how things change if we adjust to a partially pooled age but that is just cherry on top stuff.

Again, excellent work and done at lightning speed! Great job :)

HinLeung622 commented 4 years ago

@grd349 Is the partially pooled extinction meant to account for in-cluster extinction? Would that not undermine the capacity of the data to inform the features of the population in, e.g., the partially pooled MLT?

Also, all of this is done on data without binaries, so there is still that to consider, which I might work on tomorrow.

HinLeung622 commented 4 years ago

@grd349 New results: max pooling for age, feh, and Y, partial pooling for MLT and Av, with dist_mod added as a free variable; the data is binary-removed. The priors I selected:

    MLT_mu = pm.Deterministic('mean_MLT',pm.Beta('g',10,10)*0.6+1.7)
    MLT_sigma = pm.Lognormal('spread_MLT',T.log(0.1),0.5)

    A_V_mu = pm.Normal('mean_A_V', Av, 0.05) #Av is taken from Green's 2019 dustmap = 0.160758
    A_V_std = pm.Lognormal('spread_A_V', T.log(0.03), 0.7)
    A_V_ = pm.Normal('A_V_', 0, 1, shape=N)
    A_V = pm.Deterministic('A_V', A_V_ * A_V_std + A_V_mu)

    dist_mod_ = pm.Normal('dist_mod', dist_mod, 1.0) #dist_mod taken from GAIA DR2 paper = 9.726
    true_parallax = pm.Deterministic('true_parallax', 10**(-dist_mod_/5-1)*1000)
    obs_parallax = pm.Normal('obs_parallax', true_parallax, M67_parallax[1], observed=M67_parallax[0])

Notable result: