BEAST-Fitting / beast

Bayesian Extinction And Stellar Tool
http://beast.readthedocs.io
23 stars 34 forks source link

physics model prior weights are NaN #514

Closed lea-hagen closed 4 years ago

lea-hagen commented 4 years ago

I just generated a physics model grid using the standard procedure, and all of the values in weight and prior_weight are NaN. I'm not sure if it's something weird I did or a bug that recently showed up, but I'm currently investigating. If someone else has a recently-generated grid and could take a quick look, that would also be helpful.

lea-hagen commented 4 years ago

A clue from when the model is building:

Auto-detected type: hd5
Make Prior Weights
computing the age-mass-metallicity grid weight for Z =  0.00012
/astro/dust_kg3/lhagen/stsci/beast_lea-hagen/beast/physicsmodel/prior_weights_stars.py:178: RuntimeWarning: invalid value encountered in double_scalars
  ] / (mass_bounds[i + 1] - mass_bounds[i])
computing the age-mass-metallicity grid weight for Z =  0.00048
computing the age-mass-metallicity grid weight for Z =  0.00191
computing the age-mass-metallicity grid weight for Z =  0.00762
lea-hagen commented 4 years ago

I have found the problem - when using multiple distances, grid weights and priors in the physicsmodel grid are incorrect.

Summary: For a given combination of M/logt/Z/Av/Rv/fA, if there are N distances, N-2 of the weights will be set to 0. As of #387, weights are now normalized by their average, so all of the weights end up as NaN in the final grid, which breaks things (hence this issue). Prior to that change, it would just remain as N-2 of the weights are 0, which would silently propagate through all the fitting.


Long explanation:

The function add_stellar_priors (in physicsmodel/model_grid.py) calls compute_age_mass_metallicity_weights (in physicsmodel/grid_and_prior_weights.py). This function iterates over metallicity and age to get to each unique list of masses. This is fine for single-distance grids, but for multi-distance grids, the resulting mass list is replicated N_dist times (N_dist = number of distances). The grid/prior weights use bin boundaries made from a sorted list, so many of the bins for each mass have a width of zero. This results in priors of 0 and/or NaN.

I'll do a set of simple examples to illustrate the before/after #387, with/without multiple distances. In each example, I'm showing the bin boundaries for the masses, the grid weights, and the prior weights. Single distances are fine before and after. Multiple distances always have a problem; before it was making zeros (difficult to notice), and after it's breaking things with NaNs.

Pre-#387, one distance Everything is fine.

> from beast.physicsmodel import grid_weights_stars, prior_weights_stars
> masses = np.array([1,2,4,7,8])
> grid_weights_stars.compute_bin_boundaries(masses)
array([ 0.5,  1.5,  3. ,  5.5,  7.5,  8.5])
> grid_weights_stars.compute_mass_grid_weights(masses)
array([ 1. ,  1.5,  2.5,  2. ,  1. ])
> prior_weights_stars.compute_mass_prior_weights(masses, {'name':'kroupa'})
array([ 1.43998243,  0.26966972,  0.10055033,  0.02782858,  0.00841479])

Post-#387, one distance Everything is fine. Only change from above is the normalization.

> from beast.physicsmodel import grid_weights_stars, prior_weights_stars
> masses = np.array([1,2,4,7,8])
> grid_weights_stars.compute_bin_boundaries(masses)
array([ 0.5,  1.5,  3. ,  5.5,  7.5,  8.5])
> grid_weights_stars.compute_mass_grid_weights(masses)
array([ 0.625 ,  0.9375,  1.5625,  1.25  ,  0.625 ])
> prior_weights_stars.compute_mass_prior_weights(masses, {'name':'kroupa'})
 array([ 4.27977361,  0.5343238 ,  0.1195383 ,  0.04135467,  0.02500961])

Pre-#387, 5 distances Incorrect bins, weights have lots of zeros.

> from beast.physicsmodel import grid_weights_stars, prior_weights_stars
> masses = np.array([1,2,4,7,8]*5)
> grid_weights_stars.compute_bin_boundaries(np.sort(masses))
array([ 1. ,  1. ,  1. ,  1. ,  1. ,  1.5,  2. ,  2. ,  2. ,  2. ,  3. ,
        4. ,  4. ,  4. ,  4. ,  5.5,  7. ,  7. ,  7. ,  7. ,  7.5,  8. ,
        8. ,  8. ,  8. ,  8. ])
> grid_weights_stars.compute_mass_grid_weights(masses)
array([ 0. ,  0.5,  1.5,  0. ,  0. ,  
        0. ,  0. ,  0. ,  0. ,  0. ,  
        0.5,  1. ,  0. ,  1.5,  0.5,  
        0. ,  0. ,  0. ,  0. ,  0. ,  
        0. ,  0. ,  1. ,  0.5,  0. ])
> prior_weights_stars.compute_mass_prior_weights(masses, {'name':'kroupa'})
array([ 0.        ,  0.14168113,  0.04300991,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.31514488,  0.12798859,  0.        ,  0.02257026,  0.0045097 ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.05754042,  0.00525831,  0.        ])

Post-#387, 5 distances Incorrect bins. Grid weights have many zeros. Prior weights have many NaNs, and normalizing by the average fills the whole array with NaNs.

> from beast.physicsmodel import grid_weights_stars, prior_weights_stars
> masses = np.array([1,2,4,7,8]*5)
> grid_weights_stars.compute_bin_boundaries(np.sort(masses))
array([ 1. ,  1. ,  1. ,  1. ,  1. ,  1.5,  2. ,  2. ,  2. ,  2. ,  3. ,
        4. ,  4. ,  4. ,  4. ,  5.5,  7. ,  7. ,  7. ,  7. ,  7.5,  8. ,
        8. ,  8. ,  8. ,  8. ])
> grid_weights_stars.compute_mass_grid_weights(masses)
array([ 0.        ,  1.78571429,  5.35714286,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        1.78571429,  3.57142857,  0.        ,  5.35714286,  1.78571429,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  3.57142857,  1.78571429,  0.        ])
> 
> # prior without normalization
> prior_weights_stars.compute_mass_prior_weights(masses, {'name':'kroupa'})
array([        nan,  0.28336225,  0.02867327,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
        0.63028975,  0.12798859,         nan,  0.01504684,  0.00901941,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,  0.05754042,  0.01051662,         nan])
> 
> # prior with normalization
> prior_weights_stars.compute_mass_prior_weights(masses, {'name':'kroupa'})
array([ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan])

Possible solutions

  1. Change compute_bin_boundaries in physicsmodel/grid_weights_stars.py to only use the unique items in the input table (ie, do np.unique(tab) at the beginning)

  2. Add another for-loop over distance in compute_age_mass_metallicity_weights. Hopefully this is the only place such a change would need to be made. This would presumably also make it easier to include a distance prior (#45).

All else equal, the first option is much easier, especially for a quick fix necessary for production runs. I'm not super familiar with the physicsmodel grid creation, though, so I'm not 100% sure if doing either of these will break something else. @karllark?

karllark commented 4 years ago

Thanks for the great explanation - makes sense!

I think the 1st option is the best. I can look at this in more detail/propose new code today. Unless you have already done this.

lea-hagen commented 4 years ago

I haven't made any changes yet, so you're welcome to work on this.