21cmfast / 21cmFAST

Official repository for 21cmFAST: a code for generating fast simulations of the cosmological 21cm signal
MIT License
58 stars 37 forks source link

feat: floats in hashes now written to given sig figs #287

Closed steven-murray closed 2 years ago

steven-murray commented 2 years ago

Fixes #80

This changes the string representation of OutputStructs and StructWithDefaults objects so that floats appear with a certain number of significant digits (configurable in global configuration). Primarily deals with the caching of redshifts (by default, it uses 4 significant digits, so the redshifts 6.001 and 6.000 would be considered different, but not 6.0001).

A couple of things to think about:

  1. I've applied the float rendering also to input structs, so eg. HII_EFF_FACTOR=30.0 and HII_EFF_FACTOR=30.001 would be considered the same. This could play a little havoc on MCMC routines (or other uses where a large database of randomly sampled parameters is chosen), so that the parameters change but the outputs don't, creating a "step-like" likelihood. To offset this, the user can set the significant digits much higher on a particular run. Also, they probably shouldn't be caching on such a run. But if we think that's not a good idea, I can turn off the significant figure truncation on the input params (but not redshift).
  2. Unfortunately, this change could render a user's existing cache unusable. I've warned about this in the CHANGELOG, but they might not read that.
  3. it strikes me that different parameters might need different precision. Redshifts probably don't need 4 digits, while astro/cosmo params might. I don't have a great way of enforcing this. I could hack it out though, if we think it's important.
codecov[bot] commented 2 years ago

Codecov Report

Merging #287 (a0173c7) into master (633d36c) will increase coverage by 0.01%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #287      +/-   ##
==========================================
+ Coverage   86.31%   86.33%   +0.01%     
==========================================
  Files          12       12              
  Lines        2777     2780       +3     
==========================================
+ Hits         2397     2400       +3     
  Misses        380      380              
Impacted Files Coverage Δ
src/py21cmfast/_cfg.py 96.87% <100.00%> (+0.04%) :arrow_up:
src/py21cmfast/_utils.py 88.07% <100.00%> (+0.03%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 1a78197...a0173c7. Read the comment docs.

BradGreig commented 2 years ago

Hey @steven-murray, is it possible to do this as number of decimal places rather than significant figures? I feel like checking on the number of decimal places works better as it gives you better resolution on your parameters. Otherwise for some large parameters, you may only have one (or less) decimal place.

steven-murray commented 2 years ago

@BradGreig I think there's definitely a way to do it to decimal places, but I thought sig figs would be better. That way you get a given precision on the numbers, rather than having numbers like 10000000000.01. I think it's appropriate not to use any decimal places for such a number.

BradGreig commented 2 years ago

Hey @steven-murray, yeah, I guess I was thinking of the counter example of having a parameter being say 1500 and 1499.5 and these being treated as being the same (depending on the rounding). While arguably they are likely to be pretty similar, you could probably come up with an example (e.g. certain parameter) where this could be a little problematic.

Perhaps a happy medium would be to use six significant figures over four? That way the increased precision helps for databases/MCMC runs. Though, I guess you wouldn't be saving anything in an MCMC where this could cause issues.

steven-murray commented 2 years ago

@BradGreig yeah, that's a good point. Really, the precision required will depend on each parameter and the constraining power of whatever observation is being fit (though, as you say, in MCMC mode, one shouldn't be caching on astro params anyway).

I've made a small update that might be a good medium: now we allow separate configuration options for parameters versus redshifts. By default they get 6 and 4 sig figs respectively. Let me know if you think they are good values, and whether it needs to be more fine-tuned than that.

BradGreig commented 2 years ago

Hey @steven-murray, yeah, I think that is probably a reasonable compromise.

qyx268 commented 2 years ago

just want to shout that this is a great feature!