ACCarnall / bagpipes

Bagpipes is a state-of-the-art code for generating realistic model galaxy spectra and fitting them to spectroscopic and photometric observations. Users should install it with pip, not by cloning the repository.
http://bagpipes.readthedocs.io
GNU General Public License v3.0

Issue with h5py interpreting fit_instructions when using the new "R_curve" functionality #50

Closed: davidjsetton closed this issue 1 year ago

davidjsetton commented 1 year ago

Hey Adam, hope you are well.

I am trying to get bagpipes running with the new functionality that applies a variable resolution curve (R_curve) to the models before fitting. The fit runs through successfully and writes a posterior .h5 file, but once the fit has finished I get the following error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-5aa8ca8a3c09> in <module>
----> 1 fit = pipes.fit(galaxy, fit_instructions, run="spectroscopy")
      2 
      3 fit.fit(verbose=True)

/opt/anaconda3/lib/python3.8/site-packages/bagpipes/fitting/fit.py in __init__(self, galaxy, fit_instructions, run, time_calls, n_posterior)
     81             file = h5py.File(self.fname[:-1] + ".h5", "r")
     82 
---> 83             self.posterior = posterior(self.galaxy, run=run,
     84                                        n_samples=n_posterior)
     85 

/opt/anaconda3/lib/python3.8/site-packages/bagpipes/fitting/posterior.py in __init__(self, galaxy, run, n_samples)
     51         file = h5py.File(fname, "r")
     52 
---> 53         self.fit_instructions = eval(file.attrs["fit_instructions"])
     54         self.fitted_model = fitted_model(self.galaxy, self.fit_instructions)
     55 

/opt/anaconda3/lib/python3.8/site-packages/bagpipes/fitting/posterior.py in <module>

NameError: name 'array' is not defined

It seems the issue lies in the block of code that reads the posterior back in from the .h5 file: eval() fails because fit_instructions contains a numpy array (see the fit instructions dictionary below):

{'redshift': 2.627609120329589,
 'delayed': {'age': (0.1, 5.0),
  'tau': (0.01, 15.0),
  'massformed': (9, 12),
  'metallicity': (0.1, 2),
  'metallicity_prior': 'log_10'},
 'dust': {'type': 'CF00',
  'eta': 2.0,
  'Av': (0.0, 2.0),
  'n': (0.3, 2.5),
  'n_prior': 'Gaussian',
  'n_prior_mu': 0.7,
  'n_prior_sigma': 0.3},
 'veldisp': (1.0, 1000.0),
 'veldisp_prior': 'log_10',
 'mlpoly': {'type': 'polynomial_max_like', 'order': 1},
 'noise': {'type': 'white_scaled',
  'scaling': (1.0, 10.0),
  'scaling_prior': 'log_10'},
 'R_curve': array([[ 5000.     ,   118.26343],
        [ 5055.     ,   115.69742],
        [ 5110.     ,   113.22428],
        ...,
        [59890.     ,   545.9617 ],
        [59945.     ,   547.04736],
        [60000.     ,   548.13446]], dtype=float32)}
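
A minimal reproduction of the first error, under stated assumptions: the dict below is a shortened, made-up stand-in for the fit_instructions shown above, and the eval() namespace contains only np, on the assumption that posterior.py imports numpy as np. Because str() on a dict writes each value with repr(), the stored string contains the literal text array(...), which eval() cannot resolve.

```python
import numpy as np

# Hypothetical, shortened stand-in for fit_instructions: a plain dict with a
# small float32 array under "R_curve" (values are illustrative only).
fit_instructions = {
    "redshift": 2.6,
    "R_curve": np.array([[5000.0, 118.0], [5055.0, 116.0]], dtype=np.float32),
}

# str() on a dict calls repr() on each value, so the array is written out as
# "array([[...]], dtype=float32)" rather than an expression eval() can rebuild.
serialised = str(fit_instructions)

error = None
try:
    # Only np is in scope, mimicking a module that did "import numpy as np".
    eval(serialised, {"np": np})
except NameError as err:
    error = str(err)

print(error)  # name 'array' is not defined
```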

I was trying to apply the R_curve through fit_instructions as in Further Examples 3 (although there it is applied to a galaxy model only, so it's very possible I'm doing this wrong). Can you advise on what the issue might be and how I might resolve it? Judging from the fit results, I wouldn't be surprised if I'm simply applying the R_curve incorrectly; if so, perhaps the example could make it clearer how to include it in fits.

Thanks so much in advance and let me know if you need anything else to help with reproducing the error!

Best,

David

davidjsetton commented 1 year ago

Looking through the other issues, the same error seems to have been reported in issue #43, which followed the same procedure for including the "R_curve" fit instruction.

ACCarnall commented 1 year ago

Hi David

Hmm, can you try replacing line 53 in posterior.py with

self.fit_instructions = eval(file.attrs["fit_instructions"].replace("array", "np.array"))

and make the same change to the equivalent line (101) in bagpipes/fitting/fit.py.

Let me know if that works and I can update the code if so.

Cheers, Adam
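
A sketch of what this first patch does, again using a made-up two-row stand-in (an assumption, not the user's data) and assuming np is the only name in the eval() namespace: prefixing array with np. fixes the constructor name, but the repr also contains dtype=float32, and the bare name float32 is just as undefined.

```python
import numpy as np

# Same kind of made-up, two-row stand-in for fit_instructions.
fit_instructions = {
    "redshift": 2.6,
    "R_curve": np.array([[5000.0, 118.0], [5055.0, 116.0]], dtype=np.float32),
}
serialised = str(fit_instructions)

# First patch from this thread: prefix the constructor so it resolves to np.array.
patched = serialised.replace("array", "np.array")

error = None
try:
    eval(patched, {"np": np})  # only np in scope, as assumed for posterior.py
except NameError as err:
    error = str(err)

print(error)  # name 'float32' is not defined
```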

davidjsetton commented 1 year ago

Doing so results in the same type of error, but now for the float32 dtype:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-5-5aa8ca8a3c09> in <module>
----> 1 fit = pipes.fit(galaxy, fit_instructions, run="spectroscopy")
      2 
      3 fit.fit(verbose=True)

/opt/anaconda3/lib/python3.8/site-packages/bagpipes/fitting/fit.py in __init__(self, galaxy, fit_instructions, run, time_calls, n_posterior)
     81             file = h5py.File(self.fname[:-1] + ".h5", "r")
     82 
---> 83             self.posterior = posterior(self.galaxy, run=run,
     84                                        n_samples=n_posterior)
     85 

/opt/anaconda3/lib/python3.8/site-packages/bagpipes/fitting/posterior.py in __init__(self, galaxy, run, n_samples)
     51         file = h5py.File(fname, "r")
     52 
---> 53         self.fit_instructions = eval(file.attrs["fit_instructions"].replace("array", "np.array"))
     54         self.fitted_model = fitted_model(self.galaxy, self.fit_instructions)
     55 

/opt/anaconda3/lib/python3.8/site-packages/bagpipes/fitting/posterior.py in <module>

NameError: name 'float32' is not defined

ACCarnall commented 1 year ago

Hi David,

OK, how about chucking in another .replace("float", "np.float")?

Cheers, Adam

davidjsetton commented 1 year ago

Hey Adam,

Looks like that got it past the float32 error, but it now has trouble interpreting the array itself:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-5aa8ca8a3c09> in <module>
----> 1 fit = pipes.fit(galaxy, fit_instructions, run="spectroscopy")
      2 
      3 fit.fit(verbose=True)

/opt/anaconda3/lib/python3.8/site-packages/bagpipes/fitting/fit.py in __init__(self, galaxy, fit_instructions, run, time_calls, n_posterior)
     81             file = h5py.File(self.fname[:-1] + ".h5", "r")
     82 
---> 83             self.posterior = posterior(self.galaxy, run=run,
     84                                        n_samples=n_posterior)
     85 

/opt/anaconda3/lib/python3.8/site-packages/bagpipes/fitting/posterior.py in __init__(self, galaxy, run, n_samples)
     51         file = h5py.File(fname, "r")
     52 
---> 53         self.fit_instructions = eval(file.attrs["fit_instructions"].replace("array", "np.array").replace("float", "np.float"))
     54         self.fitted_model = fitted_model(self.galaxy, self.fit_instructions)
     55 

/opt/anaconda3/lib/python3.8/site-packages/bagpipes/fitting/posterior.py in <module>

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (7,) + inhomogeneous part.
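
A sketch of why both string patches together still fail for a full-size curve. The R column below is made up; only the wavelength grid (5000 to 60000 in steps of 55, i.e. 1001 rows) is read off the printed array in this issue. NumPy abbreviates any array with more than 1000 elements (the default "threshold" print option) to its first and last three rows plus "...", and eval() reads "..." as Python's Ellipsis object. np.array() therefore receives seven items, six rows and one scalar, which matches the "(7,) + inhomogeneous part" message above.

```python
import numpy as np


def restore(text):
    """Apply both string patches from this thread, then eval with np in scope."""
    patched = text.replace("array", "np.array").replace("float", "np.float")
    return eval(patched, {"np": np})


# A short curve survives the round trip once both names are patched.
small = {"R_curve": np.array([[5000.0, 118.0], [5055.0, 116.0]], dtype=np.float32)}
small_back = restore(str(small))

# Full-size grid: 1001 rows x 2 columns = 2002 elements, above the default
# print threshold of 1000. The R values are a made-up linear ramp.
wavs = np.arange(5000.0, 60001.0, 55.0)
big = {"R_curve": np.column_stack(
    [wavs, np.linspace(118.0, 548.0, wavs.size)]).astype(np.float32)}
text = str(big)

# The middle rows were printed as "...", which eval() turns into Ellipsis,
# leaving 3 rows + Ellipsis + 3 rows inside the np.array(...) call.
big_error = None
try:
    restore(text)
except (ValueError, TypeError) as err:  # recent NumPy raises ValueError
    big_error = err

print("..." in text, type(big_error).__name__)
```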

I wonder if the solution might be to pass in the R_curve as lists rather than arrays and then convert them back outside of the fit_instructions dictionary?
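
One way to realise the list-based idea is sketched below. The helpers to_storable() and from_storable() are hypothetical, not part of bagpipes: the first converts any NumPy array in a fit_instructions-style dict into nested lists before str() is called, and the second parses the stored string and turns the named keys back into arrays. The sketch also swaps eval() for the standard library's ast.literal_eval(), which accepts only Python literals (dicts, lists, tuples, numbers, strings), so no NumPy names need to be in scope when reading. Whether bagpipes itself accepts R_curve as a list is not confirmed in this thread, so the array is rebuilt after reading. The R values are made up.

```python
import ast

import numpy as np


def to_storable(instructions):
    """Hypothetical helper: copy a fit_instructions-style dict, arrays -> lists."""
    out = {}
    for key, value in instructions.items():
        if isinstance(value, dict):
            out[key] = to_storable(value)  # nested components, e.g. "dust"
        elif isinstance(value, np.ndarray):
            out[key] = value.tolist()  # full-length nested lists of Python floats
        else:
            out[key] = value
    return out


def from_storable(text, array_keys=("R_curve",)):
    """Hypothetical helper: parse a stored string and rebuild the named arrays."""
    instructions = ast.literal_eval(text)  # literals only; no names are looked up
    for key in array_keys:
        if key in instructions:
            instructions[key] = np.asarray(instructions[key])
    return instructions


# Made-up curve on the wavelength grid seen in this issue (1001 rows).
wavs = np.arange(5000.0, 60001.0, 55.0)
R_curve = np.column_stack(
    [wavs, np.linspace(118.0, 548.0, wavs.size)]).astype(np.float32)
fit_instructions = {
    "redshift": 2.6,
    "veldisp": (1.0, 1000.0),
    "dust": {"type": "CF00", "Av": (0.0, 2.0)},
    "R_curve": R_curve,
}

text = str(to_storable(fit_instructions))
restored = from_storable(text)
print(restored["R_curve"].shape)  # (1001, 2)
```

Because tolist() writes every element as a full Python float, the stored string is never abbreviated, whatever the size of the array.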

Best,

David

ACCarnall commented 1 year ago

Hi David,

Hmm, I think I understand what's going on here. Converting to lists might be the best fix, but I think a quick fix would be to add the following line to bagpipes/fitting/fit.py, just before line 189 (which reads file.attrs["fit_instructions"] = str(self.fit_instructions)):

np.set_printoptions(threshold=10**6)

You'll unfortunately need to delete the current h5 output file and re-run the code. If you could give this a try and let me know if it works that would be very helpful!

Cheers, Adam
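
A sketch of why this fix works, again assuming a made-up R column on the 5000 to 60000 (step 55) wavelength grid. Raising "threshold" above the array's 2002 elements stops NumPy from abbreviating it, so the full curve is written into the string and the patched eval() can rebuild it. The sketch uses np.printoptions, the context-manager form of np.set_printoptions, so the higher threshold applies only while the string is built; calling np.set_printoptions(threshold=10**6) as suggested above changes the setting for the rest of the session, which also works.

```python
import numpy as np

# Made-up curve on the wavelength grid seen in this issue: 1001 rows, 2002 elements.
wavs = np.arange(5000.0, 60001.0, 55.0)
R_curve = np.column_stack(
    [wavs, np.linspace(118.0, 548.0, wavs.size)]).astype(np.float32)
fit_instructions = {"redshift": 2.6, "R_curve": R_curve}

# Raise the summarisation threshold only while the string is built, so every
# row is written out instead of the middle being replaced by "...".
with np.printoptions(threshold=10**6):
    text = str(fit_instructions)

# Read it back with the two string patches from this thread (np assumed in scope).
patched = text.replace("array", "np.array").replace("float", "np.float")
restored = eval(patched, {"np": np})

print("..." in text)              # False
print(restored["R_curve"].shape)  # (1001, 2)
```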

davidjsetton commented 1 year ago

Hey Adam,

That seems to have worked and the fit was able to run through! Thanks so much!

ACCarnall commented 1 year ago

No problem, thanks a lot for doing the testing. I've now incorporated this fix in a new version (1.0.3) of the code.