Functionality to display lower and upper limits of observational data

EvgeniiChaikin commented 6 months ago

This PR adds a new functionality that enables the user to indicate lower and/or upper limits of observational data.

To specify lower/upper limits, the user will need to pass one or two bool arrays to the associate_y() method in the conversion script that converts the data from raw to hdf5 format.

Once the raw data have been processed and saved as an hdf5 file, no further changes are required. That is: everything else in the pipeline remains the same.

For example,

I can modify data/GalaxyHIMassFunction/convertJones2018.py conversion script as

data = np.genfromtxt(input_filename, comments="#")

processed = ObservationalData()

comment = (
    "Jones et al. (2018). Estimated for ALFALFA survey at z=0"
    f"data, h-corrected for SWIFT using Cosmology: {cosmology.name}."
)

citation = "Jones et al. (2018, ALFALFA)"
bibcode = "2018MNRAS.477....2J"
name = "HI mass function from ALFALFA at z=0"
plot_as = "points"
redshift = 0.0
h = cosmology.h

M = 10 ** data[:, 0] * (h / ORIGINAL_H) ** (-2) * unyt.Solar_Mass
Phi = 10 ** data[:, 1] * (h / ORIGINAL_H) ** 3 * unyt.Mpc ** (-3)

# no error in M_HI provided
M_err = np.row_stack([M.value * 0.0] * 2) * M.units
Phi_err = np.abs(
    (10 ** data[:, [2, 3]] * (h / ORIGINAL_H) ** 3 * unyt.Mpc ** (-3)) - Phi[:, None]
).T

up_lims = np.zeros_like(M) ### THIS IS A NEW LINE
lo_lims = np.zeros_like(M) ###  THIS IS A NEW LINE
up_lims[2] = True # or 1 ### THIS IS A NEW LINE
lo_lims[-3:] = True ###  THIS IS A NEW LINE

processed.associate_x(M, scatter=M_err, comoving=True, description="Galaxy HI Mass")
processed.associate_y(Phi, scatter=Phi_err, comoving=True, description="Phi (HIMF)", uplims=up_lims, lolims=lo_lims) ###  THIS IS A MODIFIED LINE
processed.associate_citation(citation, bibcode)
processed.associate_name(name)
processed.associate_comment(comment)
processed.associate_redshift(redshift)
processed.associate_plot_as(plot_as)
processed.associate_cosmology(cosmology)

output_path = f"{output_directory}/{output_filename}"

if os.path.exists(output_path):
    os.remove(output_path)

processed.write(filename=output_path)

then I can use the standard methods to load and plot the processed data as

import matplotlib.pylab as plt
from velociraptor.observations import load_observations
data = load_observations("./Jones2018.hdf5")[0]

fig, ax = plt.subplots(1,1)
data.plot_on_axes(ax)
ax.loglog()
plt.savefig("limits.png")

The resulting figure is

limits

Note that if an observational dataset is shown as 'line' as opposed to 'scatter', then lower/upper limits will still be displayed. E.g.

limits2

robjmcgibbon commented 6 months ago

It seems to me that matplotlib requires an yerr value to be passed to plot lower/upper limits correctly. For example the following code only adds limits for one of the two lines

x = np.arange(10)
y = x**2
z = x > 7
ax.errorbar(x, y, yerr=y/5, lolims=z)
ax.errorbar(x, 2*y, lolims=z)
plt.show()

I think we need to handle the case where someone wants to indicate lower/upper limits, but doesn't pass scatter values when they create the ObservationalData object. Maybe we default to yerr with a constant size in that case? I would probably be in favour of using a constant yerr size even if y_scatter is passed.

EvgeniiChaikin commented 6 months ago

Thanks for the feedback @robjmcgibbon. I agree with your comments!

The updated version always uses the default arrow sizes, regardless of what scatter values are passed to the associate_y() function. It also uses these default values in the case that no scatter is passed to associate_y().

Please let me know if the new version works for you.

EvgeniiChaikin commented 6 months ago

I have pushed an update ensuring that the default scatter values are used to indicate lower/upper limits only if the user does not provide y scatter.

SWIFTSIM / velociraptor-python

Functionality to display lower and upper limits of observational data #108