danielapai / bioverse

A simulation framework to assess the statistical power of future biosignature surveys
MIT License

Measuring observables containing non-numeric entries breaks the workflow #56

Open matiscke opened 3 months ago

matiscke commented 3 months ago

Measuring observables that contain NaN values currently fails. For example, if the new HPIC star catalog is used to generate stars (see #53), conducting a survey with this generator (e.g. survey.quickrun) produces a cryptic error message.

The reason seems to be that the "age" column contains many NaNs, which utils.normal() does not currently handle well.
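
To illustrate the kind of handling that seems to be missing, here is a minimal sketch of a Gaussian measurement draw that passes NaN true values through unchanged instead of handing them to the sampler. It does not reproduce utils.normal(), whose actual signature is not shown in this thread; the helper name `noisy_measurement`, its parameters, and the toy ages are all hypothetical, and plain Python lists stand in for catalog columns.

```python
import math
import random


def noisy_measurement(true_values, sigma, rng=None):
    """Hypothetical helper (not part of bioverse): one Gaussian draw per value.

    NaN true values are returned as NaN rather than being sampled.
    """
    rng = rng if rng is not None else random.Random()
    measured = []
    for value in true_values:
        if math.isnan(value):
            # No true value to measure, so the measurement is also missing.
            measured.append(math.nan)
        else:
            measured.append(rng.gauss(value, sigma))
    return measured


# Toy "age" column (made-up values) with missing entries.
ages = [4.6, math.nan, 1.2, math.nan]
measured_ages = noisy_measurement(ages, sigma=0.5, rng=random.Random(42))
```

With this approach the missing entries stay missing after the measurement, so downstream code still has to tolerate NaN in the measured column.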

matiscke commented 3 months ago

#57 aims to solve this but needs more work.

nwtuchow commented 3 months ago

I think the problem is due to the fact that only around 33% of objects in the catalog have ages. We are looking to expand our catalog in the future, but currently age is not used in the calculations that we are interested in performing, so in the first release of the catalog we only obtained ages when they were derived in the same calculation as the stellar mass.

read_HPIC has an argument, required_props, which is an array of properties that every star must have. The function excludes objects that lack measurements for any of those stellar properties. I wouldn't recommend using it with age though, as it would heavily bias the sample by excluding the 66% of stars without ages.
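
As a rough illustration of why requiring age shrinks the sample, the sketch below filters a toy catalog by a list of required properties. It is not the read_HPIC implementation: the helper `has_required`, the dictionary layout, and the star values are made up for this example.

```python
import math


def has_required(star, required_props):
    """Hypothetical check: True if every required property is non-NaN."""
    return all(not math.isnan(star[prop]) for prop in required_props)


# Toy catalog (made-up values): only one of three stars has an age.
stars = [
    {"mass": 1.0, "age": 4.6},
    {"mass": 0.8, "age": math.nan},
    {"mass": 0.5, "age": math.nan},
]

kept_mass_only = [s for s in stars if has_required(s, ["mass"])]
kept_with_age = [s for s in stars if has_required(s, ["mass", "age"])]

print(len(kept_mass_only), "stars kept when only mass is required")
print(len(kept_with_age), "stars kept when age is also required")
```

In this toy case, adding age to the required properties drops two of the three stars, which mirrors the bias described above.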

I believe the error you are encountering does not come from the read_HPIC function but from another function called by the generator, which uses the ages computed in earlier steps of the generator. For the purposes of the albedo Seff project, NaN ages are fine for us and the read_HPIC function doesn't cause problems. If you were testing a hypothesis that requires ages for every object, we could potentially add an argument to the read_HPIC function to generate random ages from a distribution, or to fill in missing values.
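
A possible shape for that fill-in option is sketched below. Everything in it is an assumption: the function name `fill_missing_ages`, the uniform distribution, and the 0 to 10 Gyr bounds are placeholders chosen for illustration, not a proposal for what the catalog should actually use.

```python
import math
import random


def fill_missing_ages(ages, low=0.0, high=10.0, rng=None):
    """Hypothetical helper: replace NaN ages with uniform random draws.

    The uniform distribution and the default bounds (in Gyr) are
    illustrative assumptions only.
    """
    rng = rng if rng is not None else random.Random()
    return [
        rng.uniform(low, high) if math.isnan(age) else age
        for age in ages
    ]


# Toy "age" column (made-up values) with missing entries.
ages = [4.6, math.nan, 1.2, math.nan]
filled = fill_missing_ages(ages, rng=random.Random(0))
```

Known ages are left untouched and only the missing entries are drawn, so the filled values carry no real information about the individual stars.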

nwtuchow commented 3 months ago

Actually, the error may come from the measurement object, which can't handle true values of age that are NaN.

matiscke commented 3 months ago

Thank you, Noah.

I agree that stellar age should not be a required property.

The issue is not with your function; it just didn't show up with our previous generator, which assigns random ages. As you wrote, the problem seems to appear when a simulated survey takes measurements of an observable that has NaN values.