Issue/10/tomo bins - Githubissues

hangqianjun commented 1 year ago

Change Description

[x] My PR includes a link to the issue that I am addressing Resolves #10

Solution Description

In this PR I have added a tomographer module under src/rail/estimation. It contains two generic tomographer classes:

PZTomographer, which takes per-galaxy n(z) from a qp.Ensemble object and output a tabular object with tomographic binning;
CatTomographer, which takes catalogue-like data and output a tabular object with tomographic binning; The second type will be compatible with the classifiers in tomo-challenge, where features in the catalogue are used to assign tomographic bins.

For each of these types, I've added an example classifier in algos/. naiveClassifierSRD is a PZtomographer that uses simple point estimate SRD binning; randomForestClassifier is a CatTomographer which is adapted from TXPipe.

Code Quality

[x] I have read the Contribution Guide
[x] My code follows the code style of this project
[x] My code builds (or compiles) cleanly without any errors or warnings
[x] My code contains relevant comments and necessary documentation

Project-Specific Pull Request Checklists

Bug Fix Checklist

[ ] My fix includes a new test that breaks as a result of the bug (if possible)
[ ] My change includes a breaking change
- [ ] My change includes backwards compatibility and deprecation warnings (if possible)

New Feature Checklist

[x] I have added or updated the docstrings associated with my feature using the NumPy docstring format
[ ] I have updated the tutorial to highlight my new feature (if appropriate)
[x] I have added unit/End-to-End (E2E) test cases to cover my new feature
[ ] My change includes a breaking change
- [ ] My change includes backwards compatibility and deprecation warnings (if possible)

Documentation Change Checklist

[ ] Any updated docstrings use the NumPy docstring format

Build/CI Change Checklist

[ ] If required or optional dependencies have changed (including version numbers), I have updated the README to reflect this
[ ] If this is a new CI setup, I have added the associated badge to the README

Other Change Checklist

[ ] Any new or updated docstrings use the NumPy docstring format.
[ ] I have updated the tutorial to highlight my new feature (if appropriate)
[ ] I have added unit/End-to-End (E2E) test cases to cover any changes
[ ] My change includes a breaking change
- [ ] My change includes backwards compatibility and deprecation warnings (if possible)

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.19% :tada:

Comparison is base (c84ab6a) 95.86% compared to head (727298d) 96.05%.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #18 +/- ## ========================================== + Coverage 95.86% 96.05% +0.19% ========================================== Files 29 32 +3 Lines 1643 1724 +81 ========================================== + Hits 1575 1656 +81 Misses 68 68 ``` | [Files Changed](https://app.codecov.io/gh/LSSTDESC/rail_base/pull/18?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=LSSTDESC) | Coverage Δ | | |---|---|---| | [src/rail/estimation/algos/equal\_count.py](https://app.codecov.io/gh/LSSTDESC/rail_base/pull/18?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=LSSTDESC#diff-c3JjL3JhaWwvZXN0aW1hdGlvbi9hbGdvcy9lcXVhbF9jb3VudC5weQ==) | `100.00% <100.00%> (ø)` | | | [src/rail/estimation/algos/uniform\_binning.py](https://app.codecov.io/gh/LSSTDESC/rail_base/pull/18?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=LSSTDESC#diff-c3JjL3JhaWwvZXN0aW1hdGlvbi9hbGdvcy91bmlmb3JtX2Jpbm5pbmcucHk=) | `100.00% <100.00%> (ø)` | | | [src/rail/estimation/classifier.py](https://app.codecov.io/gh/LSSTDESC/rail_base/pull/18?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=LSSTDESC#diff-c3JjL3JhaWwvZXN0aW1hdGlvbi9jbGFzc2lmaWVyLnB5) | `100.00% <100.00%> (ø)` | |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

hangqianjun commented 1 year ago

While trying to writing a unit test for the PZTomographer algorithm, I realised that there doesn't seem to be test qp.Ensemble data available in src/rail/examples_data/testdata/. Could we add a small test qp file like test_dc2_training_9816.hdf5?

eacharles commented 1 year ago

Depends on the size. If it is more that a few 10’s of galaxies it would be better to download the data than to include it in the repo.On Jul 4, 2023, at 12:20 PM, hangqianjun @.***> wrote: While trying to writing a unit test for the PZTomographer algorithm, I realised that there doesn't seem to be test qp.Ensemble data available in src/rail/examples_data/testdata/. Could we add a small test qp file like test_dc2_training_9816.hdf5?

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you are subscribed to this thread.Message ID: @.***>

hangqianjun commented 1 year ago

Depends on the size. If it is more that a few 10’s of galaxies it would be better to download the data than to include it in the repo.

Are these data available somewhere already?

eacharles commented 1 year ago

Does src/rail/examples_data/testdata/output_BPZ_lite.fits work? it is already in the repo.

sschmidt23 commented 1 year ago

@hangqianjun I was looking at this PR earlier, I don't think that we want to have input parameters in a .ini file unless there is a strong reason to do so, as that file could change between runs if someone pushes a change and updated values could lead to non-reproducibility on subsequent runs. Having all of the parameters as config params seems like a better way of tracking things in terms of reproducibility. Was there a reason to do this with a .ini file, e.g. maybe TXPipe does something like that?

Also, output_BPZ_lite.fits may not be the best test file, as the mode values for the first 10 galaxies are all either at z<0.2 or z>2.8, and thus the default tomographer value (for SRD binning) for all 10 is -99. We may need to include another small test file here with more appropriate mode values.

Other than that, this looks very nice!

hangqianjun commented 1 year ago

Hi @aimalz, are you happy with the module/algorithm names? Let me know if you have suggestions!

LSSTDESC / rail_base

Issue/10/tomo bins #18

Change Description

Solution Description

Code Quality

Project-Specific Pull Request Checklists

Bug Fix Checklist

New Feature Checklist

Documentation Change Checklist

Build/CI Change Checklist

Other Change Checklist

Codecov Report