LSSTDESC / rail_attic

Redshift Assessment Infrastructure Layers
MIT License

Isolating things into new packages #384

Closed. OliviaLynn closed this issue 1 year ago.

OliviaLynn commented 1 year ago

If there are things we want to consider isolating into new packages outside the minimal installation for rail, we should list them out here:

eacharles commented 1 year ago

Honestly, I think that the direct dependencies for the base rail package would ideally be:

  deprecated
  numpy
  qp-prob  # note: this is not qp-prob[full], so it does not include h5py, pandas, pyarrow, or astropy
  scipy
  ceci

And, indirectly via qp-prob:

  tables-io  # as above, this is not tables-io[full], so it does not include h5py, pandas, pyarrow, or astropy

This results in about 75 packages. Note that this doesn't actually include the code to read specific file formats, but since different rail modules read different types of files, I think that is OK.
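As an illustration only, the proposed base dependency list can be written as plain requirement strings and checked so that nothing accidentally pulls in a `[full]` extra or one of the heavy I/O packages. The constant names and helper functions below are hypothetical, not part of RAIL or its packaging, and the parser handles only the simple `name[extra]` form used here, not full PEP 508 syntax.

```python
from __future__ import annotations

# Hypothetical sketch: the proposed direct dependencies of base RAIL,
# written as simple requirement strings (no version pins).
BASE_DIRECT_DEPS = ["deprecated", "numpy", "qp-prob", "scipy", "ceci"]

# Pulled in indirectly via qp-prob, also without its [full] extra.
BASE_INDIRECT_DEPS = ["tables-io"]

# Heavy I/O packages that the [full] extras would bring in.
FULL_EXTRA_PACKAGES = {"h5py", "pandas", "pyarrow", "astropy"}


def split_requirement(req: str) -> tuple[str, set[str]]:
    """Split 'name[extra1,extra2]' into ('name', {'extra1', 'extra2'}).

    Only handles the simple form used above, not full PEP 508 syntax.
    """
    if "[" not in req:
        return req.strip(), set()
    name, _, rest = req.partition("[")
    extras = {e.strip() for e in rest.rstrip("]").split(",") if e.strip()}
    return name.strip(), extras


def check_minimal(reqs: list[str]) -> list[str]:
    """Return the requirements that would break the minimal-install goal."""
    problems = []
    for req in reqs:
        name, extras = split_requirement(req)
        if "full" in extras or name in FULL_EXTRA_PACKAGES:
            problems.append(req)
    return problems


print(check_minimal(BASE_DIRECT_DEPS + BASE_INDIRECT_DEPS))  # []
print(check_minimal(["qp-prob[full]", "h5py"]))  # ['qp-prob[full]', 'h5py']
```

A check like this could run in CI so the base package stays lean as stages move around.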

eacharles commented 1 year ago

This would involve:

  1. A package for astro-tools-type stuff, with these additional deps: "astropy", "healpy", "photerr", "dustmaps", "pz-hyperbolic-temp"

  2. A package for dsps, with this additional direct dep: "dsps"

  3. A package for pzflow-dependent stuff, with this additional direct dep: "pzflow"

  4. A package for SOM-dependent stuff, with these additional direct deps: "scikit-learn", "minisom", "somoclu"

  5. A package for scikit-learn-dependent stuff, with this additional dep: "scikit-learn"
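To make the proposed split concrete, the five packages listed above can be modeled as plain data: a shared base set plus the extra direct dependencies each optional package adds. This is a hypothetical sketch; the dictionary keys are illustrative labels rather than real distribution names, and the base set is the one proposed earlier in the thread.

```python
from __future__ import annotations

# Hypothetical sketch: base RAIL direct deps plus the extra direct deps
# each proposed optional package would add on top of them.
BASE = {"deprecated", "numpy", "qp-prob", "scipy", "ceci"}

EXTRA_DEPS = {
    "astro_tools": {"astropy", "healpy", "photerr", "dustmaps", "pz-hyperbolic-temp"},
    "dsps": {"dsps"},
    "pzflow": {"pzflow"},
    "som": {"scikit-learn", "minisom", "somoclu"},
    "sklearn": {"scikit-learn"},
}


def direct_deps(package: str) -> set[str]:
    """All direct dependencies needed to install one optional package."""
    return BASE | EXTRA_DEPS[package]


def shared_extras() -> dict[str, list[str]]:
    """Map each extra dependency to the optional packages that need it,
    keeping only dependencies used by more than one package."""
    users: dict[str, list[str]] = {}
    for package, deps in EXTRA_DEPS.items():
        for dep in deps:
            users.setdefault(dep, []).append(package)
    return {dep: sorted(pkgs) for dep, pkgs in users.items() if len(pkgs) > 1}


print(sorted(direct_deps("pzflow")))
print(shared_extras())  # {'scikit-learn': ['sklearn', 'som']}
```

Written this way, the one overlap in the list is easy to spot: scikit-learn is needed by both the SOM package and the scikit-learn package.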

eacharles commented 1 year ago

Given the above breakdown, here is where the files that are currently in RAIL might end up:

RAIL/src/rail/core/__init__.py -> base
RAIL/src/rail/core/_version.py -> base
RAIL/src/rail/core/algo_utils.py -> base
RAIL/src/rail/core/common_params.py -> base
RAIL/src/rail/core/data.py -> base
RAIL/src/rail/core/introspection.py -> base
RAIL/src/rail/core/stage.py -> base
RAIL/src/rail/core/utilPhotometry.py -> ?
RAIL/src/rail/core/utilStages.py -> ?
RAIL/src/rail/core/utils.py -> base
RAIL/src/rail/creation/degradation/__init__.py -> remove
RAIL/src/rail/creation/degradation/grid_selection.py -> degradation
RAIL/src/rail/creation/degradation/lsst_error_model.py
RAIL/src/rail/creation/degradation/observing_condition_degrader.py
RAIL/src/rail/creation/degradation/quantityCut.py
RAIL/src/rail/creation/degradation/spectroscopic_degraders.py
RAIL/src/rail/creation/degradation/spectroscopic_selections.py
RAIL/src/rail/creation/degrader.py -> base
RAIL/src/rail/creation/engine.py -> base
RAIL/src/rail/creation/engines/dsps_photometry_creator.py -> dsps
RAIL/src/rail/creation/engines/dsps_sed_modeler.py -> dsps
RAIL/src/rail/creation/engines/flowEngine.py -> pzflow
RAIL/src/rail/creation/engines/galaxy_population_components.py -> dsps
RAIL/src/rail/estimation/algos/NZDir.py -> sklearn
RAIL/src/rail/estimation/algos/knnpz.py -> sklearn
RAIL/src/rail/estimation/algos/naiveStack.py -> base
RAIL/src/rail/estimation/algos/pointEstimateHist.py -> base
RAIL/src/rail/estimation/algos/pzflow.py -> pzflow
RAIL/src/rail/estimation/algos/randomPZ.py -> base
RAIL/src/rail/estimation/algos/simpleSOM.py -> som
RAIL/src/rail/estimation/algos/sklearn_nn.py -> sklearn
RAIL/src/rail/estimation/algos/somocluSOM.py -> som
RAIL/src/rail/estimation/algos/trainZ.py -> base
RAIL/src/rail/estimation/algos/varInference.py -> base
RAIL/src/rail/estimation/estimator.py -> base
RAIL/src/rail/estimation/summarizer.py -> base
RAIL/src/rail/evaluation/evaluator.py -> base
RAIL/src/rail/evaluation/metrics/base.py -> ?
RAIL/src/rail/evaluation/metrics/brier.py -> remove
RAIL/src/rail/evaluation/metrics/cdeloss.py -> ?
RAIL/src/rail/evaluation/metrics/pit.py -> ?
RAIL/src/rail/evaluation/metrics/pointestimates.py -> ?
RAIL/src/rail/evaluation/utils.py -> ?
RAIL/src/rail/stages/__init__.py -> base
RAIL/src/rail/stages/_version.py
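A file-to-package list like the one above is easy to keep as data and query, for example to count what each new package would receive and to flag files that still need a decision. The sketch below copies only a few entries from the list; treating "?" and a missing destination as "still open" is an assumption about how the list is meant to be read, and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: a few entries from the proposed mapping.
# "?" means undecided; None means the list gives no destination.
PROPOSED = {
    "RAIL/src/rail/core/stage.py": "base",
    "RAIL/src/rail/core/utilPhotometry.py": "?",
    "RAIL/src/rail/creation/degradation/quantityCut.py": None,
    "RAIL/src/rail/creation/engines/dsps_sed_modeler.py": "dsps",
    "RAIL/src/rail/creation/engines/flowEngine.py": "pzflow",
    "RAIL/src/rail/estimation/algos/knnpz.py": "sklearn",
    "RAIL/src/rail/estimation/algos/simpleSOM.py": "som",
    "RAIL/src/rail/estimation/algos/trainZ.py": "base",
    "RAIL/src/rail/evaluation/metrics/brier.py": "remove",
}


def group_by_destination(mapping):
    """Invert path -> destination into destination -> sorted paths."""
    groups = defaultdict(list)
    for path, dest in mapping.items():
        groups[dest if dest is not None else "unassigned"].append(path)
    return {dest: sorted(paths) for dest, paths in groups.items()}


def needs_decision(mapping):
    """Paths whose destination is still open ('?' or not given)."""
    return sorted(path for path, dest in mapping.items() if dest in ("?", None))


for dest, paths in sorted(group_by_destination(PROPOSED).items()):
    print(f"{dest}: {len(paths)} file(s)")
print(needs_decision(PROPOSED))
```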

aimalz commented 1 year ago

Thanks for this detailed itemization!

Re: 2+3, I think both dsps and pzflow depend on jax without much else, in which case it might make sense to put them in one place if shared core dependencies are what define the standalone packages. That said, at this point it does seem more intuitive to separate them by the whole package they depend on rather than by that package's core dependencies. (Also, we could take this opportunity to consolidate the dsps modeler, creator, and galaxy population components into a single module so they get imported from the same place, as is already true for other stages with the same underlying algo/engine. Wait, don't these live here now?)

Re: evaluation, is there a reason to move any of the metrics out of base RAIL if none of them have dependencies beyond qp? I guess I have the same question about the core utils, although @delucchi-cmu recently noted that a package that just gathers up commonly used photometric conversions in one place would probably be quite valuable in its own right.

eacharles commented 1 year ago

For the metrics, it's mainly a question of whether they are already in qp or not. There might be some redundancy there.


eacharles commented 1 year ago

For the utility stages, I think another question is whether they need to support parallelization and whether they should be agnostic as to HDF5 vs. Parquet input and output. As for merging the dsps stuff into a single file, I'd leave that up to the dsps developers. If you are concerned about people knowing where to import stuff from, the introspection machinery lets you import everything from rail.stages if you want.
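The two open questions for the utility stages, format agnosticism and parallelization, can be sketched independently of any I/O library: pick a format tag from the file suffix, and split the rows of a table into contiguous per-process ranges. Everything below is hypothetical; it is not the tables_io or ceci API, and the suffix table is only an assumption about which extensions would be supported.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical suffix table; a real stage would delegate to its I/O layer.
SUFFIX_TO_FORMAT = {
    ".hdf5": "hdf5",
    ".h5": "hdf5",
    ".hdf": "hdf5",
    ".parquet": "parquet",
    ".pq": "parquet",
}


def detect_format(path: str) -> str:
    """Return a format tag for the file, so one stage can handle both."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUFFIX_TO_FORMAT:
        raise ValueError(f"unsupported file type: {path!r}")
    return SUFFIX_TO_FORMAT[suffix]


def chunk_ranges(n_rows: int, n_ranks: int) -> list[tuple[int, int]]:
    """Split n_rows into n_ranks contiguous [start, end) ranges whose
    sizes differ by at most one row."""
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    base, extra = divmod(n_rows, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


print(detect_format("catalog.parquet"))  # parquet
print(chunk_ranges(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Keeping format detection and row partitioning separate means a stage could support either question without committing to the other.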


aimalz commented 1 year ago

One corollary to this rearrangement is that we'll need to make a dummy creator (akin to randomPZ or trainZ as estimators) so the RAIL base package still has at least one complete set of those stages.

eacharles commented 1 year ago

The only real reason we need a dummy creator (and also a dummy degrader) is that we need a fully formed stage to run unit tests.
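Here is a minimal sketch of what such a test-only creator might look like, in the spirit of randomPZ or trainZ on the estimation side. It is deliberately standalone: the class, its parameters, and the column names are hypothetical and do not follow RAIL's actual Creator base class or configuration system. The point is that a seeded, dependency-free stage gives reproducible output that unit tests can assert on.

```python
from __future__ import annotations

import random


class DummyCreator:
    """Hypothetical test-only creator; NOT RAIL's Creator API.

    Draws redshifts uniformly in [z_min, z_max] and one magnitude column
    with Gaussian scatter, using a seeded RNG for reproducibility.
    """

    def __init__(self, z_min: float = 0.0, z_max: float = 3.0,
                 mag_mean: float = 24.0, mag_sigma: float = 1.0):
        if z_max <= z_min:
            raise ValueError("z_max must be greater than z_min")
        self.z_min = z_min
        self.z_max = z_max
        self.mag_mean = mag_mean
        self.mag_sigma = mag_sigma

    def sample(self, n_samples: int, seed: int | None = None) -> dict[str, list[float]]:
        """Return a column-oriented table with n_samples rows."""
        rng = random.Random(seed)
        redshift = [rng.uniform(self.z_min, self.z_max) for _ in range(n_samples)]
        mag = [rng.gauss(self.mag_mean, self.mag_sigma) for _ in range(n_samples)]
        return {"redshift": redshift, "mag": mag}


# Usage in a unit test: same seed, same output; right shape and range.
creator = DummyCreator()
a = creator.sample(5, seed=42)
b = creator.sample(5, seed=42)
assert a == b
assert len(a["redshift"]) == len(a["mag"]) == 5
assert all(0.0 <= z <= 3.0 for z in a["redshift"])
```

A matching dummy degrader could follow the same pattern, for example by dropping a seeded random subset of rows.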

eacharles commented 1 year ago

Done!!!!