LSSTDESC / rail

Top level "umbrella" package for RAIL
MIT License
5 stars 3 forks source link

Move dependencies from rail-pipelines #123

Closed OliviaLynn closed 2 months ago

OliviaLynn commented 5 months ago

We should move the dependencies we get from pz-rail-pipelines[full] (pyproject link) directly into the pyproject.

There's no clear reason to use rail-pipelines as a middle man and this can easily lead to confusion and difficulty maintaining in the future.

eacharles commented 5 months ago

Sorry, could you clarify what you want do here and why?

The dependencies were pretty carefully set up to handle a few different use cases. I want to be sure we aren’t breaking one or more of them by changing them.

On Feb 14, 2024, at 10:59 AM, Olivia R. Lynn @.***> wrote:

We should move the dependencies we get from pz-rail-pipelines[full] (pyproject link https://github.com/LSSTDESC/rail/blob/main/pyproject.toml) directly into the pyproject.

There's no clear reason to use rail-pipelines as a middle man and this can easily lead to confusion and difficulty maintaining in the future.

— Reply to this email directly, view it on GitHub https://github.com/LSSTDESC/rail/issues/123, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADRIGIRCPPQY5YQRH6NJZRLYTUCRNAVCNFSM6AAAAABDI3CLBWVHI2DSMVQWIX3LMV43ASLTON2WKOZSGEZTIOJZGM2TQOA. You are receiving this because you are subscribed to this thread.

eacharles commented 5 months ago

Are you just proposing to replace all three of the "pz-rail-pipelines[full]" with the explicit list of packages?

ztq1996 commented 5 months ago

The idea is to have rail_pipelines as a library for pipeline builders to fork, modify, and contribute their pipeline examples. As rail is currently dependent on rail_pipelines, it makes doing that much harder. I do agree with seeking solution to not break the installation, but I am not an expert on that

eacharles commented 5 months ago

Ok, so in some sense, this is a much about defining the role of rail_pipeline as anything.

Originally we had thought that rail_pipelines would collect code that depended on the various algorithms into pipeline like things, and that it would be very helpful to have some examples of notebooks to run those pipelines in rail, so it natually made sense to have rail_pipelines collect all the dependencies on the algorithms packages, and then just have rail only depend on rail_notebooks.

Since we haven't done much development in rail_pipelines yet, things have sort of evolved that the only things in rail are the demo notebooks for the various algorithms. This is useful for testing, but think in the long run we are really going to want people to be using and sharing pipelines quite a bit too.

eacharles commented 5 months ago

So, it sounds like your idea is to remove the dependency of rail on rail_pipelines, and not try to use rail_pipelines as a collection point for things like pipelined version of demos, but rather just have the demo notebooks live directly in RAIL.

eacharles commented 5 months ago

FWIW, I think could make sense, so long as we are clear about what the role of rail_pipelines is mean to be, think about removing the direct dependency of rail on rail_pipelines.

OliviaLynn commented 5 months ago

I'm imagining the main question being "why go through rail_pipelines," maybe having something like:

# pz-rail

[project.optional-dependencies]
algos = [
    # this section was formerly just algos = [ "pz-rail-pipelines[full]", ]
    "pz-rail-astro-tools",
    "pz-rail-bpz",
    "pz-rail-cmnn",
    ...,
    "qp-prob[full]",
]
nb = [
    "pz-rail[algos]", # this line was formerly "pz-rail-pipelines[full]",
    "jupyter",
    "seaborn",
]
dev = [
    "pz-rail[algos]", # this line was formerly "pz-rail-pipelines[full]",
    ...,
    "pylint",
]
# pz-rail-pipelines

[project.optional-dependencies]
full = [
    # this section was formerly a list of all rail algos
    "pz-rail[algos]",
]

dev = [
    "pz-rail[algos]", # instead of this line, we had the same rail algos listed out here
    "coverage",
    "pytest",
    "pytest-cov", 
    "pre-commit",
    "pylint", 
]

(Note: It's worth acknowledging that pz-rail[algos] will refer to the pyproject in the latest release of pz-rail on pypi, but, the current state refers to the latest release of pz-rail-pipelines on pypi. So, this just means we'd need to make a new release of pz-rail if we wanted to update the list of algos, rather than a new release of pz-rail-pipelines. Not something we do often anyway.)

This gives us the packages listed at the umbrella level, which is a logical place to look for/modify them, and removes the dependency rail has on rail_pipelines.

eacharles commented 5 months ago

Well, the original idea was that rail pipelines was the place to collect all the algos because you would need them anyway for the pipelines.   So the advantage of going through rail_pipelines was avoid the need to list all the packages in two places.We aren’t really using rail pipelines yet, so if we decide to use it in a different way than was originally intended, that is fine.On Feb 15, 2024, at 3:41 PM, Olivia R. Lynn @.***> wrote: I'm imagining the main question being "why go through rail_pipelines," rather than have something like:

pz-rail

[project.optional-dependencies] algos = [

this section was formerly just algos = [ "pz-rail-pipelines[full]", ]

"pz-rail-astro-tools",
"pz-rail-bpz",
"pz-rail-cmnn",
...,
"qp-prob[full]",

] nb = [ "pz-rail[algos]", # this line was formerly "pz-rail-pipelines[full]", "jupyter", "seaborn", ] dev = [ "pz-rail[algos]", # this line was formerly "pz-rail-pipelines[full]", ..., "pylint", ]

pz-rail-pipelines

[project.optional-dependencies] full = [

this section was formerly a list of all rail algos

"pz-rail[algos]",

]

dev = [ "pz-rail[algos]", # instead of this line, we had the same rail algos listed out here "coverage", "pytest", "pytest-cov", "pre-commit", "pylint", ] (Note: It's worth acknowledging that pz-rail[algos] will refer to the pyproject in the latest release of pz-rail on pypi, but, the current state refers to the latest release of pz-rail-pipelines on pypi. So, this just means we'd need to make a new release of pz-rail if we wanted to update the list of algos, rather than a new release of pz-rail-pipelines. Not something we do often anyway.) This gives us the packages listed at the umbrella level, which is a logical place to look for/modify them, and removes the dependency rail has on rail_pipelines.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>

sschmidt23 commented 5 months ago

Yeah, in Olivia's sketch above we basically move the big list from rail_pipelines to rail[algos], swapping the role of which repo has the big list (but still only having it in one place by having rail_pipelines install rail[algos] rather than the big list). If we have users who are going to primarily use notebooks, then I can see this proposed change making sense.

eacharles commented 5 months ago

Yep. I think that makes sense. I do think we should be encouraging the development of pipelines and moving to non-notebooks as that will scale better and be more reproducible, but I think people who are getting started will be using notebooks for a while.

ztq1996 commented 2 months ago

This is done in PR #131 , closing issue.