Open shinybrar opened 3 years ago
This proposal recommends to split the dependencies of ch_util by purpose under extras_require in order to considerably relax base requirements.
I do want to bring up, when considering this issue, the last time I checked extras_require
does not play well with the new pip resolver.
Edit: This was potentially fixed here: https://github.com/pypa/pip/pull/8291
The previous occurrence of a related conversation: https://github.com/chime-experiment/ch_util/pull/8
I was not aware of the discussion in #8 , but if relaxing the dependencies is not ameneable, would it be okay to fork and change ch_utils
under the chimefrb project?
@shinybrar it seems like there are two (and maybe only two?) issues in here:
I'd much rather solve these directly than split up ch_util into sets of optional dependencies. As I discussed in #8, we use a successful installation of ch_util
as somewhat of a guarantee that you have everything you need to interact with CHIME data, and I'd rather not make that conditional.
There are some good avenues for fixing this:
I'm tagging @danielemichilli and @leungcalvin in here as they may have opinions on this one.
Also, I'd really rather avoid a fork if we can.
I think the suggestion would be to change, the current installation of ch_util
with an additional [all]
moniker to maintain the exact same functionality offered today.
>>> pip install "ch_util[all] @ git+https://github.com/chime-experiment/ch_util.git#master"
>>> pip install ch_util[all]
Currently, bitshuffle
is an issue, but additionally installing ch_util
on other scientific clusters, e.g. scinet
has also been problematic due to dependencies, and access to private repositories. chimedb_config
is the core of that issue, and I am not sure how to proceed here, because even if we provide widespread access, someone will misuse or mistakenly not follow the correct security protocols, which could also lead to a new problems.
Currently, there exists a fork of ch_util on chimefrb/ch_util -- If it works for the chime-experiment team, we can maintain that flavour with optional dependencies and equal to the the current master in terms of commits.
I have been tinkering on SciNet and I can confirm that installing ch_util
on SciNet for the purpose of FRB + Outriggers is a singular pain. I have also installed ch_util
on Cedar, on DRAO site, and various server nodes where things have been pretty darn smooth. There's one caveat though: I haven't been using chimedb_config
; on all these other sites I use an interface to get_correlator_inputs
that Shiny and Chitrang have implemented for CHIME/FRB + Outriggers. It works really well on those other sites, and in a world where everybody on Outriggers has their own preferred compute environment, I think it might be less hassle for ch_util to support accessing correlator inputs via two independent interfaces (chimedb_config and the new API) than to designate a contact person to manage access to chimedb_config
. I don't know how else chimedb_config
is used on the CHIME/Cosmology side, but for FRB + Outriggers I believe supporting the API would eliminate the dependency on chimedb_config
for those who need to deploy at all these random sites (for FRB + Outriggers), eliminating the need for Richard's proposed dependency switch/contact person, and reduces the surface area for security vulnerabilities.
I'm gonna punt on bitshuffle, but I don't remember having problems with it on sites besides scinet
.
Re: forking...I don't really care either way and I don't think the Outriggers science will be really affected, as long as we use a branch like gbo
which is refactored for non-CHIME telescopes!
I have been tinkering on SciNet and I can confirm that installing
ch_util
on SciNet for the purpose of FRB + Outriggers is a singular pain
Can you elaborate on that? Is it just bitshuffle that's the problem there? Details on this would be useful as I'd really rather fix the actual problems than put in these workarounds.
I wonder how get_correlator_inputs
works. Is it just because it was introduced after everything was built and stable? It will be a while to capture the connectivity in outrigger sites in the database and correct all the cable swaps etc, and if get_correlator_inputs
is not working with the database, then it would be a pain. unless I am wrongly interpreting what get_...
does.
Generally my attitude is this...
I really would rather avoid a fork if we can. Even the soft fork you're proposing will eventually turn into a hard fork as people start committing changes to it rather than upstream, and then at somepoint we'll change something which breaks your interfacing, and you won't want to fix it, etc. So probably the priority would be to avoid this.
I'd also like to avoid changing the dependencies to optional ones. I appreciate it sounds like a simple change to just say now you just install ch_util[all]
instead, but it will require identifying a large number of places this installation is done (all CI builds, pipeline scripts, personal installations, ansible plays) and figuring out what they need and fixing them all; and it's also going to generate a bunch of support headache in the future when people install ch_util without the set of dependencies that they need and have obscure failures that they need help with.
Because of that I'd really like to explore other options. There's two I can think of:
Identify what the actual pain points are, and make them easier. So far no one has mentioned anything beyond bitshuffle and chimedb_config installation. I believe both of these are soluble. @shinybrar @leungcalvin is there a good reason why fixing these wouldn't be sufficient?
As suggested by @nritsche, split up ch_util into separate packages (i.e. io related code, db related code etc), and have ch_util
depend on them and assemble them into something like the current API. That would mean we would continue to require an installation of ch_util
, but you would be able to just pull in the parts you need.
As indicated above, I think I have a preference for 1, but it needs to be a path that everyone thinks would work.
I think the current pains currently are, but not limited to:
bitshuffle
which, as stated by @jrs65 can be alleviated via the work down by @nritsche ch_util
. when most of the non-cosmology collaborators are using one maybe two functions from repository. This makes the dependency resolution process extremely lengthy.chimedb_config
being a private dependency. This breaks a lot of installations and setup configurations, especially in non-chime, non-sudo environments.I think the proposed solution 1 works, if:
chimedb_config
under a ImportError
exception rather than proliferating access. I fear, if we give chimedb_config
to everyone, it will cause issues, and even then it does not work off-site.bitshuffle
such that we no longer need a lengthy installation process.I understand and relate to @jrs65 preference to limit hard forks.
@mondana
get_correlator_inputs
is a separate backend which actually runs the latest and greatest version ofch_util
. It essentially calls ch_util
functions and makes them available over HTTP API. So any changes made to ch_util for outriggers are will not break it.
Merge the current work done by @nritsche in
bitshuffle
such that we no longer need a lengthy installation process.
I want to make clear that this is not my work but a PR from someone from ESRF, Grenoble (https://github.com/kiyo-masui/bitshuffle/pull/81).
@shinybrar okay great.
I think we can try and thin down the dependencies a bit too. Looking at the requirements.txt
I think we can remove a few things:
I want to make clear that this is not my work but a PR from someone from ESRF, Grenoble (kiyo-masui/bitshuffle#81).
@nritsche can I ask you to take a look at that PR and test it and see if it does what we want (i.e. allow using bitshuffle without install h5py from source)?
@jrs65 @nritsche Has there been any time devoted on implementing the changes on this issue, or is it on the back burner for now?
@shinybrar this is making progress now. We're just about to get the code merged into bitshuffle (https://github.com/kiyo-masui/bitshuffle/pull/81) which means that we won't need to build h5py from source to use bitshuffle. Next stop is to build a binary wheel of bitshuffle, so installing that part of the chain should be more or less instantaneous.
Issues
With
ch_util
supporting a wide myriad of utilities under the hood, downstream packages are currently required to build and install complicated dependencies by default even though they might not be required. An example of this could bempi4py
which is seldom used in developer environment. This proposal recommends to split the dependencies ofch_util
by purpose underextras_require
in order to considerably relax base requirements.Additionally, dependency resolution systems enabled through PEP 517 such as
flit
,poetry
andpipx
try and evaluate requirements forch_util
,it nominally includes packages underextra_installs
. This process errors out, sincechimedb_config
package is closed sourced. As a result, currently its not possible to usech_util
with these modern build systems.Resolutions for:
Private Extra Installs
chimedb_config
will never be made open source, we can either remove it from thesetup.py
and raise an import error, if a function which requires it is called.Dependency Isolation
Example Scenario
If a developer wants access to the
ch_util.tools.Antenna
object, they are required to installbitshuffle
, which requires a hard linked installation ofhdf5
. This additional step adds ~3-5 minutes of compile time.Possible Split
bitshuffle
,h5py
]chimedb @ git+https://github.com/chime-experiment/chimedb.git
,chimedb.data_index @ git+https://github.com/chime-experiment/chimedb_di.git
,chimedb.dataflag @ git+https://github.com/chime-experiment/chimedb_dataflag.git
]mpi4py
]Additionally, I am happy to work on these issues, if the project team is interested in implementing them.