chime-experiment / ch_util

CHIME utilities
https://chime-experiment.github.io/ch_util
MIT License
2 stars 3 forks source link

Relaxed Dependencies #13

Open shinybrar opened 3 years ago

shinybrar commented 3 years ago

Issues

With ch_util supporting a wide myriad of utilities under the hood, downstream packages are currently required to build and install complicated dependencies by default even though they might not be required. An example of this could be mpi4py which is seldom used in developer environment. This proposal recommends to split the dependencies of ch_util by purpose under extras_require in order to considerably relax base requirements.

Additionally, dependency resolution systems enabled through PEP 517 such as flit, poetry and pipx try and evaluate requirements for ch_util,it nominally includes packages under extra_installs. This process errors out, since chimedb_config package is closed sourced. As a result, currently its not possible to use ch_util with these modern build systems.

Resolutions for:

Private Extra Installs

  1. Since chimedb_config will never be made open source, we can either remove it from the setup.py and raise an import error, if a function which requires it is called.
  2. We can give selective access to the CHIME/FRB personnel as a workaround.

Dependency Isolation

Example Scenario

If a developer wants access to the ch_util.tools.Antenna object, they are required to install bitshuffle, which requires a hard linked installation of hdf5. This additional step adds ~3-5 minutes of compile time.

Possible Split

Extra Dependencies
chime_io [bitshuffle, h5py]
chime_db [chimedb @ git+https://github.com/chime-experiment/chimedb.git, chimedb.data_index @ git+https://github.com/chime-experiment/chimedb_di.git, chimedb.dataflag @ git+https://github.com/chime-experiment/chimedb_dataflag.git]
chime_mpi [mpi4py]
all [*]

Additionally, I am happy to work on these issues, if the project team is interested in implementing them.

anjakefala commented 3 years ago

This proposal recommends to split the dependencies of ch_util by purpose under extras_require in order to considerably relax base requirements.

I do want to bring up, when considering this issue, the last time I checked extras_require does not play well with the new pip resolver.

Edit: This was potentially fixed here: https://github.com/pypa/pip/pull/8291

The previous occurrence of a related conversation: https://github.com/chime-experiment/ch_util/pull/8

shinybrar commented 3 years ago

I was not aware of the discussion in #8 , but if relaxing the dependencies is not ameneable, would it be okay to fork and change ch_utils under the chimefrb project?

jrs65 commented 3 years ago

@shinybrar it seems like there are two (and maybe only two?) issues in here:

  1. bitshuffle is hard to install because it requires an h5py source build (although I think @nritsche thinks if you don't actually care about it working you don't need an h5py source build, a wheel is fine)
  2. chimedb_config is an issue as many FRB folks don't have access to it

I'd much rather solve these directly than split up ch_util into sets of optional dependencies. As I discussed in #8, we use a successful installation of ch_util as somewhat of a guarantee that you have everything you need to interact with CHIME data, and I'd rather not make that conditional.

There are some good avenues for fixing this:

I'm tagging @danielemichilli and @leungcalvin in here as they may have opinions on this one.

Also, I'd really rather avoid a fork if we can.

shinybrar commented 3 years ago

I think the suggestion would be to change, the current installation of ch_util with an additional [all] moniker to maintain the exact same functionality offered today.

>>> pip install "ch_util[all] @ git+https://github.com/chime-experiment/ch_util.git#master"

>>> pip install ch_util[all]

Currently, bitshuffle is an issue, but additionally installing ch_util on other scientific clusters, e.g. scinet has also been problematic due to dependencies, and access to private repositories. chimedb_config is the core of that issue, and I am not sure how to proceed here, because even if we provide widespread access, someone will misuse or mistakenly not follow the correct security protocols, which could also lead to a new problems.

Currently, there exists a fork of ch_util on chimefrb/ch_util -- If it works for the chime-experiment team, we can maintain that flavour with optional dependencies and equal to the the current master in terms of commits.

leungcalvin commented 3 years ago

I have been tinkering on SciNet and I can confirm that installing ch_util on SciNet for the purpose of FRB + Outriggers is a singular pain. I have also installed ch_util on Cedar, on DRAO site, and various server nodes where things have been pretty darn smooth. There's one caveat though: I haven't been using chimedb_config; on all these other sites I use an interface to get_correlator_inputs that Shiny and Chitrang have implemented for CHIME/FRB + Outriggers. It works really well on those other sites, and in a world where everybody on Outriggers has their own preferred compute environment, I think it might be less hassle for ch_util to support accessing correlator inputs via two independent interfaces (chimedb_config and the new API) than to designate a contact person to manage access to chimedb_config. I don't know how else chimedb_config is used on the CHIME/Cosmology side, but for FRB + Outriggers I believe supporting the API would eliminate the dependency on chimedb_config for those who need to deploy at all these random sites (for FRB + Outriggers), eliminating the need for Richard's proposed dependency switch/contact person, and reduces the surface area for security vulnerabilities.

I'm gonna punt on bitshuffle, but I don't remember having problems with it on sites besides scinet.

Re: forking...I don't really care either way and I don't think the Outriggers science will be really affected, as long as we use a branch like gbo which is refactored for non-CHIME telescopes!

jrs65 commented 3 years ago

I have been tinkering on SciNet and I can confirm that installing ch_util on SciNet for the purpose of FRB + Outriggers is a singular pain

Can you elaborate on that? Is it just bitshuffle that's the problem there? Details on this would be useful as I'd really rather fix the actual problems than put in these workarounds.

mondana commented 3 years ago

I wonder how get_correlator_inputs works. Is it just because it was introduced after everything was built and stable? It will be a while to capture the connectivity in outrigger sites in the database and correct all the cable swaps etc, and if get_correlator_inputs is not working with the database, then it would be a pain. unless I am wrongly interpreting what get_... does.

jrs65 commented 3 years ago

Generally my attitude is this...

I really would rather avoid a fork if we can. Even the soft fork you're proposing will eventually turn into a hard fork as people start committing changes to it rather than upstream, and then at somepoint we'll change something which breaks your interfacing, and you won't want to fix it, etc. So probably the priority would be to avoid this.

I'd also like to avoid changing the dependencies to optional ones. I appreciate it sounds like a simple change to just say now you just install ch_util[all] instead, but it will require identifying a large number of places this installation is done (all CI builds, pipeline scripts, personal installations, ansible plays) and figuring out what they need and fixing them all; and it's also going to generate a bunch of support headache in the future when people install ch_util without the set of dependencies that they need and have obscure failures that they need help with.

Because of that I'd really like to explore other options. There's two I can think of:

  1. Identify what the actual pain points are, and make them easier. So far no one has mentioned anything beyond bitshuffle and chimedb_config installation. I believe both of these are soluble. @shinybrar @leungcalvin is there a good reason why fixing these wouldn't be sufficient?

  2. As suggested by @nritsche, split up ch_util into separate packages (i.e. io related code, db related code etc), and have ch_util depend on them and assemble them into something like the current API. That would mean we would continue to require an installation of ch_util, but you would be able to just pull in the parts you need.

As indicated above, I think I have a preference for 1, but it needs to be a path that everyone thinks would work.

shinybrar commented 3 years ago

I think the current pains currently are, but not limited to:

  1. Extremely large build times, for bitshuffle which, as stated by @jrs65 can be alleviated via the work down by @nritsche
  2. Large number of dependencies required by ch_util. when most of the non-cosmology collaborators are using one maybe two functions from repository. This makes the dependency resolution process extremely lengthy.
  3. chimedb_config being a private dependency. This breaks a lot of installations and setup configurations, especially in non-chime, non-sudo environments.

I think the proposed solution 1 works, if:

  1. We hide chimedb_config under a ImportError exception rather than proliferating access. I fear, if we give chimedb_config to everyone, it will cause issues, and even then it does not work off-site.
  2. Merge the current work done by @nritsche in bitshuffle such that we no longer need a lengthy installation process.

I understand and relate to @jrs65 preference to limit hard forks.

@mondana
get_correlator_inputs is a separate backend which actually runs the latest and greatest version ofch_util. It essentially calls ch_util functions and makes them available over HTTP API. So any changes made to ch_util for outriggers are will not break it.

nritsche commented 3 years ago

Merge the current work done by @nritsche in bitshuffle such that we no longer need a lengthy installation process.

I want to make clear that this is not my work but a PR from someone from ESRF, Grenoble (https://github.com/kiyo-masui/bitshuffle/pull/81).

jrs65 commented 3 years ago

@shinybrar okay great.

I think we can try and thin down the dependencies a bit too. Looking at the requirements.txt I think we can remove a few things:

jrs65 commented 3 years ago

I want to make clear that this is not my work but a PR from someone from ESRF, Grenoble (kiyo-masui/bitshuffle#81).

@nritsche can I ask you to take a look at that PR and test it and see if it does what we want (i.e. allow using bitshuffle without install h5py from source)?

shinybrar commented 3 years ago

@jrs65 @nritsche Has there been any time devoted on implementing the changes on this issue, or is it on the back burner for now?

jrs65 commented 3 years ago

@shinybrar this is making progress now. We're just about to get the code merged into bitshuffle (https://github.com/kiyo-masui/bitshuffle/pull/81) which means that we won't need to build h5py from source to use bitshuffle. Next stop is to build a binary wheel of bitshuffle, so installing that part of the chain should be more or less instantaneous.