What to do with "pure" pypi packages?

ChrisBarker-NOAA commented 8 years ago

There are a lot of python packages that "just work" with plain old:

pip install the_package

These are pure python packages with non-complex dependencies.

So some folks use a mixture of conda and pip to install stuff, but this gets ugly with the dependency resolution, etc.

I've dealt with this so far by making conda packages for these, but there are a LOT of them -- and as this is an easy lift, it would be do-able to automate it all. I've always thought that Anaconda-org should have a PyPI bridge -- someone looks for a package, it's not there, it looks for a PyPI package and builds a conda package on the fly and away we go!

But that would require Continuum to do it, and maybe would be too much magic.

But maybe we could have a set of conda packages that are auto-built from PyPi (conda skeleton mostly works) and then have an automated system that goes through and looks to see if there are newer versions of any of them, and auto-update those. So in theory, all we'd need to do by hand was keep a list of packages to monitor (probably keep up with whether it had been added to the default channel).

I started down this track before I discovered obvious-ci -- running conda skeleton, and building the package on the fly. Then I decided that it was easier to simply maintain by hand the half a dozen packages I needed. But it would be nice to cover a much larger range of packages....

Thoughts?

ocefpaf commented 8 years ago

gets ugly with the dependency resolution, etc.

I am not so sure this is as ugly as before [citation needed]. The integration with pip has improved a lot. (Even Linux distros are now allowing a mixture of PyPI and system packages!)

Maybe we are just missing something like:

conda install --from pypi <some package>

jankatins commented 8 years ago

IMO: all dependencies of an included conda package should be packaged as conda packages.

Whatever the user wants to install on top of that is available via conda install pip; pip install <whatever>.

bkreider commented 8 years ago

But maybe we could have a set of conda packages that are auto-built from PyPi (conda skeleton mostly works) and then have an automated system that goes through and looks to see if there are newer versions of any of them, and auto-update those. So in theory, all we'd need to do by hand was keep a list of packages to monitor (probably keep up with whether it had been added to the default channel).

Continuum has some old/stale code around this. Someone did this as a side project and built a very large percentage of packages on PyPI. I don't know what happened to that experiment or if it is current or clean enough for it to be useful to you.

msarahan commented 8 years ago

Thanks @bkreider, I have inherited that code. It is at https://github.com/ContinuumIO/pypi-conda-builds

It's not usable right out of the box, but I'm working on it when I have time.

ChrisBarker-NOAA commented 8 years ago

Cool, thanks!

Personally, as was discussed on another thread somewhere.... I find that recipes all too often require some tweaking. So I think a two-part system would be best:

1) build the recipes from PyPi

2) update recipes that were built from PyPI -- essentially go into each recipe, and check if new versions are available on PyPI, and if so, update the version numbers, etc, but keep the existing recipe, with any tweaks it had already.
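A minimal sketch of step 2, assuming the recipe pins its version with the common `{% set version = "..." %}` Jinja line and that the latest release is read from the `info.version` field of PyPI's JSON API (`https://pypi.org/pypi/<name>/json`). The function names are hypothetical, and a real updater would also have to refresh the source checksum and reset the build number.

```python
import json
import re
from urllib.request import urlopen

# Matches the Jinja version line, e.g. {% set version = "1.2.3" %}
VERSION_LINE = re.compile(r'(\{%\s*set\s+version\s*=\s*")([^"]+)("\s*%\})')


def latest_pypi_version(name):
    """Ask PyPI's JSON API for the latest released version (needs network)."""
    with urlopen(f"https://pypi.org/pypi/{name}/json") as response:
        return json.load(response)["info"]["version"]


def bump_recipe_version(recipe_text, new_version):
    """Return the recipe with only the version line changed.

    Everything else, including any hand-made tweaks, is left untouched.
    """
    match = VERSION_LINE.search(recipe_text)
    if match is None:
        raise ValueError("no '{% set version = ... %}' line found")
    if match.group(2) == new_version:
        return recipe_text  # already up to date
    return VERSION_LINE.sub(
        lambda m: m.group(1) + new_version + m.group(3),
        recipe_text,
        count=1,
    )
```

Keeping the network call separate from the text rewrite means the rewrite can be tested offline, and a failed lookup never touches the recipe.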

Now to find the time to work on such a thing...

jakirkham commented 8 years ago

Not sure if I brought this to your attention, @takluyver, but I'm doing it now if I haven't already. Otherwise sorry for the noise.

takluyver commented 8 years ago

Thanks @jakirkham

I've recently created an experimental tool to turn wheels of pure Python packages into conda packages: wheel2conda (inventive name, I know ;-). As more and more Python packages are made available as wheels, I think this could be a very quick, dependable way to automate making conda packages for the vast majority of packages that are straightforward. It's new and experimental just now, but please do kick the tyres.

I'll be joining the video meeting on Friday to talk more about this.

jakirkham commented 8 years ago

This again is from another thread, but I wanted to promote it to the relevant issue. I have extended my thoughts to be a little bit clearer about the problems that I see and how we can address them.

The Problems

The idea here is that it is too tricky to determine if something is pure Python (particularly in any automatic way). For that matter, it is too tricky to determine when a pure Python package becomes less pure (includes some C code for instance). While there is metadata that can be used to specify this information, it happens too often that this metadata is simply inaccurate. We care about this as we want to control the environment that compiled code is built in and want to avoid shipping unverified binary bits because it ruins the quality of the ecosystem and could make users vulnerable to problems. So, however we automate this, we need to keep this problem in mind.

Somewhat orthogonal, but there are a number of other use cases (R, Perl, Lua, GitHub repos, etc.) where this same functionality would be nice (see this issue https://github.com/conda-forge/conda-forge.github.io/issues/51 ). As more languages enter the scene (as I am sure they will), it would be nice to continue extending this functionality to them. Having something that gets too specialized for the PyPI case misses the fact that conda is becoming more general purpose than its Python beginnings would lead one to believe. So, this is another problem to keep in mind.

Returning to the main point, it would be nice to have a solution that is not too complex (or different) and leverages the full bandwidth of our CIs. The few cases where we have O(N) solutions are the ones that are more likely to break on one package and fail to complete the rest. While we can go back and fix this case-by-case, there ends up being a fair bit of pressure on core developers to get this working fast and code quality often suffers. However, there are cases where the problem is not embarrassingly parallel and using locks with CIs sounds really too complex. Thus it ends up being an ok compromise in some cases to have an O(N) solution (e.g. making the package listing webpage, updating the feedstocks repo, etc.). However, I don't think we want something so crucial as the creation of PyPI packages to be locked into this. Otherwise it feels like we are unintentionally drifting back into the one massive repo model that @pelson and @ocefpaf struggled with and fought so hard to free us from.

The Proposal

To me, the simplest way forward that addresses all of these concerns is to have normal feedstocks for PyPI packages, but have updates for them managed in an automatic fashion. To allow for the automation, we can have a special maintainer like pypi-conda-forge-maintainer or similar that we add to the maintainers list. This allows us to benefit from the work already done and easily segue into the automated system. When the feedstock is processed for team management, this will register the feedstock for automatic maintenance there. If anything breaks down, we can always handle things with manual maintenance. We can also remove the automatic maintainer if that strategy becomes unfeasible for whatever reason. While in automatic maintenance, packages can have version info gathered from PyPI and update PRs submitted. If this part breaks down, it doesn't preclude us from getting the updates via other mechanisms (making the PR ourselves). Things like conda skeleton pypi or a jinja template for setuptools would help make this automatic recipe generation straightforward. This proposal keeps things simple without adding lots of complexity to our ecosystem.
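As a rough sketch of the opt-in idea, assuming each feedstock can be summarized as a dict with its current version and maintainers list (a hypothetical data layout, not conda-smithy's actual one), selecting the feedstocks that should get an automatic update PR could look like this:

```python
# Bot account name taken from the proposal; it is a suggestion, not a real account.
AUTO_MAINTAINER = "pypi-conda-forge-maintainer"


def select_for_update(feedstocks, latest_versions, bot=AUTO_MAINTAINER):
    """Return (name, current, latest) for opted-in feedstocks that are behind.

    feedstocks:      {name: {"version": str, "maintainers": [str, ...]}}
    latest_versions: {name: str}, e.g. gathered from PyPI
    """
    pending = []
    for name, info in sorted(feedstocks.items()):
        if bot not in info["maintainers"]:
            continue  # manually maintained, leave it alone
        latest = latest_versions.get(name)
        if latest is not None and latest != info["version"]:
            pending.append((name, info["version"], latest))
    return pending
```

Removing the bot from a maintainers list is then enough to take that feedstock back to manual maintenance.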

Addressing the Problems

By using this proposal, we no longer have to worry about when a package has C or other compiled code added to it. Those packages can still be maintained automatically in this system. However, if we choose not to maintain them this way we don't have to. The proposal here doesn't do anything special for PyPI (other than whatever version scraping script is used). This way, we no longer need to worry about when a Python package adds some C code. It can still be automatically maintained just the same. :smile: If we ever run into problems with a feedstock, we can do manual maintenance at any time. We can also easily extend this model to other languages. All that really changes is that we add new scraping scripts and we may discover there is much in common between them that can be reused. By keeping feedstocks for automatic maintenance, we can benefit from our existing work, fix problems manually as they arise (either as a one-off or by disabling automated maintenance), keep the full bandwidth of our CIs for building packages (so we can scale appropriately), avoid catastrophic breakdowns of the automated architecture from affecting the builds of individual packages (reducing stress level for everyone involved :smile:), etc.

In short, by using our existing infrastructure with a few minor additions, we can already benefit greatly and get all the things that we want without so many concerns.

takluyver commented 8 years ago

it is too tricky to determine if something is pure Python

This is a key advantage to using wheels. The wheel tag embeds the Python ABI and platform it is for, so if it looks like py3-none-any, you know it's a pure Python package. This is part of the filename, so you can tell even before downloading it.
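A small sketch of that check, relying only on the wheel filename convention `name-version[-build]-python-abi-platform.whl`. The helper names are made up for illustration.

```python
def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name, version, optional build tag, then the three compatibility tags
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return tuple(parts[-3:])


def is_pure_python_wheel(filename):
    """True when the ABI tag is 'none' and the platform tag is 'any'."""
    _python_tag, abi_tag, platform_tag = wheel_tags(filename)
    return abi_tag == "none" and platform_tag == "any"
```

For example, `is_pure_python_wheel("six-1.16.0-py2.py3-none-any.whl")` is true, while a `cp311-cp311-manylinux...` wheel is rejected without downloading anything.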

What you're saying makes sense, but it feels a little bit like an old joke about mathematicians:

Q: Describe how you would make tea if the kettle is hanging on a hook.
A: Take the kettle from the hook, put it in the sink, fill it with water...
Q: Now describe how you'd make tea if the kettle is in the sink.
A: Take the kettle from the sink and hang it on the hook. This reduces it to a problem already solved.

It feels like massive overkill to maintain a separate 'feedstock' repo for every trivial PyPI package, a bot that updates them all, and a build infrastructure spanning three separate CI services. For these cases, it's ultimately just unpacking one archive, moving some files around, and repacking into another archive with a bit of metadata.

ocefpaf commented 8 years ago

This reduces it to a problem already solved.

@takluyver I am with you there if we could just conda install --from PyPI <pure-python-package> because that is leaving the kettle in the sink. However, repackaging, no matter how we do it, is taking the kettle from the sink.

What @jakirkham proposes is a re-packaging that does make sense with our current tools. Mostly because it allows us to control how we package and write the metadata.

Take the xarray package as an example. One can pip install it, but the xarray package in conda-forge will bring all the optional dependencies that most Earth scientists require. In that case the re-packaging has a purpose and we are not only putting the kettle back on the hook.

takluyver commented 8 years ago

I think my very loose analogy may have come across as too detailed. I was just saying that it seems like a massively complex and roundabout way to achieve something that should be quite simple.

I am with you there if we could just conda install --from PyPI because that is leaving the kettle in the sink. However, repackaging, no matter how we do it, is taking the kettle from the sink.

Well, conda install --from PyPI would have to use some kind of repackaging to turn the Python package into something conda understands. wheel2conda is fast enough that one could even envisage a server that converts packages on demand - i.e. when something tries to download the conda package, it grabs the wheel from PyPI, converts it and serves (+caches) the result. I'm not suggesting that's the way to go, but there are a lot more options when the conversion takes <1s than when it takes several minutes.

jankatins commented 8 years ago

one could even envisage a server that converts packages on demand - i.e. when something tries to download the conda package, it grabs the wheel from PyPI, converts it and serves (+caches) the result

But why? Just so that users don't need to touch pip as an alternative? That would only work with the wheel2conda-as-a-service. Right now, this simply transforms the following

conda install not-a-conda-pkg
# error
wheel2conda not-a-conda-pkg # assuming it is directly installed afterwards

into

conda install not-a-conda-pkg
# error
conda install pip # if it is not already installed
pip install not-a-conda-pkg

IMO the big advantage of a "real" package is that it is (or at least can be) integrated into the system of other conda packages, starting with the fact that installing it installs all the right dependencies so that it works once it is installed. This is similar to debian and the debian packaging policy (which is IMO the real USP of debian...). A wheel2conda program would be akin to alien, which can turn rpms into debs, which works but will wreak havoc with packaging dependencies if other packages (should) depend on this package and the converted package does not get the right dependencies or exports the wrong "interface".

Example: There is a conda package for a compiled library package xyz on which a lot of other packages depend. And there is a PyPI package of the name xyz (which would get manually packaged as pyxyz or whatever). But now a user uses wheel2conda xyz and gets a package named xyz. Now the user installs a conda package which depends on the library xyz conda package: the new package gets installed but it doesn't work because the expected xyz interface is not there.

If one has to build a wheel2conda package, a pypi2conda-recipe is more or less the same for the trivial case of a pure python package. Both need a database of pypi names to conda names (to get dependencies right even in cases where these two names are not the same!) and the rest is just parsing PyPI data... So:

To me, the simplest way forward that addresses all of these concerns is to have normal feedstocks for PyPI packages, but have updates for them managed in an automatic fashion. To allow for the automation, we can have a special maintainer like pypi-conda-forge-maintainer or similar that we add to the maintainers list.

+1

takluyver commented 8 years ago

Right, either way we need some way to map PyPI dependency names to conda dependency names. That's orthogonal to the question I'm talking about, which is how you turn packages on PyPI into conda packages.
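Whichever route is taken, the name mapping could start as a small override table on top of a default rule. The sketch below is illustrative only: the two overrides are examples, a real table would be curated and much larger, and falling back to the PEP 503 normalized PyPI name is itself an assumption that will not hold for every package.

```python
import re

# Illustrative overrides only (PyPI name -> conda name).
PYPI_TO_CONDA = {
    "torch": "pytorch",
    "tables": "pytables",
}


def normalize(name):
    """PEP 503 normalization: lowercase, runs of '-', '_', '.' become one dash."""
    return re.sub(r"[-_.]+", "-", name).lower()


def conda_name(pypi_name, overrides=PYPI_TO_CONDA):
    """Map a PyPI project name to a conda package name."""
    key = normalize(pypi_name)
    return overrides.get(key, key)


def conda_requirements(pypi_requirements, overrides=PYPI_TO_CONDA):
    """Translate a list of bare PyPI requirement names."""
    return [conda_name(name, overrides) for name in pypi_requirements]
```

The same table would also be the place to record clashes such as the xyz example above, where the PyPI name is already taken by an unrelated conda package.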

jakirkham commented 8 years ago

As far as parsing PyPI data, @183amir has really done a nice job here with some scripts he has written. They are really designed to automate this exact sort of thing. It would be nice to find a proper home for them here and get them cleaned up with tests and all that fun stuff (with his permission of course 😉).

183amir commented 8 years ago

I was thinking of putting it in conda-smithy because that one creates feedstocks.

jakirkham commented 8 years ago

That could be a possibility. Thoughts, @pelson?

kynan commented 8 years ago

Part of the initial problem posed by @ChrisBarker-NOAA as I understand it is having to also package any PyPI dependencies of a conda recipe. Wouldn't this be helped if conda recipes would allow specifying dependencies from PyPI? See also conda/conda-build#548.

ChrisBarker-NOAA commented 8 years ago

Now that this has been revived -- where is conda at with platform-independent packages?

That would make it easier to package up pure-python packages.

BTW -- for pure python, I don't see the advantage of making a conda package from a wheel -- it's just as easy to make one from source. And if it's not pure-python (which is where wheels shine), then the wheel is all too likely to be incompatible with the rest of your conda system.

nicoddemus commented 7 years ago

For reference, in 4.3 there's now support for Generic- and Python-Type Noarch/Universal Packages, although I haven't had the chance of trying them myself yet.

takluyver commented 7 years ago

Now that Travis OSX builds are waiting in the queue for many hours to complete a build, I feel like reminding people that wheel2conda can convert a pure-Python wheel to a set of conda packages on a single system in a few seconds.

It's at a prototype stage at the moment, and it would need more work to turn it into a complete solution, but if we're routinely going to be waiting hours for an OSX buildbot, building OSX packages from Linux seems rather attractive. And if we could do this for all the pure Python packages, it could free up conda-forge's ration of OSX buildbots to build more complex packages.

jakirkham commented 7 years ago

Regardless of how this problem is approached, having some way of getting info about an update to a package from PyPI is going to be very important. I raised an issue with PyPA a few months back about getting notifications for package updates. Please chime in if you have thoughts on how this might be done or feel free to simply show support. Issue link is below.

xref: https://github.com/pypa/warehouse/issues/1683

BrenBarn commented 4 years ago

Has there been any movement on this? It's still a bit of a pain to deal with installing packages that exist on PyPI but don't have a corresponding conda package.

The problem with something like wheel2conda is that I have to download a wheel. I don't want to download a wheel, just like I don't want to download a wheel with pip or conda. I just want to say the name of the package and get it installed, whether it's from conda or PyPI. If conda needs to get a wheel and convert that to a conda package behind the scenes, fine, but the purpose of this issue is to get a solution that's transparent to the user.

scopatz commented 4 years ago

Hey @BrenBarn! (Good to hear from you, BTW!)

I don't think that what you want exists as a tool yet. Mostly because no one has worked on it. Part of the deal with conda-forge is that we are a curated set of packages with a certain communally governed quality to them. This is pretty different from the PyPI model that lets anyone push any package without even modest checks to ensure quality (i.e. "Does this package even install?").

Recently @marcelotrevisani developed Grayskull (https://conda-forge.org/blog/posts/2020-03-05-grayskull/), which helps convert PyPI packages to recipes. Lots of folks have started to use it to submit to staged recipes. This could be used as the basis for a tool that installs from either conda (if available) or builds a conda package from the PyPI version and then installs it.

ChrisBarker-NOAA commented 4 years ago

On Fri, Aug 21, 2020 at 2:20 PM Anthony Scopatz notifications@github.com wrote

I don't think that what you want exists as a tool yet. Mostly because no one has worked on it. Part of the deal with conda-forge is that we are a curated set of packages with a certain communally governed quality to them. This is pretty different from the PyPI model that lets anyone push any package without even modest checks to ensure quality (i.e. "Does this package even install?").

This indeed, is a core difference. And why we don't want to just auto-populate conda-forge with packages from pypi.

Recently @marcelotrevisani developed Grayskull (https://conda-forge.org/blog/posts/2020-03-05-grayskull/), which helps convert PyPI packages to recipes. Lots of folks have started to use it to submit to staged recipes.

Indeed, as it gets easier and easier to make conda-forge recipes, the number of PyPi packages that aren't supported gets smaller and smaller.

But another difference between conda-forge and PyPI is that conda is not only about Python packages -- so you may have the same package name for a PyPI package, or an R package, or a C lib, or what have you.

So what I think would be really useful is a "dynamic" channel: call it something like "conda-pypi" -- when it was searched for a package, it would reach out to PyPI, and try to find it, and if it did it would auto-build a conda package out of it and deliver that. And then cache it for the next request.

Now that I think about it, that may not be possible, 'cause conda expects a channel to have a pre-built list of available packages. But it could populate that list from PyPi, and still only build the package on demand -- and, when there was a failure, keep track and not try again (until the package was updated on PyPi anyway).
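To make the build-on-demand idea concrete, here is a rough sketch of the bookkeeping only. The class and the injected `build` callable are hypothetical; nothing here talks to conda or PyPI. Failures are remembered per (name, version), so a new upstream release is retried automatically.

```python
class OnDemandChannel:
    """Build packages lazily, cache successes, remember failures per version."""

    def __init__(self, build):
        # build: callable(name, version) -> artifact, raising on failure
        self._build = build
        self._cache = {}      # (name, version) -> built artifact
        self._failed = set()  # (name, version) pairs that failed to build

    def get(self, name, version):
        key = (name, version)
        if key in self._cache:
            return self._cache[key]
        if key in self._failed:
            # Do not retry until a different version shows up upstream.
            raise LookupError(f"{name} {version} previously failed to build")
        try:
            artifact = self._build(name, version)
        except Exception:
            self._failed.add(key)
            raise
        self._cache[key] = artifact
        return artifact
```

The pre-built package list that conda expects would be a separate index; this only covers the part that decides whether to build, serve from cache, or refuse.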

But someone would need to build this nifty system, and given the advantages of curation, maybe putting a recipe on conda-forge is a better bet anyway.

Note that while there are thousands (hundreds of thousands!) of packages on PyPI that aren't on conda-forge, most of them are not really useful -- unfortunately, PyPI kind of encourages people to put any old prototype or may-be-useful-someday package up there, and there are a LOT of those!

-CHB



conda-forge / conda-forge.github.io