ocefpaf closed this issue 8 years ago.
Maybe register a new channel: "temporary-fixes"?
Maybe @mcg1969 has some idea how this could be handled?
I'm not sure it's worth a different channel.
But I wonder if we should give the package a different name, otherwise things can get pretty tangled up:
gdal-cf or ????
(cf for conda forge...)
That means that anyone using it has to change the dependency, but are there going to be any packages outside of conda-forge that depend on a conda-forge special build???
But I wonder if we should give the package a different name
:-1: that can be confusing. We can maybe "sign" the packages with a build string like
build:
  string: conda-forge
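For context, a minimal sketch of where such a build string would live in a recipe's meta.yaml; the package name and version here are just placeholders:

```yaml
package:
  name: gdal          # placeholder name
  version: "2.0.2"    # placeholder version

build:
  number: 0
  string: conda-forge   # the "signed" build string discussed above
```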
But IMO just having the package in a different channel should be enough to disambiguate.
The reason for a different channel is IMO that I suspect that at some point users will add conda-forge to their default channels, and a different channel than conda-forge means that users can get the updated/fixed version with conda install -c whatever matplotlib but not with a simple conda update --all.
I'm not sure the channel disambiguates -- does conda prioritize default or your other channels?
But I wonder where all this goes -- with PyPi, it's up to each package maintainer to keep things up to date. With anaconda, it's up to third parties -- mainly continuum. So if they don't, then what?
I'm hoping continuum will adopt a more community model, where folks can easily provide PRs -- it seems it would only save them work. So we'll see.
In the meantime, conda-forge may become the community channel, and I'd say if you want the latest and greatest, then you add that channel -- and you'll get the new MPL, or whatever, if conda-forge provides a newer one than default.
We will probably want to clean things out as continuum catches up.
I'm still wondering about the naming though:
continuum builds package-1.2.1
a new version comes out, and folks want it -- but continuum is being slow on the draw.
conda-forge provides package-1.3.1
all is good.
now continuum catches up and builds a package-1.3.1 -- now there are two, with the same version. And maybe they are even incompatible in some way. This could make a mess for our users.
If we go with Jan's approach, then this would be cleaner, but users would have to explicitly make a point of getting a newer version -- I think that would be awkward and often missed as an option.
debian has a similar problem when using backports, but could manage that with a special version suffix (~) which sorts after the version without the suffix:
https://www.debian.org/doc/debian-policy/footnotes.html#f37
One common use of ~ is for upstream pre-releases. For example, 1.0~beta1~svn1245 sorts earlier than 1.0~beta1, which sorts earlier than 1.0
As this probably requires changes in conda, I would vote for 1.3.0_real_1.3.1, which sorts before 1.3.1 but after 1.3.0.
Hah, it seems conda already implements such a scheme by default, so 1.3.1.cf is lower than 1.3.1:
https://github.com/conda/conda/blob/2ba04a6b2617227de578f4af54ff11115f97ca5c/conda/version.py#L81
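A quick check of that ordering, assuming the VersionOrder class from the linked conda/version.py is importable as conda.version.VersionOrder:

```python
from conda.version import VersionOrder

# trailing string components sort like pre-releases, so a ".cf" build of 1.3.1
# stays below a plain 1.3.1 from the default channel...
print(VersionOrder("1.3.1.cf") < VersionOrder("1.3.1"))  # True
# ...while still sorting above the previous release
print(VersionOrder("1.3.0") < VersionOrder("1.3.1.cf"))  # True
```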
Nice! Maybe we can use that, then.
-CHB
I'm not sure the channel disambiguates -- does conda prioritize default or your other channels?
Disambiguate? Yes! Solve the install/update problem? No. I even saw an e-mail from Travis Oliphant today discouraging the use of conda update --all because of how conda solves dependencies.
I'm hoping continuum will adopt a more community model, where folks can easily provide PRs -- it seems it would only save them work. So we'll see.
:+1:
discouraging the use of conda update --all because of how conda solves dependencies
This is probably https://github.com/conda/conda/issues/1967
Yes. The conda solver has recently been overhauled, and my experience with it is that the performance has been dramatically improved.
If two channels are configured to be used by conda, and both provide the same package name, version and build number, then conda chooses the package from the channel defined first in the config.
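For example, with a .condarc like the following (using conda's standard channels key), a tie on name, version and build number would be resolved in favor of conda-forge, since it is listed first:

```yaml
# ~/.condarc -- channels listed earlier are "defined first" and win such ties
channels:
  - conda-forge   # a package with the same name/version/build here beats the one below
  - defaults
```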
now there are two, with the same version. And maybe they are even incompatible in some way.
This is the real problem here. We have, in the past, fixed packages (on IOOS and SciTools) which Continuum also packages, often by releasing a newer version. The problem comes when Continuum updates the version of the software they package, but doesn't actually fix the problem. This has happened on several occasions with packages such as Shapely and pyproj. From a user's perspective, they are just updating their software and it goes from a functional state to a non-functional state - not really ideal. Because of the lack of a repository of canonical recipe source, all we have been able to do is report a problem with the package, not actually fix it (i.e. in the form of a PR).
@jakirkham, @tacaswell and @stefanv have all expressed an opinion on the subject of this issue in the past. Do any of you have comments on when it should be the place of conda-forge to package software which is already being packaged by Continuum?
@JanSchulz thanks for pinging me. There's not a whole lot I can officially say yet, except that we recognize the need to support alternatives to Continuum's default channels. We're actively working on a particular community channel solution, but it is not the only way forward, and it shouldn't be. We've been watching the Conda Forge project with enthusiasm. As we talk more we may be able to come up with some specific ways Continuum can help with it. But having an effort like this that Continuum does not control is beneficial to the Python community at large, so I'm grateful you're working on it.
I see three different problems with the default channel:
All three of these issues are an inevitable consequence of Continuum's finite resources for building and supporting packages. We certainly acknowledge that this isn't going to satisfy people who regularly bump up against one of these three problems. Heck I bump up against all three of these problems myself.
My particular perspective, as many of you know who have been watching my conda fixes recently, is on the dependency solver. I've been spending time overhauling it, and it's certainly going to fix some of the issues like conda update --all being slow, conda remove potentially breaking installs, etc. I'm glad @pelson is confirming that my improvements are beginning to make a difference.
But honestly, the mathematics of the solver isn't really the issue here, at least not directly. What you are discussing in this thread is basically the challenge of channel clashes. That is: how should conda handle things when two or more channels release versions of the same package? At the moment, conda effectively "merges" the channels together, so that the packages interleave with each other purely based on version and build numbers. That's clearly not a workable solution. For one thing, build numbers don't have meaning across channels; so for instance, build 1 from channel A may actually be newer than build 2 from channel B, and conda doesn't know that. This is something we need to decide on a fix for.
What we need, it seems to me, is to identify specific improvements to conda that would greatly improve the ability to use alternate, community-driven package channels. For instance:
1) A fix for conda that untangles package/channel conflicts. For instance, we could say that the highest-priority channel is always preferred for a package, and any packages by the same name in lower-priority channels are ignored. But I could see a variety of other strategies, and perhaps conda should adopt several, choose one as the default, but make the others available by configuration.
2) An enhancement to conda that allows channel preferences to be adopted on a per-package and/or per-environment basis. For example, perhaps I add the conda-forge channel as a lower-priority channel, but I actually prefer one of its packages to the one provided in default. There should be a way to specify that priority preference and persist it across updates and later installations.
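Purely as a sketch of what that per-package preference might look like, assuming a hypothetical pinned_channels key in .condarc (this is not an existing conda option):

```yaml
# hypothetical .condarc sketch -- the pinned_channels key does not exist in conda today
channels:
  - defaults
  - conda-forge              # lower priority overall...
pinned_channels:
  matplotlib: conda-forge    # ...but matplotlib should always come from conda-forge
```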
This is the kind of thinking that would be very helpful for me personally. We really do want conda to be adopted more widely---heck, we'd be pleased if someone built their own Python distribution that used conda as a packaging model. And we'd like to find ways to enable groups like Conda Forge to flourish without having to wait on us. I actually do think that there are some changes to conda that we can push through in the short term that will greatly improve our ability to work in parallel.
Having a way to specify channel preference globally or per env would be a really good addition to conda.
:+1:
I would really rather we tackle the channel collision problem correctly than to utilize weird version numbers or (even worse) track_features to disambiguate.
If we could get Continuum to open recipes (or adopt community recipes upon submission, and then open those up), then perhaps much of the problem can be avoided? Ideally, we do not want multiple versions of numpy with the same version tag floating around.
An alternative path is to build everything you need into your own channel.
For mixed channels, I don't see a straightforward way of resolving what to install without additional meta-data. In Debian there is the concept of "pinning", which allows you to fix certain packages in place.
@mcg1969: absolutely -- but we need help from conda itself to do it "right" -- are you speaking for continuum? Hard to tell :-)
@ocefpaf wrote: "Having a way to specify channel preference globally or per env would be a really good addition to conda."
I think that would actually be a simple solution that would mostly solve the problem at hand -- folks could put the IOOS channel, or the conda-forge channel, at their first preference, and then they would get the latest and greatest.
Granted, the default channel may get updated in a way that leapfrogs conda-forge, but I think it will be up to whoever is maintaining the conda-forge package to keep an eye on that.
And the default channel is clearly the upstream one -- conda-forge will be following its lead, so that could work.
I've been thinking a bit about how this works with PyPi (and PyPi does work well, for the things it works well for, i.e. pure-python packages).
It is a totally different model -- PyPa only provides the infrastructure -- each and every package is maintained by individual package maintainers. Ideally, conda packaging could go that way, but it's going to be a long time (or never) before package authors in general support conda. (never mind non-python stuff....)
But my idea, at least, is that conda-forge becomes the PyPi-like place for conda packages -- it will start (has started) with groups of packages that are not in the default channel being maintained by a third party, but hopefully individual package authors will start to maintain their own packages. So we need to design the infrastructure to support that.
In fact, as a package author steps up to maintain a package, maybe it could even be removed from the default channel. In the long run, maybe continuum will need to maintain few packages, and rather, have Anaconda be the "curated" selection, but much of it would be pulled in from the authors' builds (OK, maybe that's a fantasy).
Anyway, what all this means is that it should be very easy for a package author to push builds to conda-forge, like it is now with PyPi being integrated into the PyPa stack (distutils, setuptools, pip, I've lost track...)
One easy idea would be to add something like this:
conda install -c conda-forge --pin-channel matplotlib
That would add an entry to the config file saying that matplotlib should be taken from conda-forge, and all other packages with the same name from other channels should be discarded (e.g. simply add a step 0 to the solver which removes all matplotlib packages from other channels from the list of available packages).
This will help with the problem of "fixing" packages in the default channel (and IMO this should be the only case where conda-forge packages things that are already in the default channel).
Another step would be to configure the "default" channel, so that conda does not see the anaconda/Continuum packages at all. Not sure if that is possible today?
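To make the proposed "step 0" from the --pin-channel idea above concrete, here is a rough sketch (this is not conda's actual code; the index structure and the pin mapping are simplified assumptions):

```python
# NOT conda's actual code -- a sketch of the proposed "step 0" pre-filter.
# `index` maps package filenames to simplified metadata dicts, and
# `pins` is a hypothetical {package_name: channel} mapping.

def apply_channel_pins(index, pins):
    """Drop packages whose name is pinned to a channel other than their own."""
    return {
        fn: info
        for fn, info in index.items()
        if info["name"] not in pins or info["channel"] == pins[info["name"]]
    }

# usage sketch:
index = {
    "matplotlib-1.5.1-np110py35_0.tar.bz2": {"name": "matplotlib", "channel": "defaults"},
    "matplotlib-1.5.1-np110py35_1.tar.bz2": {"name": "matplotlib", "channel": "conda-forge"},
    "numpy-1.10.4-py35_0.tar.bz2": {"name": "numpy", "channel": "defaults"},
}
print(apply_channel_pins(index, {"matplotlib": "conda-forge"}))
# only the conda-forge matplotlib and the defaults numpy remain
```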
@stefanv: Absolutely!
If the default channel was built from (mostly) recipes maintained in a public GitHub project(s), it would be monstrously easier to keep everything up to date and in sync. We could/would do a lot of the work for continuum.
And they could start one package at a time (shapely?)... it wouldn't have to be a wholesale, all-at-once move.
I can imagine it's inertia more than anything else that's prevented this from happening so far, but it's a bit frustrating from outside.
-CHB
If we could get Continuum to open recipes (or adopt community recipes upon submission, and then open those up), then perhaps much of the problem can be avoided?
I think it would already be enough if Continuum would add all their package recipes (as they are currently used -> the matplotlib recipe in the conda-recipes repo is out of date) and accept PRs for already included packages. I would be happy to add patches there if I know that they land on my HD a day after they are merged... Continuum would still have the final say, it would speed up updates on new upstream releases, and Continuum would have less work... (cc: @mcg1969 :-) )
@mcg1969 articulated many of my thoughts more coherently than I would have. I think channel-level precedence is probably the right way to fix this, but I would like a way to control (maybe at the package level) whether it goes with newest-possible or prefers a specific channel.
I was also thinking of the debian idea of 'pinning' as a model for how to do this.
For my day job we have been making aggressive use of 'postN' versioning (pulled directly from git describe via versioneer), which helps in the case where the upstream project has already added or fixed the things we need. Although, this can get funny if you are packaging commits from side branches, and it definitely does not help if the difference is different sets of locally applied patches or build configuration.
Pinning channel might be ok as long as it is per environment.
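For what it's worth, the conda ordering linked earlier also seems to place post releases after the version they follow, which fits the 'postN' scheme mentioned above; a quick check, under the same assumption about conda.version.VersionOrder:

```python
from conda.version import VersionOrder

print(VersionOrder("1.3.1") < VersionOrder("1.3.1.post5"))  # True: "post" sorts after the release
```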
@mcg1969 articulated many of my thoughts more coherently than I would have. I think channel-level precedence is probably the right way to fix this, but I would like a way to control (maybe at the package level) whether it goes with newest-possible or prefers a specific channel.
@tacaswell my Linux distro (OpenSUSE) does exactly that. I can set repository preferences that will be used when updating the system, but I can also do a "distribution upgrade" that will get the newest-possible from all repositories. This operation issues warnings stating that the user is responsible for system stability when adding third-party repositories and performing the distribution upgrade. Conda is dangerously confusing with that! I can see tons of users breaking their system using conda-forge packages and going to the Continuum mailing list to complain.
With that said, this behavior might be a long-term goal. Right now I believe that a global channel preference is already a big win.
Pinning channel might be ok as long as it is per environment.
Agreed.
For an idea on how to mark package/channel combinations as good/bad: https://github.com/conda/conda/issues/2067
We really do want conda to be adopted more widely---heck, we'd be pleased if someone built their own Python distribution that used conda as a packaging model.
That sounds like a challenge! Accepted: acpd, Another Conda-based Python Distribution.
On that topic and related to packaging software already in the default conda channel, would it be possible for someone from Continuum to clarify the license of the recipes in the ContinuumIO/anaconda-recipes repository? A number of those are prime candidates for use in conda-forge.
Nice! Do you have any thoughts on their integration into conda-forge, @jjhelmus?
Long term I think they could be integrated into conda-forge, but first some logistics need to be worked out. Having a conda-forge version of conda clobber the Continuum version would not be good. Maybe they could be placed under a special label that is different from main, so they could be opted into instead of installed by default when adding conda-forge. I could be wrong, but I do not think a non-main label is taken into account when doing a conda install from Anaconda.org. Having a separate channel for these packages might be a possibility, but it seemed like the consensus in this issue was that this was not an ideal situation.
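For reference, an anaconda.org label can still be opted into explicitly by naming it as part of the channel; e.g. (the label name here is just a placeholder):
conda install -c conda-forge/label/experimental some-package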
Anaconda-recipes is BSD, same as Conda-recipes. Sorry that wasn't posted. I have added it. It is our intent to move everything in anaconda-recipes to the conda-forge-feeding-community-channel plan. If anyone would like to help in that effort, we'd appreciate it. The gist of that plan is:
1. We move recipes from the internal repo, anaconda-recipes, and conda-recipes to conda-forge. Those other places are either shut down or replaced with links or git submodules (like the "feedstocks" repo in conda-forge).
2. We mirror or link packages built by conda-forge on our "community" anaconda.org channel. There may be other sources of packages there, also.
3. We run a validation process on packages (inspect recipe, run test suites, verify package contents against a build of the recipe). Once verified, these packages will become the content on the default channel.
Great, thanks for the clarification @msarahan
Thanks @msarahan.
It is our intent to move everything in anaconda-recipes to the conda-forge-feeding-community-channel plan.
That is very encouraging! I think this should help in alleviating the strain of package maintenance on many fronts moving forward. Will the rest of the currently closed-source recipes be open sourced at some point in order to aid in this movement?
We move recipes from the internal repo, anaconda-recipes, and conda-recipes to conda-forge. Those other places are either shut down or replaced with links or git submodules (like the "feedstocks" repo in conda-forge)
Either of these sounds like a good plan. I suppose submodules are appropriate as part of the transition given this will take some time.
We mirror or link packages built by conda-forge on our "community" anaconda.org channel. There may be other sources of packages there, also.
At what point do you envision this happening? Should it be delayed until most of the base anaconda recipes packages are added?
We run a validation process on packages (inspect recipe, run test suites, verify package contents against a build of the recipe). Once verified, these packages will become the content on the default channel.
Will this verification process be done in the open? For instance, will the scripts for this verification be placed in a public repo? I imagine that it will be nice to include these checks in the process of determining whether a package gets added on this side, as well.
Will the rest of the currently closed-source recipes be open sourced at some point in order to aid in this movement?
Yes, but not as a huge dump. We have to convert many of these recipes from the older format, so it'll be on a case-by-case basis, or as we have time. Generally, the policy I'm following is that if I'm doing anything to fix or update a recipe, it gets translated and transferred (case in point: psycopg2 and dependencies). Not everyone is following this policy, though.
I suppose submodules are appropriate as part of the transition given this will take some time.
Submodules also have some precedent in conda-forge, which is nice.
We mirror or link packages built by conda-forge on our "community" anaconda.org channel. There may be other sources of packages there, also. At what point do you envision this happening? Should it be delayed until most of the base anaconda recipes packages are added?
The community channel mirroring can and should be happening now. I don't see any reason why it shouldn't (aside from perhaps channel priority confusion). We have mirrored some channels in the past, but we need to get the infrastructure to implement this properly. Right now it is a cron job driving some anaconda.org API stuff that @mcg1969 wrote. The validation and use of community-built packages in the default channel is what will take time. Completely arbitrary time estimate until community packages are replacing some continuum internal builds: 1-2 months?
Will this verification process be done in the open? For instance, will the scripts for this verification be placed in a public repo? I imagine that it will be nice to include these checks in the process of determining whether a package gets added on this side, as well.
Great idea. There is no security in obscurity, and I would appreciate all of your support in developing this process / tool. I will come up with a repo for it tomorrow after some discussion internally on where it fits.
On "Once verified, these packages will become the content on the default channel":
This sounds like the end game is to have the default channel host all continuum packages, as well as all community-maintained packages.
So will there be any place for a "community" or "conda-forge" channel at all?
On the one hand -- great for the user community. On the other hand, I have a hard-to-define impression that there will still be a need for "something" in between the default channel and a random scattering of channels on Anaconda.org.
I.e. Multiple levels of "trust":
Anaconda: tested by continuum and all known to work together.
Default: tested by continuum, with the latest and greatest.
Community: curated by a trusted community, maybe experimental builds, release candidates, etc.
Arbitrary Anaconda.org: Buyer beware!
-CHB
Your hierarchy sounds pretty reasonable, and very much in line with what I have in mind. The default channel is less "hosting all continuum packages" and more "hosting continuum-verified community-maintained packages." Continuum is participating in the maintenance as well, not just pawning it off. However, it is a reduction of Continuum's role as "authoritative builder" to "tester/verifier/integrator of packages built by a standard community-accessible system." The default channel still involves human verification on our end, and will trail the community channel to some extent. More importantly, the community channel's critical place is as an aggregator, where a single central channel combines authoritative packages from multiple other channels. This hopefully will help all kinds of package conflict and channel priority issues.
I see arbitrary anaconda.org less as "buyer beware" and more as "YMMV." If small channels want to play ball with standards and all that, they'll be very welcome in the community channel. A very easy way to do that would be to just contribute packages to conda-forge.
Sounds great -- looking forward to it! Time to get some of my stuff in conda-forge!
I see arbitrary anaconda.org less as "buyer beware" and more as "YMMV."
well, sure -- the point was that if you use an arbitrary anaconda.org channel, it is up to you to confirm who built it, whether it suits your needs, and whether it is safe.
I haven't heard of any abuses, but one certainly COULD put all kinds of dangerous software up on an anaconda.org channel -- makes me nervous!
-CHB
Yes, but not as a huge dump. We have to convert many of these recipes from the older format, so it'll be on a case-by-case basis, or as we have time. Generally, the policy I'm following is that if I'm doing anything to fix or update a recipe, it gets translated and transferred (case in point: psycopg2 and dependencies). Not everyone is following this policy, though.
Sure, makes sense. Hopefully others will follow.
Submodules also have some precedent in conda-forge, which is nice.
The feedstocks repo is pretty handy for this, as well.
The community channel mirroring can and should be happening now. I don't see any reason why it shouldn't (aside from perhaps channel priority confusion). We have mirrored some channels in the past, but we need to get the infrastructure to implement this properly. Right now it is a cron job driving some anaconda.org API stuff that @mcg1969 wrote. The validation and use of community-built packages in the default channel is what will take time. Completely arbitrary time estimate until community packages are replacing some continuum internal builds: 1-2 months?
This all sounds reasonable. Will there eventually be a more proper form of mirroring with Anaconda channels? It seems like that can be a pretty useful feature to get some select packages mirrored into one channel as opposed to having several channels. Also, could be helpful for reducing collisions.
Great idea. There is no security in obscurity, and I would appreciate all of your support in developing this process / tool. I will come up with a repo for it tomorrow after some discussion internally on where it fits.
Sounds good. Feel free to ping me when it is up. Would be nice to get a feeling as to the requirements for validation to start.
cc @patricksnape
Thanks for cc'ing me @jakirkham. This sounds amazing. It would be really really good to have a sort of staggered release schedule whereby the community thinks things are good to ship onto conda-forge and then continuum eventually pulls them onto mainline once verified. This will be great for much smaller packages too.
If we decide on the protocol for all of this I'd be really happy to evangelise it by writing some blog posts/documentation about how package maintainers can easily opt into conda in a similar manner as they do to PyPi but via conda-forge!
I suppose I should start submitting some of the recipes I have like opencv that will likely be widely useful! I need to get a feel for how conda-forge works yet but I'd be really happy to move away from hosting my own packages for other big projects if possible.
Not sure if this is up your alley, @hajs, but I figured you might be interested in this sort of system, and with your wide breadth of recipes we would certainly appreciate your feedback going forward and any contributions you would be willing to make.
Just a quick comment: it might be worth studying the way the Fedora/rhel/centos ecosystem works in detail for inspiration, since it sounds like you might be moving towards reinventing large parts of it :-)
Glad to see you over here, @njsmith.
it might be worth studying the way the Fedora/rhel/centos ecosystem works in detail for inspiration
Great point. This isn't true in all cases, but many conda-recipes borrowed their build strategies from Linux distros. For instance, the fftw recipe was inspired by the Arch Linux build. The gcc recipe was inspired by various Linux distros and Linuxbrew. Though it certainly doesn't hurt to review our premises as we go through this. Also, it is worth checking if we are really doing the best thing compared to other package managers.
...since it sounds like you might be moving towards reinventing large parts of it :-)
What can we say? We are unreasonable people. We want a package manager that is cross platform, doesn't require sudo, and lets Python be awesome. :smile:
We want a package manager that is cross platform, doesn't require sudo, and lets Python be awesome.
I am stealing that phrase next time I have to prepare a presentation on conda/conda-forge :wink:
Thanks, @njsmith - I would do well to study that, since I have been proposing much of this process. It makes perfect sense now that you mention it, but I've had my head stuck in Windows and Ubuntu sand for too long to be aware of it. I'll go study.
@jakirkham and @ocefpaf, I believe his comment is aimed at the process of community-developed packages feeding into an enterprise system (which then may spawn a community-led enterprise system), more than he is talking about build strategies of any given recipe.
@sstirlin, given your views on packaging and valuing of conda, not to mention your extensive recipe collection, we would be really interested in working with you to help get this all integrated into conda-forge. It will definitely help us and should help you reduce your maintenance burden. Please let us know how we can help.
process of community-developed packages feeding into an enterprise system (which then may spawn a community-led enterprise system), more than he is talking about build strategies of any given recipe.
Exactly -- the social technology, not the code technology :-).
Exactly -- the social technology, not the code technology :-).
Ok, thanks for clarifying.
I believe his comment is aimed at the process of community-developed packages feeding into an enterprise system...
This much I follow.
...(which then may spawn a community-led enterprise system)...
Could you elaborate on this point a bit more, @msarahan and/or @njsmith?
...more than he is talking about build strategies of any given recipe.
This is clear.
centos:fedora:RHEL :: conda-forge:anaconda:anaconda-enterprise
(@JanSchulz brought this up in https://github.com/conda-forge/conda-forge.github.io/issues/16#issuecomment-182430891)
I agree with @JanSchulz that we should avoid, as much as possible, adding packages to conda-forge that are already available in the default channel.
However, we already have a few redundant packages (pyproj, shapely, geos, and more to come soon). The reason for this redundancy is that those packages are partially broken in the default channel. (And we could not find a proper channel of communication to send the recipe patch back to them.) Maybe, when fixing a default-channel package, we should allow the package addition here as long as there is a plan to send that fix back to the default channel, and to remove the package from conda-forge once that happens.