chef / supermarket

Chef's community platform
https://supermarket.chef.io/
Apache License 2.0
214 stars 113 forks source link

No user interface to remove old cookbooks. #954

Open prometheanfire opened 9 years ago

prometheanfire commented 9 years ago

would be nice to be able to have a checkbox or something on each entry of the drop-down containing the list of versions of the cookbook along with the ability you delete selected versions.

Apparently this is supposed to work from command line.

knife supermarket unshare "stack_commons/versions/0.0.1"

Also, this might be a stretch, but having a max number of cookbook versions be something we set could be good. This could help with berkshelf's dep resoultion problems (seems to do an exhaustive search). Might help with https://github.com/berkshelf/berkshelf/issues/1356

martinb3 commented 9 years ago

Something to limit the impact to Berkshelf of having many, many old releases all still available in Supermarket would be awesome. :+1:

sean-horn commented 9 years ago

Also covered in https://github.com/chef/chef/issues/3334

lamont-granquist commented 9 years ago

Copied from chef/chef#3334:

This probably has the same problem that yanking gems has. If someone has their environment locked in Berksfile.lock and regardless of how you feel about it the cookbook is working for them, and you then yank it, then you potentially break their production deployments (suddenly, at 3am on a friday night their local time, etc).

You might mark a cookbook so that it will not depsolve and be offered in the universe but it must still download to not break existing Berksfile.locks.

martinb3 commented 9 years ago

You might mark a cookbook so that it will not depsolve and be offered in the universe but it must still download to not break existing Berksfile.locks.

This would be great as an alternative.

lamont-granquist commented 9 years ago

So what is the "impact to berkshelf"?

Why do we need to fix this?

Just "having lots of old versions" is a bad reason to solve this because you can never fully get rid of them because of the yanking problem. If its largely a cosmetic issue then this issue should not get fixed because its going to be a yak with little practical payoff (or possibly disastrous payoff if it gets implemented incorrectly).

If there's perf issues, there may be some other solution like having the server had down a pruned subset of the universe.

martinb3 commented 9 years ago

I think there are two issues at play here.

Regarding the impact to berkshelf, I've seen a number of circumstances where removing a cookbook from Supermarket and reuploading only the latest version (not my preferred solution, but I've definitely seen people do it) actually causes Berkshelf to successfully do dependency resolution, whereas before the change, it would just eventually time out and fail to ever resolve dependencies. Making additional versions available to Berkshelf appears to cause the search space for possible solutions to grow by orders of magnitude.

I also think we may still want to yank a cookbook version under specific circumstances. Imagine a scenario where the openssh cookbook accidentally opens SSH for the root user with an empty string for the password. You know that no user will ever actually want that, and it's probably an issue where it's better to cause dependency resolution to fail than to supply an outdated version. I could see this even for something as simple as a cookbook that accidentally leaves in a fail statement. Sometimes, there are no reasonable circumstances where someone should consider a particular cookbook version in their dependency resolutions.

In almost every scenario (Friday, 3am, etc), they will still have a version of the cookbook on the chef server. We aren't depriving someone of the use of an extant cookbook simply by making it no longer listed in the Berkshelf endpoint.

having the server had down a pruned subset of the universe

That would be a great solution. I want to limit the old versions available for depsolving, not necessarily make old versions unavailable in environments that are already version locked.

Perhaps there's even something that can be done when feeding the constraints to gecode that would fix this particular timeout issue. Berkshelf issues (either timing out or simply failing outright) have been the major drivers I've seen for people wanting to remove old cookbooks.

lamont-granquist commented 9 years ago

So your first perf problem is likely more involved than just having additional cookbook versions available. Where I've seen that before is when the cookbook has a dep on itself and every version of itself satisfies its own circular dependency. That could be addressed by making it illegal to upload a cookbook that depended on itself or stripping the self-dependency on the fly. I should actually commit a foodcritic test at least to start removing that problem.

You are right that if a truly pathological cookbook is uploaded that there's a case for really yanking it. That should be reserved for truly stupidly dangerous code that code published or for secret key material that got accidentally published though. The problem is that if people don't understand the implications of yanking and then yank cookbook versions it will cause chaos.

And I think you don't appreciate the impact if someone has a continuous delivery system or autoscaling in which case their depsolving may change unexpectedly or their Berkshelf.lock files (or Policyfile.locks in the future) may not be able to resolve. People who are building pipelines should be able to pick up a lock file that worked yesterday and apply it to production today without having the cookbook disappear and breaking their pipeline (in this case it doesn't necessarily break prod, but it breaks the ability to promote to prod which could bring deployments to a halt).

I'd like to see the issues that were causing Berkshelf (and modern berks 3.x berkshelf with gecode, not the 2.x ruby depsolver) to timeout on depsolving and see what kind of pathological graphs were there. I strongly suspect that there's more going on than a lot of old versions (and pruning the old versions may fix the issue, but that doesn't mean that old versions were to blame for the root cause).

lamont-granquist commented 9 years ago

And actually let me state that clearer:

As long as people think that yanking cookbooks is the solution to long berkshelf depsolving timeouts, without any proof that it really is so, then implementing yanking is the LAST thing that we should do because that will guarantee that people who do not understand the implications of yanking cookbooks will start yanking them a lot, which will cause chaos.

So, a BIG :-1: on implementing this until we actually dig into the depsolving issues you mentioned and understand them.

lamont-granquist commented 9 years ago

Never open up a footgun manufacturing shop in the land of the people with the giant feet.

martinb3 commented 9 years ago

Well, the berkshelf issue aside, I'd argue there's still a case for yanking, for the pathological case.

And I think you don't appreciate the impact if someone has a continuous delivery system or autoscaling in which case their depsolving may change unexpectedly or their Berkshelf.lock files (or Policyfile.locks in the future) may not be able to resolve.

I feel that any depsolving or re-downloading cookbooks during autoscaling or CD is a really dangerous thing to do, in general. I do autoscaling with chef today, and I don't re-run any parts of Berkshelf just to add capacity or deliver software. It's just too risky. It's a constant battle today to keep it flowing (gem yanking breaks it, as you mention).

I've designed my production workflows not to depend on Supermarket's availability/reachability (ditto for rubygems.org). I reserve depsolving and re-uploading cookbooks for CI, and everything after CI is just an artifact that does not change.

It feels like we should recommend that people download cookbooks, and lock both the versions and the actual cookbook artifacts, at CI time, if they are concerned about stability (which you have to be in CD or Autoscaling). Yanking should not break that workflow.

lamont-granquist commented 9 years ago

That's a good and paranoid way to design it, but I guarantee that people won't do that, and yanking cookbooks will lead to breakage in locked versions of cookbooks disappearing.

lamont-granquist commented 9 years ago

added acrmp/foodcritic#328 to start addressing auto-dependency problems

nellshamrell commented 9 years ago

Reviving this discussion:

What is being requested: Many users have requested the ability to "yank" a certain version of a Chef cookbook from the public Supermarket.

Potential Problems: As soon as a cookbook version is uploaded to Supermarket, there is the potential that someone will have downloaded and used said cookbook version. If someone were to yank a version of a cookbook because they accidentally uploaded security credentials (people have accidentally uploaded security credentials in cookbooks they uploaded to Supermarket multiple times this year), it could create a false sense of security. Additionally, if someone is depending on the version of the cookbook, yanking it could break their stuff.

Potential Solution: 1) We code a feature into knife cookbook and knife supermarket (if the pull request currently in knife supermarket is insufficient) that allows a user to yank a certain version of a cookbook from Supermarket 2) We include a stern warning that someone may have already downloaded and used the cookbook and if they are yanking it because they accidentally uploaded security credentials, they need to invalidate those credentials STAT 3) Once someone has uploaded a version of the cookbook - even if they yank it, they can never use that version number again. This will prevent confusion over having different versions of the cookbook (the yanked one and the revised one) potentially sharing the same version number and causing problems.

I am absolutely open to discussion on this, but would like to decide yes or no within the next two weeks so we can either move forward or shelve this with clear reasons why.

martinb3 commented 9 years ago

Hi @nellshamrell!

if the pull request currently in knife supermarket is insufficient

Do you have a link to that PR?

Potential Solution: ...

:+1: I think your items 1, 2, and 3 applied together seem reasonable. As I mentioned in a previous comment, yanking versions would be useful for a number of reasons, even if dangerous to do without understanding the implications. I'd like to allow/enable the feature, and then make strong recommendations about how to protect yourself from mis-use.

As well, your proposal would stop people from deleting the whole cookbook and then re-uploading only the latest version, which I see happening not-infrequently today. We had someone on IRC in #chef the other day who actually did this, and then lost the cookbook name when someone else claimed it in between his delete and re-upload (we should remove the ability to delete a cookbook, or have both delete and yank, so that users aren't having to choose something worse, and have, IMHO, a very not-delightful experience).

lamont-granquist commented 9 years ago

Just make certain that the warning for item 2 is exceptionally clear.

I got burned very badly by yanking an ohai gem version awhile back. I got a warning about it being bad practice but I thought to myself "Yes, but I'm a professional software developer, and I have $REASONS" and so I went ahead. The chaos that ensued as every Gemfile.lock that had it pinned exploded caught me completely off guard as to how bad it was.

The warning should clearly state that it will break anyone's production deployments who has that version in a Berksfile.lock, pinned in an environment file, or used in a compiled Policyfile.lock.json, and that cleaning up old versions on the public supermarket is highly discouraged.

prometheanfire commented 9 years ago

The reason why being able to yank things is needed is that the resolver does an exhaustive depgraph to figure out what to pull in, if you have too many versions it simply times out...

https://github.com/berkshelf/berkshelf/issues/1356

martinb3 commented 9 years ago

The warning should clearly state that it will break anyone's production deployments who has that version in a Berksfile.lock, pinned in an environment file, or used in a compiled Policyfile.lock.json, and that cleaning up old versions on the public supermarket is highly discouraged.

I don't think this is strictly true, and I think the statement needs to be as accurate as possible. I would change 'will break' to 'very likely to break' and I would also add something about "when using public Berkshelf endpoints for dependency solving and downloading."

Or I'd say "will eventually break" if you don't want to have the context of Berkshelf caches, public Berkshelf endpoints, etc.

lamont-granquist commented 9 years ago

That IS NOT a reason to yank supermarket versions, that is a reason to fix the dep-solver:

https://github.com/chef/dep-selector/pull/34

@danielsdeleo asked people to test that patch in the berkshelf thread and then got crickets back and it dropped off his radar (plus Policyfiles are using a different depsolver and gecode is going to eventually go away, so that is ultimately going to be time wasted to fix how we're using gecode).

Fixing depsolver bugs via pulling publicly released versions of an artifact when you have zero visibility into if someone is using them or not is an extremely bad approach and why I'm starting to lean back to :-1:'ing this feature again.

lamont-granquist commented 9 years ago

Yeah, okay so it won't break your production converges when you have your own copy of the cookbook uploaded to the chef-server. But if you ever want to run test-kitchen against pinned environments then it'll break. As a user running the command you have no visibility into if people are pulling equality pinned versions of that cookbook down, it is a VERY anti-social command to run.

We also have actually no way to audit this or provide any information either since berks just pulls down /universe and we never see the dep constraints.

prometheanfire commented 9 years ago

wasn't stating that the dep solver shouldn't be fixed

nellshamrell commented 9 years ago

Here is the pull request to knife supermarket that I mentioned: https://github.com/chef/knife-supermarket/pull/16

nellshamrell commented 9 years ago

I'm starting to realize we're talking about a few different issues in this thread: 1) Should someone be able to yank a specific version of a cookbook?

prometheanfire commented 9 years ago

ya, sorry for the distraction on 2, it was simply the pain point that caused the need for yanking in my experience.

lamont-granquist commented 9 years ago

The problem is that you didn't have a 'need' to yank cookbooks. That is precisely the road that I went down with yanking the ohai gem. I had a very clear 'need' (so i thought) but in retrospect that was entirely the wrong solution.

The fact that you keep on talking about 'needing' to yank cookbook versions to solve the problem you have just continues to reinforce to me how dangerous this feature is.

And yes, yanking entire cookbooks is clearly horrible as well. But arguing we should allow yanking cookbook versions because we can yank entire cookbooks is like arguing the morality of cluster munitions because we already have nuclear weapons...

lamont-granquist commented 9 years ago

Actually @nellshamrell slightly more constructively than arguing about removing cookbook versions...

The rubygems community has talked about a better API than "yank" where it is just omitted from the rubygems depsolving, but can still be fetched if its already solved in a Gemfile.lock

Instead of dropping the row out of the table for the cookbook version and purging it, we could add a flag which only has the effect of omitting it from /universe.

That solution would mitigate all the issues that I have with it. While it would still let authors 'tidy' up the cookbook versions and would even 'solve' the depsolving issues by allowing removing old versions--without causing chaos.

nellshamrell commented 9 years ago

I like that solution :+1:

robbkidd commented 9 years ago

Instead of dropping the row out of the table for the cookbook version and purging it, we could add a flag which only has the effect of omitting it from /universe.

I'm a fan. :+1:

prometheanfire commented 9 years ago

s/yank cookbooks/yank cookbook versions/

I do like the tag solution though.

martinb3 commented 9 years ago

:+1: I had suggested in a previous thread that even just a way to hide it would work for me. I can see that some others still might want to yank a version for other reasons, but just hiding it from /universe works for me. I would hide versions that I know have major bugs as well, to prevent folks from accidentally depsolving to those versions when they don't need to & it's likely to result in a broken outcome.

It would be nice to hide it by default from the UI as well, so that drop down of versions doesn't grow infinitely. Maybe with a "... see all versions" option as the list drop-down choice.

lamont-granquist commented 9 years ago

Yeah, 'hidden [versions]' tab.

And really deleting a cookbook entirely should probably just 'hide' all the versions by default. They really don't take up a lot of space. And cleaning out the 1.x versions after 2.x ships can be done with 'hide' and accomplish most of what the user wants. Doing something like nuking the cookbook entirely and then replacing it where the version numbers might overlap is fairly insane (and REALLY insane for the public supermarket).

We could support some kind of --force / --really-delete-the-data flag, but we should also have the ability to configure supermarket instances to ignore that flag. IMO, the public supermarket instance should refuse to delete and should only hide -- the public supermarket should never delete data (its like force pushing public git branches).

lamont-granquist commented 9 years ago

This just came up again in the rubygems context where celluloid 0.16.1 was yanked from rubygems.

I just very nearly watched someone do a 'bundle update' to update their Gemfile.lock which pulled 0.16.1 from their gemset in their ruby which then changed their Gemfile.lock which then very nearly got uploaded to a project and would have badly broken the build.

In the cookbook/Berkshelf work this will work similarly since the ~/.berskhelf cache will have cookbook versions in it which could have been removed from supermarket, and which will work locally but which will be broken on a clean system which needs to pull from supermarket.

You basically have a massive, distributed cache invalidation problem which goes completely unaddressed when you delete cookbook versions. On the public supermarket, it should likely never be allowed to happen in any form, and cookbooks and cookbook versions should be considered like erlang variables and never deleted -- only marked 'hidden' to prevent depsolving/UI clutter.

nellshamrell commented 8 years ago

We will revisit this issue in 2016.

martinb3 commented 8 years ago

Thanks @nellshamrell! :+1:

lamont-granquist commented 8 years ago

Another good example of why cookbook version deletion should not be allowed, this time coming from the npm world:

http://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/ https://github.com/azer/left-pad/issues/4 https://medium.com/@azerbike/i-ve-just-liberated-my-modules-9045c06be67c#.3q2ycrpt9

The issue around the 'kik' module there is a legit legal problem. There needs to be a mechanism to delete cookbooks for legal reasons, but the supermarket admins can take care of that. The fact that the user then went and nuked all his modules including 'left-pad' is an interesting abuse of module deletion. Regardless of how anyone feels about the action that user took, if you put yourself in the feet of a system admin trying to keep their sites up and their CI/CD process humming smoothly having a random software dev delete critical modules that you depend on would come as a bit of a shock.

The rust/cargo community also defines 'yank' similar to my proposal for 'hide':

A yank does not delete any code. This feature is not intended for deleting accidentally uploaded secrets, for example. If that happens, you must reset those secrets immediately.

The semantics of a yanked version are that no new dependencies can be created against that version, but all existing dependencies continue to work. One of the major goals of crates.io is to act as a permanent archive of crates that does not change over time, and allowing deletion of a version would go against this goal. Essentially a yank means that all projects with a Cargo.lock will not break, while any future Cargo.lock files generated will not list the yanked version.

The rust/cargo developers also handle a legal situation like the 'kik' one via administrative intervention and deliberately do not provide a command:

https://github.com/rust-lang/cargo/issues/1436:

This is currently intentional, the reason being that deleting a release is generally the fist thing you reach for when you accidentally publish and it's rarely what you want to actually do (hence the yank). There's definitely a legitimate reason for deletion, however (such as what you've mentioned), but they're pretty rare. For these cases we can be contacted directly for removal, and any form of legal reasons would be forwarded to Mozilla lawyers currently instead of us handling directly.
lamont-granquist commented 8 years ago

Just wanted to capture the fact that based on skimming that npm-left-pad-chaos thread that for security reasons I think we have a hard, absolute requirement that we MUST NOT implement a user-accessible delete function on supermarket.

There is a risk of denial-of-service from a "bad actor". The people who use Hosted and Supermarket and pay for that service expect to be able to run production infrastructure without it being taken down. While running a private supermarket with an internal cache of cookbooks is obviously the 'Enterprise-class' way of doing this, we still cannot expose paying Hosted customers to the risk that any random supermarket dev with rights to a popular cookbook can yank it.

There's also a security issue in that if cookbooks can go away or be replace or mutate that actual bad actors could hijack cookbooks and inject malicious code into production systems. This is not usually a problem with cookbook since it requires a cookbook both becoming popular (through good acting) and then maliciously injecting code (through bad acting). But if popular cookbooks could be completely removed and their namespaces obtained and new versions uploaded, then actual attackers could cheaply take over popular cookbooks and inject code without having to go through the time consuming process of building trust in the first place. This is a real risk in the case of the left-pad-chaos since it looks like npm is pretty much a wild-west and the user pulled the plug on over 250 modules leaving all kinds of gaping holes that attacks could upload code into to exploit.

The ability for a supermarket admin, with privs, to nuke cookbooks and cookbook versions via the UI is probably a useful feature, that would allow private supermarket admins to delete versions -- but exposing that feature to users I don't think can ever be allowed to happen.

martinb3 commented 8 years ago

I feel like there's a subtle distinction between removing a cookbook from Supermarket and having it removed from the Chef server that your nodes depend on. The two comments above make them sound like the same thing, but actually, removing from Supermarket does not take the code away from any Chef server that already has it. I feel like we should be sure to make that explicit in the debate, since production infrastructure depends on the cookbook artifacts being on Chef server, not on Supermarket. Unless I've missed something obvious here about the architectures people are using? (entirely possible)

As far as the 'line in the sand', I'm in favor of a yank/hide exposed to users, but reserving a true delete for Supermarket admins. I think that means I agree with @lamont-granquist :+1:

lamont-granquist commented 8 years ago

Problem is that CI/CD and TK may be considered "production services" and taking them down may be unacceptable and they may depend directly on supermarket cookbook availability.

martinb3 commented 8 years ago

I'd think if CI/CD caught an upstream issue (issue = missing cookbook), they're working as intended, and catching a breaking change. I don't expect upstream to ensure my own CI/CD pipeline is working. I'll leave it alone, since I think we agree on the details, but I feel like this is probably an interesting question worth exploring overall in the scoping & constraints of Supermarket as it grows/changes/evolves.

lamont-granquist commented 8 years ago

I've witnessed people being fired over stopping the developer workflow inside of a company.

vinyar commented 8 years ago

Bumping this as we have users running into this.

Currently both public and private Chef server allows knife cookbook delete COOKBOOK VERSION (options)

Expectation: Supermarket (I'm only talking about on-prem) should offer similar functionality - ability to delete a specific version of the cookbook

Also, to @nellshamrell list of points, I do not think deleted version should be blocked from reuse due to use case below

Use case: Private supermarket living behind a firewall is kept in sync with supermarket.chef.io. Somehow during the sync, last version of every public cookbook is botched. Now, internal supermarket has no way of rolling the version forward or overwriting it, short of nuking every corrupted cookbook to remove the last version.

(Pending testing command in 1st comment against latest version of supermarket.)

JackChance commented 8 years ago

+1, at least for onprem supermarket. Not having control over my own supermarket is frustrating, especially when someone makes a mistake and the fix is either version bump it and all of its contingent cookbooks or simply undo the version and release the fixed version.

Also, the documentation on https://docs.chef.io/plugin_knife_supermarket.html#unshare does not work. Simply errors out when I try to remove a version.

lamont-granquist commented 8 years ago

again 'hide' handles the 'oops this is a bad version do not use it' use case, and does not have the sharp edges of 'delete'.

danielsdeleo commented 8 years ago

Yeah, you will need the nuclear option of really deleting a cookbook, for cases like putting credentials in the code, but that's about it and it should be restricted to a few admins. Deleting the artifact because it had a bug just causes a ton of extra pain.

lamont-granquist commented 7 years ago

Good summary on why cargo yank does what it does and does not delete: http://edunham.net/2016/03/24/could_rust_have_a_left_pad_incident.html

robbkidd commented 7 years ago

Noting that it was Deciderated in RFC72 Artifact Yanking how Supermarket should proceed with artifact removal/hiding. I'll move this issue up in the queue for us to consider next week for the next round of work.

voroniys commented 5 years ago

OK, cookbook removal is disallowed now, but I don't see a command for hiding versions.:

** COOKBOOK SITE COMMANDS **
knife cookbook site download COOKBOOK [VERSION] (options)
knife cookbook site install COOKBOOK [VERSION] (options)
knife cookbook site list (options)
knife cookbook site search QUERY (options)
knife cookbook site share COOKBOOK [CATEGORY] (options)
knife cookbook site show COOKBOOK [VERSION] (options)
knife cookbook site unshare COOKBOOK (options)

So, how the version could be hidden if needed?

robbkidd commented 5 years ago

@voroniys Hiding cookbooks as described by the RFC has not been implemented. It is not clear at the moment when the feature will be available.