Feature: site :cookbooks

hedgehog commented 12 years ago

Would it be possible to support the cookbooks collection on GitHub? You'd likely be accessing it via the GitHub API, e.g. checking availability etc.

I've recently (yesterday) made the first run at tracking repo specific cookbooks (some of heavywater's cookbooks). Hopefully over time cookbooks will evolve into a collection that:

live on when whoever is upstream drops them.
extend beyond opscode (currently only heavywater and miketheman (ruby_enterprise) are tracked).
compatible irrespective of who is upstream (the qa branch)
track changes in who is considered upstream.

Of course this is a classic chicken-and-the-egg problem: lots of people won't use this collection until it is easy to use, the collection will be easy to use when bugs are ironed out by lots of people using them.

Librarian support for site :cookbooks would at least make it easy for people to consider using it. If this is possible, I'd remark that the qa branch should be considered the source branch, with the master branch reserved for upstream.

Thoughts?

yfeldblum commented 12 years ago

@hedgehog,

No Special Privilege.

One thing I do not want to do is give special consideration to any specific source. For example, there is no special consideration for the Opscode community site. Indeed, any Cheffile that wants to pull cookbooks from the Opscode community site is required to plug in the URL endpoint of the Opscode community site at the top. The API is just 3 routes, and can be implemented with a simple Sinatra application; indeed, it could probably be implemented with some static files behind a web server - and Librarian-Chef will support it.

Likewise, I do not want to give special consideration to any particular GitHub organization. Indeed, I do not want to give special consideration to GitHub as a website. If there were to be a good way to use the cookbooks organization from GitHub from Librarian-Chef, then it would have to be made generic across multiple git hosting providers. The pertinent questions are: is this possible, is this easy, and what should it look like? How can this be done without treating the cookbooks organization as special, and indeed without treating GitHub as special? Additionally, the consideration you raised about treating the qa branch as special for the repositories in the cookbooks organization is not a generic consideration; how can this be done so that things are generic and no-one has special considerations but so that we can also use the cookbooks repository very easily?

One specific consideration is that right now Librarian treats the master branch as the default branch, but does not ask the repository what the correct default branch is: one specific item to be changed is that now we will need to ask what the default branch is, and by default to use that rather than master.
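
As a rough sketch of what "asking the repository" could look like: `git ls-remote --symref <url> HEAD` reports which branch the remote's HEAD points at on reasonably recent versions of git. The helper names below are hypothetical and are not part of Librarian; the parsing is split out so it can be exercised without network access.

```ruby
require "open3"

# Parse the output of `git ls-remote --symref <url> HEAD` and return the
# branch that the remote HEAD points at, or nil if no symref line is present.
def default_branch_from_ls_remote(output)
  output.each_line do |line|
    # A symref line looks like: "ref: refs/heads/qa\tHEAD"
    match = line.match(%r{\Aref: refs/heads/(\S+)\s+HEAD\s*\z})
    return match[1] if match
  end
  nil
end

# Hypothetical helper: ask the remote for its default branch and fall back to
# "master" (the current hard-coded default) when the remote does not say.
def default_branch(remote_url)
  output, status = Open3.capture2("git", "ls-remote", "--symref", remote_url, "HEAD")
  (status.success? && default_branch_from_ls_remote(output)) || "master"
end
```

With something like this, a repository whose default branch is qa would be picked up without any special case for a particular organization or host.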

yfeldblum commented 12 years ago

@hedgehog,

Please don't take the long reply as a "no". It's a "no" for special privileges for any particular source, but it's a "yes" for figuring out a good but general way for making this work. To summarize, the question is: What is the general way to do it that at the same time makes using the qa branch on the cookbooks organization repositories as a common source really easy?

hedgehog commented 12 years ago

Sounds good. I wasn't suggesting special consideration, rather what you suggest, allow for a wider set of sources. "support the cookbooks collection on Github" isn't a plea for exclusivity ;) just the opposite - a request for admission.

How generic the site ... method ends up will depend on implementation details. site :xyz is of course meaningless on its own and would have to be an alias for some defaults of a more general interface; it should be trivial to expose that detail in place of the alias :xyz, e.g. as a hash. The implementation may not be trivial, and may need baking time :) I only mentioned the qa branch since most seem to assume the branch of interest is always master.

I have some more generic ideas for deployment than just git from github. But will post those in a separate issue.

Thanks for the consideration.

yfeldblum commented 12 years ago

@hedgehog,

The site source is reserved for HTTP endpoints that look like the Opscode Community Site API. It could be any Sinatra application with the 3 expected routes or it could be static files served by Apache.

The git source is reserved for Git repositories.

Perhaps a git_site source?

git_site "https://github.com/cookbooks"

The semantics could be: the actual git URL is #{remote}/#{name}.git by default. So

git_site "https://github.com/cookbooks" # set the default source
cookbook "apache2" # use the default source

would translate to a git URL of https://github.com/cookbooks/apache2.git.
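
A minimal sketch of that default, assuming nothing more than the string rule stated above. `git_site_url` is a hypothetical name for illustration; no such method exists in Librarian.

```ruby
# Hypothetical helper for the proposed git_site semantics: the git URL
# defaults to "#{remote}/#{name}.git". A single trailing slash on the remote
# is tolerated.
def git_site_url(remote, name)
  "#{remote.chomp('/')}/#{name}.git"
end

puts git_site_url("https://github.com/cookbooks", "apache2")
# prints https://github.com/cookbooks/apache2.git
```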

With this example, though, you would still have to specifically set the default branch to qa on each repository in the GitHub cookbooks organization.

hedgehog commented 12 years ago

The site source is reserved for HTTP endpoints that look like the Opscode Community Site API.

Yup, that is what I thought/feared.
Hence the plea to be admitted to that exclusive HTTP club ;) Ack. I like to dress appropriately, but differently ... i.e. HTTP, but not the Opscode API (too limiting)

Doesn't have to be git, in fact the more I mull my ideas the less I lean toward git in the first instance - maybe in a later iteration. I'm holding off on opening an issue describing the proposal simply because the more I contemplate it the more it changes, and simplifies.

yfeldblum commented 12 years ago

What do you find too limiting about the Opscode API?

hedgehog commented 12 years ago

Maybe unnecessary would have been a better word:

git protocol for secured access, or for when you need the history, i.e. use the --preserve-git option mentioned in issue #38
http protocol for public cookbooks, plain HTTP GET, no 3rd party API icing.

This is the idea mentioned in the comment:
Store the tagged version of a cookbook (i.e. no git history) as a zipped file on a CDN like AWS CloudFront. This way in production settings, if you can reach the web you can get your cookbooks. Not 100% foolproof, but much better than relying on GitHub or whichever git server is the single point of failure.

Of course if you can't reach any of the CDN edges, very likely no one can reach your site.

Naturally, secure networks will either have their own git server, or an http server they can point to in their Cheffile.

This of course is for the future, much like the single repo idea was before its time ;) That said, I think it is worth bearing in mind.

yfeldblum commented 12 years ago

Just some thoughts.

What do you mean by getting cookbooks in production? Is this for production nodes running chef-solo rather than chef-client, which therefore need a reliable (non chef-server) way to fetch cookbooks?
The portion of the Opscode Community Site API that Librarian-Chef actually uses is really small, and can be done with some small static JSON files built with a simple build step plus the cookbook tarballs and can be hosted on e.g. CloudFront or an internal Apache server.
For locked-down git access, accepted practice would be to use the user's keys in ~/.ssh or deploy keys given directly in the Cheffile (not implemented - could use some help writing tests for deploy keys).
You could probably vendor everything in cookbooks/ while adding cookbooks/*/.git to your .gitignore. You'd still have the cookbooks/*/.git directories locally (you could have a script to clean them out locally) but they wouldn't propagate to source control or to production.
Just HTTP for public cookbook tarballs is not enough because it provides only one version of each cookbook and doesn't provide a list of all the known versions of each cookbook. It would also be slower because it doesn't provide metadata (versions & dependencies) up front - you have to download the tarball to inspect it. The latter is currently an issue with the current Opscode Community Site API too, and it would be much better if the API delivered the complete JSON metadata for each cookbook, at least the name, version, & dependencies (including conflicts, replaces, suggests, and recommends) metadata, without requiring downloading the tarball.
Not terribly interested in building a system in the Cheffile for saying where the original source of a cookbook is, but the cache location in production is actually going to be somewhere else. The primary reason is that the real source and the cache source can easily get out of sync, voiding Librarian-Chef's purpose. Much simpler to say every cookbook has exactly one source, no matter who is running librarian-chef or where. Maybe if Bundler does it and it turns out to work really well. In the interim, options include: (a) vendor the contents of cookbooks/ while gitignoring the embedded git directories, or (b) have a build step which runs on a build server, uses librarian-chef to fetch the cookbooks into cookbooks, and packages up the results into a tarball, and deploy that built & packaged-up project tarball to nodes, rather than directly deploying the repo source to nodes.
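
To make the "small static JSON files built with a simple build step" idea concrete, here is a hedged sketch of such a build step. The JSON layout (a "name" plus a "versions" array carrying each version's dependencies) is an assumption made for illustration; it is not the Opscode Community Site API format, and `build_static_index` is a hypothetical name.

```ruby
require "json"
require "fileutils"

# Hypothetical build step: write one small JSON index per cookbook that lists
# every known version together with its dependencies, so a client can resolve
# versions without downloading any tarball first.
def build_static_index(cookbooks, out_dir)
  FileUtils.mkdir_p(out_dir)
  cookbooks.group_by { |c| c[:name] }.each do |name, releases|
    index = {
      "name" => name,
      "versions" => releases.map do |r|
        { "version" => r[:version], "dependencies" => r[:dependencies] || {} }
      end
    }
    File.write(File.join(out_dir, "#{name}.json"), JSON.pretty_generate(index))
  end
end
```

The resulting files, together with the cookbook tarballs, are plain static content and could be served from CloudFront or an internal Apache server as described above.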

hedgehog commented 12 years ago

Yes, any config/setup that falls back to Chef-Solo
Yes, but when they change it we have one more (avoidable) task :) I was thinking even simpler. Just a URI that Librarian GETs. To dry the code out it may be neat to have a common URI root that a cookbook's ref gets appended to. Different people's production setups would be using different Git tags, so we'll need that. So adopt a convention to eliminate some configuration. Most succinct convention? <vendor>/<cookbook>_<tag>.tgz. Hence, writing the code you wish you had, the Cheffile entry might look like:
```
site 'http://cookbooks.io/' # www.cookbooks.io might be a human UI that different vendors can sponsor by picking up the hosting tab?
cookbook 'rbenv', :vendor => 'hw', :tag => 'v0.4a'
cookbook 'rsyslog', :vendor => 'oc', :tag => '1.1'
cookbook 'ruby_build', :git => 'https://github.com/fnichol/chef-ruby_build',
:ref => '05454b507d'
```
Yes, plain ssh/git of secured access.
Maybe make the convention: No .git downloaded via HTTP, i.e. the <vendor>/<cookbook>_<tag>.tgz is a git archive (from memory). If you use site 'git://.., then the keep/delete /.git logic kicks in.
See the vendor and tagging convention above. Since http:// targets production use cases, the convention is (we assume) that the devops people have cited the correct vendor and tag combinations that work, so Librarian can do a parallel download (fast) of the *.tgz files, unpack them, then resolve child dependencies. If the tarball is a flat archive (no .git history) then grabbing all the HTTP cookbooks in a Cheffile should not take much longer than the time to download the largest. I think the cookbook metadata should be kept out of the scope of cookbook retrieval. Once the cookbooks have been retrieved then the metadata moves into Librarian's scope.
I didn't mean to create a cache tracking feature. Just a very simple system and convention:
- If you are in production, or using a finalized cookbook, use the syntax cookbook 'rsyslog', :vendor => 'oc', :tag => '1.1' and you'll get such cookbooks (fast) from the location site 'http://<uri>'. You can still point to a git server in the same Cheffile (as above).
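
A rough sketch of the convention and the parallel download described in the points above, assuming the `<vendor>/<cookbook>_<tag>.tgz` layout under the site root. Both method names are hypothetical, and `fetcher` stands in for whatever performs the HTTP GET (for example a thin wrapper around `Net::HTTP.get`); none of this exists in Librarian.

```ruby
# Build the tarball URI under the proposed <vendor>/<cookbook>_<tag>.tgz
# convention, relative to the site root given in the Cheffile.
def tarball_uri(site, cookbook, vendor:, tag:)
  "#{site.chomp('/')}/#{vendor}/#{cookbook}_#{tag}.tgz"
end

# Fetch every tarball in parallel, one thread per cookbook, and return a hash
# of cookbook name => whatever the fetcher returned. `fetcher` is any callable
# that takes a URI string (hypothetical; injected so no network is assumed).
def fetch_all(site, cookbooks, fetcher)
  threads = cookbooks.map do |cb|
    uri = tarball_uri(site, cb[:name], vendor: cb[:vendor], tag: cb[:tag])
    Thread.new { [cb[:name], fetcher.call(uri)] }
  end
  threads.map(&:value).to_h
end

puts tarball_uri("http://cookbooks.io/", "rbenv", vendor: "hw", tag: "v0.4a")
# prints http://cookbooks.io/hw/rbenv_v0.4a.tgz
```

Because each download runs in its own thread, the total wall-clock time is bounded roughly by the slowest single download, which is the property claimed above.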

Agree a cookbook has only one source. No real/cache sources. Apologies for the confusion. Bundler+Chef has been history for some time. Now I've experienced Chef's needs, esp in production: Bundler can't work. Period.

hedgehog commented 12 years ago

Just to clarify the motivation for supporting tag archives over HTTP, in addition to git repositories: chef-client, tag=1.1.0:

zip: 24460 bytes
raw: 49029 bytes
git bundle: 6.0M
git unpacked: 6.7M

In production settings that can make quite a difference, and the differential will increase over time.... even if you do not change the revision you use in production, your git download will continue to increase.
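
To put the figures above in proportion, a small calculation follows. It assumes "6.0M" means 6.0 * 1024 * 1024 bytes; under that reading the git bundle is roughly 250 times the size of the zip, while the raw tree is about twice the zip.

```ruby
# Ratio of two sizes in bytes, rounded to one decimal place.
def size_ratio(larger, smaller)
  (larger.to_f / smaller).round(1)
end

zip_bytes = 24_460
raw_bytes = 49_029
# Assumption: "6.0M" is read as mebibytes.
git_bundle_bytes = (6.0 * 1024 * 1024).round

puts "git bundle vs zip: #{size_ratio(git_bundle_bytes, zip_bytes)}x"
puts "raw vs zip:        #{size_ratio(raw_bytes, zip_bytes)}x"
```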

taqtiqa-mark commented 12 years ago

@yfeldblum, I might take a stab at this. Re 2.) above. Can you say how tricky it would be to write a source class that doesn't use the Opscode API? That is, what is the minimal JSON you mentioned, and is there an alternative route to providing/setting this data in Librarian?

yfeldblum commented 12 years ago

You can look at the site source. You would want to implement the same public class and instance methods, but with a new implementation for git.

You can also look at the librarian git source vs the librarian-chef git source as an example of separating the abstract part of it from the part of it tied specifically to chef and cookbooks.

To wire it in, you would also need to add a line to lib/librarian/chef/dsl.rb.

taqtiqa-mark commented 12 years ago

Thanks for the prompt response. I'm not going to tackle git right now. Rather, I'll try to implement a source using a simpler, plain HTTP convention. I'd looked over the existing site source and the specs. From the specs it seems librarian only requires that the version be in the json file, the name being available in the Cheffile. Version info is provided in metadata files. So it seems the pre-existing metadata.rb could substitute for the json file? However, is it the case that no other part of librarian relies on the json file contents? i.e. what is the role of the cached json? Sorry for not being clearer.

yfeldblum commented 12 years ago

The chef-site source loads chef to parse the metadata.rb file, if there's a metadata.rb and no metadata.json.
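
For illustration only, the fallback order could be sketched as below. `MiniMetadata` is a hypothetical, deliberately tiny stand-in that records just name, version, and depends; it is not Chef's metadata parser and not what Librarian-Chef does internally (which, as stated above, loads chef itself). The default constraint is likewise an assumption of this sketch.

```ruby
require "json"

# Minimal stand-in for a metadata.rb evaluator: records only name, version
# and depends, and silently ignores every other DSL call.
class MiniMetadata
  attr_reader :data

  def initialize
    @data = { "name" => nil, "version" => nil, "dependencies" => {} }
  end

  def name(value)
    @data["name"] = value
  end

  def version(value)
    @data["version"] = value
  end

  def depends(cookbook, constraint = ">= 0.0.0")
    @data["dependencies"][cookbook] = constraint
  end

  # Swallow DSL calls this sketch does not model (maintainer, license, ...).
  def method_missing(_name, *_args); end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

# Prefer metadata.json when present; otherwise evaluate metadata.rb.
# Note that evaluating metadata.rb executes arbitrary Ruby from the cookbook.
def read_metadata(cookbook_dir)
  json_path = File.join(cookbook_dir, "metadata.json")
  return JSON.parse(File.read(json_path)) if File.exist?(json_path)

  rb_path = File.join(cookbook_dir, "metadata.rb")
  parser = MiniMetadata.new
  parser.instance_eval(File.read(rb_path), rb_path)
  parser.data
end
```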

There are many cached json files in the chef-site source, but these are the API responses from the Opscode Cookbooks API and don't have anything per se to do with cookbooks directly.

applicationsonline / librarian

Feature: site :cookbooks #40