Open dhoer opened 7 years ago
Choco is touching the original source. Why not cache the download urls? Choco already has a convention to require url, checksum and checksum type for x86 and/or x64 installs, so it could be possible to do this.
No convention here - it's a requirement due to distribution rights. Keep in mind that the community package repository is but one Chocolatey repository in a sea of thousands. Even with 5K packages, it is the tip of the iceberg in packages. The rest are all internal, and that represents a much larger portion of packaging. The actual convention is to embed the binaries directly in the package for the utmost in reliability. We are adding this to choco new in 0.10.8 so that folks better understand the conventions - https://gist.github.com/ferventcoder/dac662b6ae05f93ff22e4a093dbb56d0#file-_todo-txt
To give you a better grasp of what I mean by tip of the iceberg, https://www.slideshare.net/ferventcoder/webinar-chocolatey-package-management-with-proget/18
Why not cache the download urls?
Chocolatey already does this. That's what cacheLocation is. Perhaps it's best to read over https://stackoverflow.com/a/18596173/18475 to get a good understanding of options available.
I might be a little slow here. So cache can be configured to point to a nexus hosted repo? It looks like it is for a local server where install is occurring. I know I can cache the choco package on nexus by proxying chocolatey.org, but I don't understand how to configure choco cache to use nexus.
And I understand that some packages like jdk8 may not be cacheable due to url/url64 not being used (this is because of Oracle requiring cookies to be set in order to download). And maybe those have to be internalized? But following 80/20 rule, I'm sure most installers are using the url/url64 settings.
I take it this is where cache logic is: Get-PackageCacheLocation. If I knew powershell and windows I would submit a PR to add this functionality, but I struggled just to write a config file.
But wow, this would be nice to have. If this existed, I would probably add "Chocolatey Cache" hosted repo on nexus and push to it when there is a cache miss.
The cache would be for internal use. I wouldn't expect chocolatey.org to have a cache due to distribution rights. But this shouldn't be an issue for private internal caches.
I'm going to cherry-pick a few things I think I may answer.
So cache can be configured to point to a nexus hosted repo? It looks like it is for a local server where install is occurring
Not with how the existing caching works (AFAIK). The current caching is for previously downloaded executables/archives, which are stored on the user's local computer. That local copy is checked, and if none of the checks report a failure, the previously downloaded executable/archive is used.
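To make the local behaviour described above concrete, here is a minimal Python sketch, not Chocolatey's actual implementation. The function names, the `download` callback, and the choice of SHA-256 are assumptions made purely for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large installers are not loaded into memory at once."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def get_installer(cache_dir: Path, file_name: str, expected: str, download) -> Path:
    """Reuse a locally cached file when its checksum matches, else download again.

    `download` is a hypothetical zero-argument callback returning the file bytes.
    """
    cached = cache_dir / file_name
    if cached.exists() and file_checksum(cached) == expected.lower():
        return cached  # local cache hit: previously downloaded and still valid
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(download())
    if file_checksum(cached) != expected.lower():
        cached.unlink()  # never leave a file that failed verification in the cache
        raise ValueError(f"checksum mismatch for {file_name}")
    return cached
```

Note that the cache directory here lives on the machine doing the install, which is why this kind of cache does not help a second machine.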
Anyways, why not add that nexus hosted repo as an additional source? choco source add -n=nexus -s="https://where.i.am.located/api"
(a file path can also be used)
And I understand that some packages like jdk8 may not be cacheable due to url/url64 not being used
If the package is using the built-in chocolatey download helper it's cacheable (which I believe it does).
And maybe those have to be internalized?
That should be done with most packages, as long as you can do exactly that.
I take it this is where cache logic is: Get-PackageCacheLocation.
No, sorry. That is a helper function for packages to get the cache location to use when downloading files that need extra care, it's not meant to be used outside of packages.
I wouldn't expect chocolatey.org to have a cache due to distribution rights.
It kinda does though (in a way): for all licensed products of chocolatey, a private CDN is used (can be used) to download the external files instead of downloading them directly from the original location.
@AdmiringWorm Thanks for the feedback.
I'm not sure what you mean by this:
Anyways, why not add that nexus hosted repo as an additional source? choco source add -n=nexus -s="https://where.i.am.located/api" (a file path can also be used)
I do disable the chocolatey source and add our nexus group repo as a source. But the point I was trying to make was that it would be nice to set up a global cache and point it to a nuget hosted repo on nexus, so that all cache misses would automagically be pushed to the hosted repo, which would then make them visible to the group repository configured in source.
I wrote up how I configured Nexus here: https://stackoverflow.com/a/45871332/4548096
I might be a little slow here. So cache can be configured to point to a nexus hosted repo? It looks like it is for a local server where install is occurring.
For the local machine, each local machine. Some folks have attempted to set it to a share location that all machines could take advantage of but have found a race contention on creation of files there.
I just spent time refactoring a rather large windows farm with a centralized log server that had the same issue. It's best to stay away from shared drives. AWS offers SSM, which could possibly manage the cache, but I think it would be best to stick with a nuget approach.
That is good to know, but it got me thinking that there is a code smell with this approach. Choco is touching the original source.
@dhoer No code smell, the original source is set that way due to non-redistribution. Using an external cache still has a failure point, the best and most reliable method of using Chocolatey is to ensure the package (fancy zip file) has everything in the package. This is what Package Internalizer provides. I would suggest looking closer at that functionality. https://chocolatey.org/docs/features-automatically-recompile-packages
IMHO it is. It requires someone or something to execute that intermediary step. It is not feasible for organizations that have many teams, many internal packages, and many more shared packages to try to ensure that intermediary step is done and done properly.
The workaround for this is to require all deployments that rely on choco installs from outside the organization have the installs be baked into an AMI at the beginning of the pipeline process. This ensures that there are no broken link issues during deployments.
@ferventcoder Look, chocolatey is a fantastic product. Thank you, thank you, thank you for building this. It was badly needed on the windows platform.
I don't want to sound rough, but I know enterprise software, and automatically-recompile-packages is not an enterprise solution. The cache method is an enterprise approach. With this approach, you could host chocolatey and guarantee that anything vetted in shared repo will have its downloads cached and not have to worry about it. This is something worth paying for. It might even open chocolatey up for partnerships with aws, gc and azure. So think big picture. Think enterprise. That is where the value is.
TBH none of the Enterprise-level customers we have use the community repo. Most don't internalize or need some sort of caching because at a true Enterprise level they are building their own packages already and have staff for this level of support.
However you do have some good points on caching/stashing - one point for clarification though - how is the stash not something you would need to run more than once? You mentioned
It requires someone or something to execute that intermediary step.
I'm trying to resolve in my mind how stash would not be considered the same.
The problem with a cache is that it is not deterministic - internalization is deterministic. I'm not sure why a non-deterministic feature would be considered enterprise-grade.
That said, we could consider a caching/stash feature but I'd need to understand how you foresee it as deterministic (and reliable). Making a package 100% offline and reliable is the goal for what internalizer does, would love to understand if the goal is something you would consider enterprise-grade or not and we can work from there.
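deterministic cache
- package must use url and/or url64; otherwise, warn that not-cacheable or maybe fail if --force-cache flag set
- hash the url with SHA-1
- strip off first 8 chars and append to nuget package (name-version-8charHash)
- ask repo for package name-version-8charHash
- cache hit - download nuget package name-version-8charHash, verify checksum
- cache miss - download url, verify checksum, push to cache repo as name-version-8charHash
- continue with package installer...
service offering
- since you own chocolatey.org, anything uploaded that contains url/url64 must be able to download object successfully
- behind the scenes you cache the download object and make it available internally upon package approval
- your private service offering would guarantee access to that internal cache, thus allowing customer to focus on their internal package and not worry internalizing 3rd party packages
open source
- The community would be allowed to have a private cache as well, in my case it is a nexus hosted repo, but there is no guarantee that latest choco package from chocolatey.org's url will be available.
next step
- Develop business plan on how to enhance your service offering to cloud vendors and how using your product will benefit infrastructure management of windows platforms.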
Other enhancements that would be nice:
Have a simple way to run sanity checks, e.g., choco verify. Whether that is a pluggable test framework, you roll your own, or both. I like http://serverspec.org/ syntax, but not the ruby baggage that comes with it.
Make it easier to jump to the developer's source code on their repo. The view on the package page is nice, but I wanted to do a PR on someone's code and it was a pain to figure out where their source code was, since there were no links that didn't point back to chocolatey.org.
Lastly, it would be nice to allow for official packages like hub.docker.com does.
service offering
- since you own chocolatey.org, anything uploaded that contains url/url64 must be able to download object successfully
- behind the scenes you cache the download object and make it available internally upon package approval
We do this now already. Let's talk about how this is alike or different from our CDN cache - https://chocolatey.org/docs/features-private-cdn
With the download CDN cache, we already do this for folks using the community repository. The download CDN cache is targeted more at our Pro customers (individuals looking for more reliability with the community repo) and MSP customers (low interaction with keeping things up to date, open to placing more trust in a community).
When it comes to Enterprise customers and security conscious customers, they just are not going to reach out to internet resources at all. We've had conversations with hundreds of organizations, big and small, and the preferred use of Chocolatey is completely internal so they can have a trusted, repeatable process. That even includes reaching out to chocolatey.org at runtime, they just are not going to do it. We understand this. We've understood this for years, it's what has shaped our current offerings.
your private service offering would guarantee access to that internal cache, thus allowing customer to focus on their internal package and not worry internalizing 3rd party packages
Most folks use internalizer with Jenkins job(s), and it's pretty much hands off. So for most, it is set it and forget it and you get the benefit of fully internalized packages with little effort.
And that's saying that a customer is even going to reuse package logic from the community repository. Some are just right clicking on those executable installers and MSIs and selecting "Create Chocolatey Package" and they have a fully ready to go software deployment package in about 5 seconds. Pointing Package Builder to an archive of installers that an organization has would allow them to automate all of their software deployments very, very quickly.
deterministic cache
- package must use url and/or url64; otherwise, warn that not-cacheable or maybe fail if --force-cache flag set
That could be a good addition to our download cdn cache feature
- hash the url with SHA-1
We use SHA512 I believe, SHA1 has been broken.
- strip off first 8 chars and append to nuget package (name-version-8charHash)
- ask repo for package name-version-8charHash
- cache hit - download nuget package name-version-8charHash, verify checksum
- cache miss - download url, verify checksum, push to cache repo as name-version-8charHash
- continue with package installer...
Implementation details here, we are already doing this functionality for the community repo. There are some ideas here on what we can offer, and also legal understandings on reoffering our CDN for internal use we would need to ensure, but it could be a nice feature to offer.
I recommended sha1 because this is not for security: https://stackoverflow.com/a/28792805/4548096
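For reference, the name-version-8charHash identifier discussed here could be derived roughly as in the Python sketch below. This is only an illustration of the proposal, not an existing Chocolatey feature: `stash_package_id` is a hypothetical name, the package name, version, and URL in the usage lines are made up, and "first 8 chars" is read as "keep the first 8 hex characters of the digest".

```python
import hashlib


def stash_package_id(name: str, version: str, url: str, algorithm: str = "sha1") -> str:
    """Build the proposed name-version-8charHash identifier for a download URL.

    The hash only keys the cache entry; integrity still comes from the
    package's own checksum, which is verified separately.
    """
    url_hash = hashlib.new(algorithm, url.encode("utf-8")).hexdigest()
    return f"{name}-{version}-{url_hash[:8]}"


if __name__ == "__main__":
    url = "https://example.com/downloads/git-2.14.1-64-bit.exe"  # made-up URL
    print(stash_package_id("git", "2.14.1", url))            # SHA-1 based key
    print(stash_package_id("git", "2.14.1", url, "sha512"))  # SHA-512 based key
```

Either algorithm yields an 8-character suffix once truncated, so switching from SHA-1 to SHA-512 would change the key values but not the scheme.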
Some advice: don't limit chocolatey to the "Package Manager for Windows" paradigm. Move to an "Infrastructure Services for Windows" paradigm. This opens up chocolatey to more opportunities like security. Think if choco services were on aws and a sev 10 vulnerability was issued on a package. It would be valuable if a chocolatey service sent out an alert and said which instances were vulnerable. And maybe choco had a config management service that allowed for scheduling updates to those instances. These types of services will make you a millionaire multiple times over. Btw, be sure to cut me a fat check when that happens.
Adding a simple cache for downloads and security for params is easy for someone to come along and implement. Having services mentioned above is not. I would make the cache and param security pieces freely available and focus on infrastructure services.
Some advice; don't limit chocolatey to "Package Manager for Windows" paradigm. Move to "Infrastructure Services for Windows" paradigm.
@dhoer it's not limited to that paradigm. You really should learn more about "Complete Software Management for Windows"
And maybe choco had a config management service that allowed for scheduling updates to those instances.
That's the Chocolatey Central Console. Have you been to https://chocolatey.org/pricing#compare?
@dhoer it's not limited to that paradigm. You really should learn more about "Complete Software Management for Windows"
Yeah, I don't know what that is or how to google it.
That's the Chocolatey Central Console. Have you been to https://chocolatey.org/pricing#compare?
Nice!
We do this now already. Let's talk about how this is alike or different from our CDN cache - https://chocolatey.org/docs/features-private-cdn
Not sure how the cdn piece works. Does that require internalizing package first? If so, then cache makes internalizing step unnecessary for packages with url and url64 defined.
The internalizing step would require a build job for each package in our shop, since build servers are the only ones with keys to push to a repository. If multiple teams use the same package, who is the owner of the build? Who is allowed to update it? All this headache goes away with cache because this becomes a non-issue.
Not sure how the cdn piece works. Does that require internalizing package first? If so, then cache makes internalizing step unnecessary for packages with url and url64 defined.
Nope. It just works when you install packages from the community repository.
If multiple teams use the same package, who is the owner of the build? Who is allowed to update it? All this headache goes away with cache because this becomes a non-issue.
I feel like we keep going back and forth on semantics - who owns the cache? Who is allowed to update it?
And when you say teams, are you talking about development teams or ops teams?
This feels like a discussion we should have in person somewhere, and then capture the results in an issue.
I will be in San Fran this week if you want to meet for coffee or something.
I'm nowhere close to that area. :D
Bottom line: if a cache were built similar to what is posted at the top, but with the force-cache feature added, then ensuring everything from choco is cached in house could be done in these 4 lines:
choco source disable -n=chocolatey
choco source add -n=choco-all -s "'http://repo.example.com/nexus/service/local/nuget/choco-all/'"
choco cache add -n=choco-cache -k "'redacted'" -s "'http://repo.example.com/nexus/service/local/nuget/choco-cache/'"
choco feature enable -n=forceCache
This would be enforced on build servers. The first 2 lines would be recommended practice for developers, but if they didn't do it, not an issue.
Force cache is the deterministic bit I was missing earlier.
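A rough Python sketch of why force cache makes the behaviour deterministic: without it a package lacking url/url64 only produces a warning, with it the install fails outright. The package dictionary shape, `NotCacheableError`, and `cacheable_urls` are hypothetical names for illustration, not part of choco.

```python
import warnings


class NotCacheableError(Exception):
    """Raised when force cache is on and a package declares no url/url64."""


def cacheable_urls(package: dict, force_cache: bool = False) -> list:
    """Return the download URLs that can be cached for a package.

    Follows the rule proposed in this thread: a package must use url and/or
    url64; otherwise warn, or fail when force cache is enabled.
    """
    urls = [package[key] for key in ("url", "url64") if package.get(key)]
    if urls:
        return urls
    message = f"{package.get('name', 'package')} is not cacheable: no url/url64 defined"
    if force_cache:
        raise NotCacheableError(message)  # build servers: stop here
    warnings.warn(message)  # developer machines: carry on with a warning
    return []
```

On a build server the error path means nothing is installed unless every download can go through the cache.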
And there is still the piece about getting new items automagically updated in the cache when they become available.
And there is still the piece about getting new items automagically updated in the cache when they become available.
When cache feature is implemented, it should happen automatically. When choco install or upgrade is called on a package hosted on chocolatey.org that uses url/url64, it uses choco-all source defined above to determine a cache hit/miss on download url and caches on miss by pushing to choco-cache.
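The hit/miss flow described above might look roughly like this Python sketch. Plain dictionaries stand in for the Nexus repositories, `download` is a hypothetical callback, and SHA-256 is assumed for the package checksum; none of this is existing choco behaviour.

```python
import hashlib


def stash_id(name: str, version: str, url: str) -> str:
    """Proposed cache key: name-version-<first 8 hex chars of SHA-1 of the url>."""
    return f"{name}-{version}-{hashlib.sha1(url.encode('utf-8')).hexdigest()[:8]}"


def fetch_download(name, version, url, checksum, source_repo, cache_repo, download):
    """Return (payload, status) where status is 'hit' or 'miss'.

    source_repo: mapping standing in for the group repo used as source (choco-all).
    cache_repo:  mapping standing in for the hosted repo pushed to (choco-cache).
    download:    hypothetical callback taking a url and returning bytes.
    """
    key = stash_id(name, version, url)
    payload = source_repo.get(key)
    status = "hit"
    if payload is None:
        payload = download(url)  # cache miss: go to the original location
        status = "miss"
    # Verify the package checksum on both paths before anything is used or stored.
    if hashlib.sha256(payload).hexdigest() != checksum.lower():
        raise ValueError(f"checksum mismatch for {key}")
    if status == "miss":
        cache_repo[key] = payload  # push so the next install is a hit
    return payload, status
```

Passing the same dictionary as both `source_repo` and `cache_repo` mimics a group repository that includes the hosted cache repo, so a second call for the same url comes back as a hit.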
Private internal repos (like our nexus) would run the risk that the download url is no longer available when it tries to cache it, but the paid for private repo service hosted by chocolatey shouldn't since it would have cached it during the approval process.
Private internal repos (like our nexus) would run the risk that the download url is no longer available when it tries to cache it, but the paid for private repo service hosted by chocolatey shouldn't since it would have cached it during the approval process.
One clarification that is necessary here - organizational features make perfect sense for C4B, but not always for open source. One of the benchmarks for determining where a feature falls is whether an open source user (not an organization) would find value in a feature. A user already has a cache that gets built locally automatically when they are installing packages. The good news for you is that this feature does have value, but not necessarily in open source.
I'm not going to get time to open source this feature. ☹️
But I did have a few questions about how this will be implemented:
@ferventcoder Are you still planning to roll this out in 0.10.11? When we pay for the caching service, will we still be able to have an internal cache? CDNs are great and all, but I have been burnt in the past by CDN misconfigurations. So it would be nice to manage the cache in-house on our Nexus artifact repo since we have total control over it.
@ferventcoder thanks for your response on https://stackoverflow.com/questions/45867716/nexus-to-serve-up-chocolately-packages about recompile packages. That is good to know, but it got me thinking that there is a code smell with this approach. Choco is touching the original source. Why not cache the download urls? Choco already has a convention to require url, checksum and checksum type for x86 and/or x64 installs, so it could be possible to do this. This approach doesn't molest the original choco package.
Since the word cache is already used, let's call this stash for the purpose of this conversation. Feel free to change it.
Stash will store objects downloaded from urls defined in choco package. The stash will be in nuget format with url hashed and appended after version (I don't know nuget but I think that should work, if not, then you get the general gist).
Stash Command
choco stash [list]|add|remove|disable|enable [<options/switches>]
Examples
Options/Switches
Choco install/upgrade
Options/Switches
Algorithm
This seems like a cleaner approach, but I don't know all the ins and outs like you do.