How can I vendor dependencies (shards)?

GrantBirki commented 5 months ago

A common design pattern for production applications is the ability to "vendor" dependencies. This means committing them to version control (git). You cannot have truly reliable builds without vendoring dependencies because a GitHub repository could be deleted at any point in time.

If a repo is deleted that contains a shard, then I can no longer pull that dependency to build an executable and thus my entire build process has fallen apart.

How can shards be used to vendor and check-in all dependencies?

I know that one could do something like this:

#!/bin/bash

set -e

# set the working directory to the root of the project
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && cd .. && pwd )"

export SHARDS_CACHE_PATH="$DIR/vendor/cache/shards"

echo "Installing shards from vendored cache"
echo "SHARDS_CACHE_PATH: $SHARDS_CACHE_PATH"

shards install --local

but this doesn't do exactly what I'm after. It does a great job at caching locally, but when I run this in my GitHub Actions CI flow, it breaks.

I come from the land of Ruby, where vendoring dependencies (gems) is quite common and is done with ease through bundler. Here is a dead simple Ruby project that is open source where I vendor my dependencies with bundler.

I'm hoping the same can be accomplished with Crystal and Shards! 🙇

straight-shoota commented 5 months ago

It should be as simple as committing ./lib into your repository. All dependencies are checked out into worktrees inside ./lib.

By default, this folder is excluded via .gitignore. But you're free to change that. Keeping vendored dependencies available seems like a good use case for doing this 👍
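
A minimal sketch of that change, assuming the ignore rule is a single line such as `/lib/` (the exact spelling in a given project is an assumption). The helper name `unignore_lib` is made up for illustration.

```shell
# Hypothetical helper: drop the lib ignore rule so ./lib can be committed.
# Assumes the rule is "/lib/", "/lib", "lib/" or "lib" on a line of its own.
unignore_lib() {
  gitignore="${1:-.gitignore}"
  [ -f "$gitignore" ] || return 0
  # Keep every line except the lib rule; grep exits 1 if nothing is left.
  grep -v -x -E '/?lib/?' "$gitignore" > "$gitignore.tmp" || true
  mv "$gitignore.tmp" "$gitignore"
}

# Usage (from the project root):
#   unignore_lib .gitignore
#   git add .gitignore lib
#   git commit -m "Vendor shards in ./lib"
```

Other ignore rules are left untouched, so build output such as ./bin stays out of version control.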

GrantBirki commented 5 months ago

@straight-shoota are there any potential downsides to doing this? I know the ./lib folder can be quite large as it looks like it contains entire repositories for me (in a few projects).

straight-shoota commented 5 months ago

I don't see any downsides. The lib folder should only contain checked-out worktrees, not the entire repo history. There's no .git folder.

If you see something else, that would be odd. More details, please =)

GrantBirki commented 5 months ago

Thanks! That should be enough info for me to use for vendoring. I'll re-open this issue if I have any troubles.

GrantBirki commented 2 months ago

👋 Hey again @straight-shoota! I had an additional question around vendoring dependencies.

So to truly "vendor" a dependency, it would mean that everything related to that dependency lives within the repo.

For example, if I had a shard that made API requests to GitHub and it was a runtime dependency of my app, I would want to ensure that it is always available and no network calls have to be made at all to pull down, build, or compile that shard.

So my question is, in addition to vendoring (checking into git) the ./lib folder, I'm wondering if it is also safe to check in the shards cache path within my project directory as well.

Example: SHARDS_CACHE_PATH="$DIR/.cache/shards" where $DIR is my project directory.

If I can safely check in the .cache/shards/ dir, then I can run shards install --frozen --local (with the local flag) and no network requests are made for my dependencies (yay!).

Is this a safe thing to do though (I'm still new to crystal)? I do my development on linux locally and my target systems that I deploy to are also linux in production. Would this potentially cause any conflict if I had a team member that was doing development on a macOS machine?

I want to effectively remove all occurrences of:

Fetching https://github.com/kemalcr/kemal.git
Fetching https://github.com/stefanwille/crystal-redis.git
Fetching https://github.com/crystal-ameba/ameba.git
Fetching https://github.com/luislavena/radix.git
Fetching https://github.com/crystal-loot/exception_page.git
Fetching https://github.com/sija/backtracer.cr.git
Fetching https://github.com/ysbaddaden/pool.git
...

and instead have them all be:

Using radix (0.4.1)
Using backtracer (1.2.2)
Using exception_page (0.4.1)
Using kemal (1.5.0)
Using pool (0.2.4)
Using redis (2.9.1)
Using ameba (1.6.1)
Dependencies are satisfied
...

when running shards install --local

My overall goal is to have builds be extremely repeatable, reliable, not rely on the network, not fail if a rogue shard maintainer deletes their repo, and work well in CI systems across multiple platforms.
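
One way to verify this goal in CI is to scan the install output for `Fetching` lines. This is only a sketch: `assert_no_fetch` is a hypothetical helper, and it assumes the output looks like the lines quoted above, possibly with a short prefix such as `I: `.

```shell
# Hypothetical CI guard: fail when captured `shards install` output shows a fetch.
# Reads the log on stdin and echoes any offending lines to stderr.
assert_no_fetch() {
  if grep -E '(^|[[:space:]])Fetching ' >&2; then
    echo "error: shards tried to fetch from the network" >&2
    return 1
  fi
}

# Usage in CI (assumes shards is installed):
#   shards install --local > install.log 2>&1
#   assert_no_fetch < install.log
```

A guard like this only detects the problem; it does not by itself stop shards from reaching out.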

Thanks! 🙇

straight-shoota commented 2 months ago

Yah I think this should technically be fine. The shards cache just contains bare repositories which should be Cross-Platform compatible. This of course depends on the resolver (such as git). All current resolvers should work that way.

I don't see the point of this though. Why do you want to run a cached shards install when you can just have the dependencies already checked out in lib? Without network requests you won't get anything new anyway.

The only reason I could think of where this might perhaps be useful is to seamlessly switch between different dependency versions during development.

GrantBirki commented 2 months ago

I don't see the point of this though. Why do you want to run a cached shards install when you can just have the dependencies already checked out in lib?

Even when I have everything in lib/ checked into version control, running shards install still indicates it's making network requests with Fetching https://github.com/kemalcr/kemal.git. So it's clear that in order to have completely repeatable builds that don't rely on the network at all, you need to have all three conditions met (AFAIK):

All dependencies must be checked into version control (in my case, the lib/ dir)
The cache dir must be checked into version control (in my case, the .cache/shards/ dir)
The shards install command must be run like so: shards install --frozen --local
Then, no network requests will be made for shard/dependency installation during critical production builds taking place in CI
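
The three conditions can be checked before the install runs. The sketch below is illustrative only: `check_vendored` is a hypothetical helper, and the `.cache/shards` default and the `shard.lock` location are assumptions based on the layout described in this thread.

```shell
# Hypothetical preflight for an offline install.
# Returns non-zero (and says why) if a vendored piece is missing.
check_vendored() {
  root="${1:-.}"
  cache="${SHARDS_CACHE_PATH:-$root/.cache/shards}"
  rc=0
  [ -d "$root/lib" ]        || { echo "missing: $root/lib" >&2; rc=1; }
  [ -d "$cache" ]           || { echo "missing: $cache" >&2; rc=1; }
  # --frozen installs the locked versions, so the lock file must be present.
  [ -f "$root/shard.lock" ] || { echo "missing: $root/shard.lock" >&2; rc=1; }
  return "$rc"
}

# Usage (assumes shards is on PATH):
#   export SHARDS_CACHE_PATH="$PWD/.cache/shards"
#   check_vendored . && shards install --frozen --local
```

This only confirms the files are present in the checkout; it says nothing about whether shards will accept them.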

Does this all make sense and would it be fairly safe to do @straight-shoota? Thanks again for helping out a new user here! 🙇

straight-shoota commented 2 months ago

I don't get why you even want to run shards install in the first place. How about just not doing that? You already have your dependencies installed in lib/ and explicitly no intention to change anything about that. So why go through all of that just to get effectively a no-op?

I figure it should be fine to do, but I don't see any sense in it.

GrantBirki commented 2 months ago

I don't get why you even want to run shards install in the first place. How about just not doing that?

Locally, (on my dev box) this would be fine. But in CI systems (like GitHub Actions) I'm not committing my binaries so they need to be built during a deployment or for testing.

For example, during my lint CI job, I would need the ameba binary to exist and the only way to make that exist without committing binaries would be to run shards install and then have ./bin/ameba ready to go for my CI job responsible for linting.

Vendoring is what I want here, because if the maintainers of ameba were to just delete the project, I wouldn't be able to run CI jobs for linting. Yeah, that isn't too bad but if the same were to happen for kemal, then I would lose the ability to build or deploy my service (very bad).

So I'm trying to iron out the best way to fully bootstrap, install dependencies, test, lint, and then deploy my service with zero reliance on the network or 3rd party shards existing. If I vendor a dependency even a single time, I expect it to be usable in my project forever, regardless if the source repo even exists or not.

This is following the pattern of "who owns my availability?" and the answer is always you.

I'm trying to figure out how to build extremely reliable and fault tolerant systems and I feel so close with crystal but not fully there yet.

Blacksmoke16 commented 2 months ago

If your primary concern around all this is the small chance your dependencies are deleted upstream, might just be easier to fork them and update your shard.yml to your own forks. Then you have a copy of the code regardless of what happens upstream, and can occasionally sync the mirror to get latest changes and such.

GrantBirki commented 2 months ago

For reference, here is the repository that I am working on which will be a "base" repo template for many of my projects going forward and hopefully for others to use as well with crystal

https://github.com/GrantBirki/crystal-base-template

GrantBirki commented 2 months ago

If your primary concern around all this is the small chance your dependencies are deleted upstream, might just be easier to fork them and update your shard.yml to your own forks

Yep that is absolutely an option. I know many large organizations do this anyways with mission critical dependencies to avoid them bringing down their build systems. However, that requires a lot of maintenance and certainly isn't for everyone.

What I was ultimately hoping for with crystal was the phenomenal dependency management that comes with Ruby. I mean it's just outstanding and works so well. You can vendor all of your Ruby Gems into the vendor/cache dir, and each dependency is just a fancy zip file that is all packed up. Running bundle update updates it for you and you can commit them with ease. New users simply run script/bootstrap and all their gems are installed on a per project basis. The only time you ever reach out over the network is when doing bundle update to pull down new versions from RubyGems.

In my mind, this is the absolute dream of dependency management and I am really hoping I can nail down something similar here with crystal. ⭐

ysbaddaden commented 2 months ago

I understand the reasoning:

Committing Shards' cache means committing a git mirror of each dependency, some may not even be needed anymore (oops).

Committing the lib folder means that we must filter out anything non portable manually (libraries, executables). Even if they have a proper .gitignore we may not use Git but Mercurial. It also means that we must check which dependencies build something to write a bootstrap script, then maintain that script on each update... something that could be automated.

I suppose there could be a pair of commands (or one command with two subcommands) that would:

vendor each shard in their pristine state (straight extract from their source);
install each shard from the vendored source, instead of their original source, but otherwise installing them normally.

straight-shoota commented 2 months ago

I was not asking for reasons for vendoring dependencies. I get that you want that. I'm wondering why you need to run shards install when you already have the dependencies available and don't actually have to install them.

If I understand correctly, you rely on the postinstall hook of shards install to build ameba (and possibly other binaries?) for you. That seems to be the only thing you need, not the full shards install. It has been proposed to have a separate command to trigger only the postinstall action. But this entire hook situation isn't entirely satisfactory and is under discussion. So I wouldn't expect a quick implementation of such a command when the bigger picture isn't entirely clear.

However, it's also pretty easy to explicitly build the binaries you need. For ameba, you can run this command in the root repository: make -C lib/ameba build BUILD_TARGET=$(pwd)/bin/ameba

Alternatively, it should be pretty straightforward to implement the behaviour of triggering the postinstall hooks as a script outside of shards itself. Something like this should do (I haven't tested it, but it should explain the necessary steps).

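#!/bin/bash
# Run each dependency's postinstall hook, then link its executables into $O.
O=${O:-"$(pwd)/bin"}
mkdir -p "$O"
for spec in lib/*/shard.yml; do
  (
    # Subshell, so the cd does not carry over to the next iteration.
    cd "$(dirname "$spec")" || exit
    hook=$(yq -r '.scripts.postinstall // ""' shard.yml)
    if [ -n "$hook" ]; then eval "$hook"; fi
    yq -r '.executables // [] | .[]' shard.yml |
      while read -r line; do ln -sf "$(pwd)/bin/$line" "$O/$line"; done
  )
done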

straight-shoota commented 2 months ago

@ysbaddaden shards install --skip-postinstall --skip-executables should check out the raw repository contents which means a pristine state that you can check in with no system-specific artifacts. Any changes afterwards to these paths could be ignored and cleaned (until the next time you update the dependencies).

ysbaddaden commented 2 months ago

Pro: that would just work today (nice).

Con: Upgrading dependencies becomes tedious. You can't just upgrade, try things out, and commit when it's working. You must destroy your lib folder, reinstall the shards with the above command, commit, then rebuild everything, update your gitignore if needed, ...

You also have to copy (& maintain) the script to each and every project where you want to vendor your dependencies. Shards doesn't do it (yet?) but postinstall scripts may have to be built in a specific order, which a script won't be able to notice.

Prior art

Rust Cargo has the vendor command: https://doc.rust-lang.org/cargo/commands/cargo-vendor.html
Ruby Bundler has the cache command: https://bundler.io/v2.5/man/bundle-cache.1.html
Go used to always vendor, now it has -no-vendor and -vendor-only: https://pkg.go.dev/github.com/golang/dep/cmd/dep

straight-shoota commented 2 months ago

This seems like it might be overcomplicating. If the dependencies have proper ignore patterns set up and maybe a recipe to clean up the build directory, upgrading should be pretty simple. There might be some bumps when using different version control systems, but that shouldn't be too much of a problem. The ignore file syntax is usually pretty similar and it's just a matter of making the patterns accessible. Many IDEs are perfectly capable of understanding different ignore file formats. So I think this should be easy to manage.

I wouldn't worry too much about postinstalls. Their purpose can easily be factored out of the shards install process. IMO it's a better approach to build dependencies in your build system anyway.

Sija commented 2 months ago

@straight-shoota To me what you're proposing is an overcomplicated solution to a problem that other package managers have been handling for years... I cannot imagine telling people that they have to cook up a Makefile recipe along with the custom process just to vendor the dependencies.

GrantBirki commented 2 months ago

I'm wondering why you need to run shards install when you already have the dependencies available and don't actually have to install them.

If I understand correctly, you rely on the postinstall hook of shards install to build ameba (and possibly other binaries?) for you. That seems to be the only thing you need, not the full shards install

Yep, so in a perfect world, I wouldn't really need to run shards install but... I need things to work with my project like ameba or in the future other binaries that get built. In my crystal-base-template project, I have a helper script called script/bootstrap which is intended to be run by users when they first clone the repo. This script sets up everything they would need to start contributing to the project. During this script, I run shards install with some custom flags. Just after doing a shards install, the script calls out to another script that I wrote called script/postinstall which does the exact logic you were talking about where I trigger some "postinstall hook logic outside of shards itself" as seen here. Currently, it only does this for ameba as I'm new with crystal and haven't needed this postinstall hook logic for any other dependencies outside of the basic linter for crystal.

GrantBirki commented 2 months ago

The thing that I am still struggling with the most is that even when doing hacky things, writing scripts, setting custom shards env vars, etc... I still cannot get shards to behave in a way that doesn't make requests over the network. In short, I haven't figured out a way yet at all to make shards/crystal completely isolated from network calls in a truly vendored fashion.

Note the "fetching" line in the screenshot above

This behavior can be easily replicated by cloning my crystal-base-template repo and running script/bootstrap

I know many other languages have vendoring totally dialed in and what I'm catching here is that it's either like 95% implemented in crystal/shards or I'm doing something slightly wrong

ysbaddaden commented 2 months ago

@straight-shoota I half agree with you, there shouldn't be much need to care about the postinstall hook.

I wholeheartedly agree that the postinstall hook is hardly portable (Windows :disappointed:), but maybe the interpreter will help in the future, or maybe we should advocate to call $CRYSTAL run?
I believe Ameba shouldn't compile itself in a postinstall hook. In fact I'm convinced it mustn't be installed as a dependency: it should be distributed and installed just like any other external tool; just like we install crystal, shards, watchexec or nodejs.
But I'm not convinced for shards that bind an external library and either vendor the library (hardly packaged) or need a C translation layer (because C++), or generate lib definitions from C headers using libclang, ... IMO this is an internal detail to each shard, and that was the initial purpose for the postinstall hook.

Pushing the responsibility to build something to users of this or that shard wouldn't be nice, and even worse for a nested dependency.

straight-shoota commented 2 months ago

@Sija https://github.com/crystal-lang/shards/issues/611#issuecomment-2070112449

I cannot imagine telling people that they have to cook up a Makefile recipe along with the custom process just to vendor the dependencies.

No, you shouldn't need to set up a build system just for this purpose. But you'll probably need it anyway (more likely with increasing level of project complexity) at which point it's no extra step because it's already there.

@ysbaddaden https://github.com/crystal-lang/shards/issues/611#issuecomment-2070264061

But I'm not convinced for shards that bind an external library and either vendor the library (hardly packaged) or need a C translation layer (because C++), or generate lib definitions from C headers using libclang, ... IMO this is an internal detail to each shard, and that was the initial purpose for the postinstall hook.

In an ideal world, maybe. But if you build C libraries, it's quite likely you'll need a way to configure them, and make this available as part of your own build step. Maybe simple shim libs or the like can get away with a basic default configuration. But as soon as you need to deviate the smallest bit from the default, an implicit build step based on postinstall hook isn't going to cut it (unless shards gets blown up to handle such config options). If you use a build system with explicit build dependencies however, this is a relatively easy task.

GrantBirki commented 2 months ago

@straight-shoota this might be asking a lot of you and if you just don't have the time I totally understand.

I'm wondering if it would be possible for you to check out this repo -> https://github.com/GrantBirki/crystal-base-template and either open a PR demonstrating how I could tweak something to vendor shards, or comment on what I'm doing wrong here that is preventing a fully network isolated vendoring strategy.

Once I finally nail down a solution (if possible), I would be more than willing to open a PR somewhere to document this type of vendoring strategy in detail to help out other folks!

If larger organizations, enterprises, or critical services are looking to adopt crystal, having a rock-solid vendoring strategy is 100% going to be a requirement. I would be happy to help out with making sure this is a paved path for the next person to come along 🙇

straight-shoota commented 2 months ago

@GrantBirki This template doesn't have many dependencies, with this scope it's quite simple. As @ysbaddaden mentioned in https://github.com/crystal-lang/shards/issues/611#issuecomment-2070264061, ameba is a development tool, not a library. It should be installed as a tool in the development environment and not as a source dependency via shards.

With this change, shards has no dependencies to install at all, so the source code is naturally 100% self contained.

Of course this is only a blank template. If you want to make any use of it, you'll need some shard dependencies and at some point they might need some platform-specific build artifacts. What exactly to do about them depends on the individual circumstances. But I think a good general strategy looks like this:

Always run shards install and shards update with --skip-postinstall --skip-executables to prevent hooks from running. This ensures that shards only checks out the raw sources.
Use a build system to explicitly build artifacts when needed. The dependencies should provide recipes for that (for example something like make -C lib/foobar libfoo). If they don't, maybe you can help with that (at least make a request).
Ideally, build artifacts out of tree to keep the folders in lib/ clean. (Out of tree for the dependencies, it can be in tree for the main project)
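
The first step of this strategy could be wrapped in a small script. A rough sketch, where `run` and `vendor_update` are hypothetical helper names and staging `shard.lock` alongside `lib` is an assumption:

```shell
# Hypothetical update wrapper; DRY_RUN=1 prints the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

vendor_update() {
  # Flags as quoted above: check out raw sources only, no hooks, no executables.
  run shards update --skip-postinstall --skip-executables || return 1
  # Stage the pristine sources together with the lock file.
  run git add lib shard.lock
}

# Usage:
#   DRY_RUN=1 vendor_update   # preview the commands
#   vendor_update             # real run; needs shards and git
```

Building any platform-specific artifacts stays a separate, explicit step in the project's own build system.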

GrantBirki commented 2 months ago

@straight-shoota I have added a simple dependency and I'm now working on trying to get vendoring working properly. I still can't quite figure out where "cached shards" are installed. The command that I am running (script/update) sets the cache dir to .cache/shards/ in my project directory. However, it looks like those files are all just git metadata?

So when I run a subsequent script/bootstrap, it works on my local machine but not in my build system (GitHub Actions)

https://github.com/GrantBirki/crystal-base-template/issues/1 - tracking issue
https://github.com/GrantBirki/crystal-base-template/pull/2 - pull request where I'm trying to fix things

straight-shoota commented 2 months ago

.cache/shards doesn't seem to be checked into the repository, thus the CI runner cannot find it.

However for the setup which I'm suggesting, you don't need to vendor the entire shards cache. Just ./lib.

straight-shoota commented 2 months ago

I understand the dependency you added and all of its transitive dependencies are pure Crystal sources. They produce no platform-specific artifacts. So no additional steps are necessary.

You can simply install them via shards install and commit ./lib. No need to cache the shard repositories, or handle any postinstall or cleanup tasks.

GrantBirki commented 2 months ago

So if I don't commit the .cache/shards dir, my CI system fails with:

$ script/bootstrap
I: Resolving dependencies
E: Locked version 0.1.0+git.commit.eb37b8129dbcf3638550cf8fce705aa9533598fd for octokit was not found in git: https://github.com/grantbirki/octokit.cr.git.

But even if I do commit the .cache/shards dir, it still fails with the same error.

How in the heck do I get my CI system to just run script/bootstrap without needing the network?

reference PR where I'm working: https://github.com/GrantBirki/crystal-base-template/pull/2

I feel like I'm going nuts lol :joy:

GrantBirki commented 2 months ago

So in CI it fails with or without the SHARDS_CACHE_PATH being committed. In both cases, I have lib/ committed.

GrantBirki commented 2 months ago

So there has got to be a third component somewhere. If I vendor SHARDS_CACHE_PATH and lib/ it should have everything that CI needs to run my script/bootstrap script which is really just running shards install --skip-postinstall --skip-executables --local --frozen under the hood.

Perhaps there is another hidden directory somewhere that shards install looks at which I am missing? Maybe there are some files or git information cached somewhere else? Maybe these items are in a different place for me because I'm using crenv?

straight-shoota commented 2 months ago

Hm, yeah I tested this locally. There seems to be something going wrong in shards install. It appears to be removing some of the cached repositories 😕

I staged .cache/shards, then ran script/bootstrap and git status shows this:

AD .cache/shards/github.com/crystal-ameba/ameba.git/HEAD
AD .cache/shards/github.com/crystal-ameba/ameba.git/config
AD .cache/shards/github.com/crystal-ameba/ameba.git/hooks/post-rewrite
AD .cache/shards/github.com/crystal-ameba/ameba.git/hooks/pre-rebase
AD .cache/shards/github.com/crystal-ameba/ameba.git/objects/pack/pack-b8c5dc20a55568868038dacf844dffa567e01f9c.idx
AD .cache/shards/github.com/crystal-ameba/ameba.git/objects/pack/pack-b8c5dc20a55568868038dacf844dffa567e01f9c.pack
AD .cache/shards/github.com/crystal-ameba/ameba.git/objects/pack/pack-b8c5dc20a55568868038dacf844dffa567e01f9c.rev
AD .cache/shards/github.com/crystal-ameba/ameba.git/packed-refs
A  .cache/shards/github.com/crystal-lang/json_mapping.cr.git/FETCH_HEAD
A  .cache/shards/github.com/crystal-lang/json_mapping.cr.git/HEAD
A  .cache/shards/github.com/crystal-lang/json_mapping.cr.git/config
A  .cache/shards/github.com/crystal-lang/json_mapping.cr.git/hooks/post-rewrite
A  .cache/shards/github.com/crystal-lang/json_mapping.cr.git/hooks/pre-rebase
A  .cache/shards/github.com/crystal-lang/json_mapping.cr.git/objects/pack/pack-8e20be7d11be78befb0da89dbf3da3879b9d683f.idx
A  .cache/shards/github.com/crystal-lang/json_mapping.cr.git/objects/pack/pack-8e20be7d11be78befb0da89dbf3da3879b9d683f.pack
A  .cache/shards/github.com/crystal-lang/json_mapping.cr.git/objects/pack/pack-8e20be7d11be78befb0da89dbf3da3879b9d683f.rev
A  .cache/shards/github.com/crystal-lang/json_mapping.cr.git/packed-refs
AD .cache/shards/github.com/grantbirki/octokit.cr.git/HEAD
AD .cache/shards/github.com/grantbirki/octokit.cr.git/config
AD .cache/shards/github.com/grantbirki/octokit.cr.git/hooks/post-rewrite
AD .cache/shards/github.com/grantbirki/octokit.cr.git/hooks/pre-rebase
AD .cache/shards/github.com/grantbirki/octokit.cr.git/objects/pack/pack-95f1b08886858f2fdc457e5da3090ba3799df9f2.idx
AD .cache/shards/github.com/grantbirki/octokit.cr.git/objects/pack/pack-95f1b08886858f2fdc457e5da3090ba3799df9f2.pack
AD .cache/shards/github.com/grantbirki/octokit.cr.git/objects/pack/pack-95f1b08886858f2fdc457e5da3090ba3799df9f2.rev
AD .cache/shards/github.com/grantbirki/octokit.cr.git/packed-refs
A  .cache/shards/github.com/icyleaf/halite.git/FETCH_HEAD
A  .cache/shards/github.com/icyleaf/halite.git/HEAD
A  .cache/shards/github.com/icyleaf/halite.git/config
A  .cache/shards/github.com/icyleaf/halite.git/hooks/post-rewrite
A  .cache/shards/github.com/icyleaf/halite.git/hooks/pre-rebase
A  .cache/shards/github.com/icyleaf/halite.git/objects/pack/pack-6ebf1f8efc251b8840037e7b890715fb41e64f6e.idx
A  .cache/shards/github.com/icyleaf/halite.git/objects/pack/pack-6ebf1f8efc251b8840037e7b890715fb41e64f6e.pack
A  .cache/shards/github.com/icyleaf/halite.git/objects/pack/pack-6ebf1f8efc251b8840037e7b890715fb41e64f6e.rev
A  .cache/shards/github.com/icyleaf/halite.git/packed-refs

The repos for octokit.cr.git and ameba.git have vanished for some reason.
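
The same check can be scripted. A small sketch, assuming input in the `git status --short` format shown above; `deleted_cache_repos` is a hypothetical helper name.

```shell
# Hypothetical helper: list cached repos whose staged files were deleted on disk.
# Reads `git status --short` lines on stdin.
# "AD" means added to the index, then deleted from the working tree.
deleted_cache_repos() {
  # Capture the path up to and including the "<name>.git" directory.
  sed -n 's|^AD \(.*\.git\)/.*$|\1|p' | sort -u
}

# Usage:
#   git status --short .cache/shards | deleted_cache_repos
```

Entries with a plain `A` status are skipped, so only repositories that disappeared after staging are reported.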

And yes running shards with --verbose flag shows this is executing somewhere:

Resolving dependencies
git ls-remote --get-url origin
git ls-remote --get-url origin
rm -rf /app/.cache/shards/github.com/grantbirki/octokit.cr.git'
rm -rf /app/.cache/shards/github.com/crystal-ameba/ameba.git'
Locked version 0.1.0+git.commit.eb37b8129dbcf3638550cf8fce705aa9533598fd for octokit was not found in git: https://github.com/grantbirki/octokit.cr.git.

Well done, shards. First delete the repos and then complain that they don't exist 🤦

straight-shoota commented 2 months ago

Looks like GitResolver#origin_url is broken. It returns the remote URL of the root repo (https://github.com/GrantBirki/crystal-base-template), instead of the origin of the dependency.

GrantBirki commented 2 months ago

Ah ha! This is great news! That is totally the issue I bet since the true git repos are being nuked then. I see the exact same thing locally and in CI with the --verbose flag.

Great find! :heart:

So what do we do now to fix this issue?

crystal-lang / shards

How can I vendor dependencies (shards)? #611