GrantBirki opened this issue 5 months ago
It should be as simple as committing ./lib into your repository. All dependencies are checked out into worktrees inside ./lib.
By default, this folder is excluded via .gitignore. But you're free to change that. Keeping vendored dependencies available seems like a good use case for doing this 👍
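For reference, a minimal sketch of opting lib/ back into version control. It assumes the project's .gitignore lists the folder as a single `/lib/` line (the layout `crystal init` typically generates); other spellings such as `lib/` are not handled, and the helper name `unignore_lib` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: stop ignoring ./lib so the checked-out dependency
# worktrees can be committed. Assumes .gitignore lists the folder as "/lib/".
unignore_lib() {
  gitignore="${1:-.gitignore}"
  # Nothing to do if there is no ignore file at all.
  [ -f "$gitignore" ] || return 0
  # Keep every line except an exact "/lib/" entry; grep exits 1 when no lines
  # remain, which is not an error here.
  { grep -v -x -F '/lib/' "$gitignore" || true; } > "$gitignore.tmp"
  mv "$gitignore.tmp" "$gitignore"
}
```

Afterwards, something like `git add .gitignore lib shard.lock` stages the vendored worktrees together with the lock file.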
@straight-shoota are there any potential downsides to doing this? I know the ./lib folder can be quite large, as it looks like it contains entire repositories for me (in a few projects).
I don't see any downsides. The lib folder should only contain checked-out worktrees, not the entire repo history. There's no .git folder.
If you see something else, that would be odd. More details, please =)
Thanks! That should be enough info for me to use for vendoring. I'll re-open this issue if I have any troubles.
👋 Hey again @straight-shoota! I had an additional question around vendoring dependencies.
So to truly "vendor" a dependency, it would mean that everything related to that dependency lives within the repo.
For example, if I had a shard that made API requests to GitHub and it was a runtime dependency of my app, I would want to ensure that it is always available and that no network calls have to be made at all to pull down, build, or compile that shard.
So my question is: in addition to vendoring (checking into git) the ./lib folder, is it also safe to check in the shards cache path within my project directory?
Example: SHARDS_CACHE_PATH="$DIR/.cache/shards" where $DIR is my project directory.
If I can safely check in the .cache/shards/ dir, then I can run shards install --frozen --local (with the local flag) and no network requests are made for my dependencies (yay!).
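A rough sketch of the setup being asked about, assuming the cache under .cache/shards/ was populated earlier while online and that shard.lock is committed. The function name `offline_install` is hypothetical; SHARDS_CACHE_PATH, --frozen and --local are the shards settings named above. Later in this thread this exact combination still failed in CI, apparently because shards deleted the cached repositories (GitResolver#origin_url resolving the wrong origin), so treat it as the intended workflow rather than a proven one.

```shell
#!/bin/sh
# Hypothetical helper: install dependencies from a shards cache kept inside the
# project so shards does not reach out to the network. Assumes .cache/shards/
# was populated earlier (online) and that shard.lock is committed.
offline_install() {
  project_dir="${1:-$(pwd)}"
  cache_dir="$project_dir/.cache/shards"
  if [ ! -d "$cache_dir" ]; then
    echo "missing shards cache: $cache_dir" >&2
    return 1
  fi
  # SHARDS_CACHE_PATH points shards at the project-local cache (exported only
  # inside the subshell). --frozen installs exactly the versions in shard.lock;
  # --local uses the cache only.
  (cd "$project_dir" && export SHARDS_CACHE_PATH="$cache_dir" &&
    shards install --frozen --local)
}
```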
Is this a safe thing to do though (I'm still new to Crystal)? I do my development on Linux locally, and the target systems I deploy to in production are also Linux. Would this potentially cause any conflict if I had a team member doing development on a macOS machine?
I want to effectively remove all occurrences of:
Fetching https://github.com/kemalcr/kemal.git
Fetching https://github.com/stefanwille/crystal-redis.git
Fetching https://github.com/crystal-ameba/ameba.git
Fetching https://github.com/luislavena/radix.git
Fetching https://github.com/crystal-loot/exception_page.git
Fetching https://github.com/sija/backtracer.cr.git
Fetching https://github.com/ysbaddaden/pool.git
...
and instead have them all be:
Using radix (0.4.1)
Using backtracer (1.2.2)
Using exception_page (0.4.1)
Using kemal (1.5.0)
Using pool (0.2.4)
Using redis (2.9.1)
Using ameba (1.6.1)
Dependencies are satisfied
...
when running shards install --local
My overall goal is to have builds be extremely repeatable, reliable, not rely on the network, not fail if a rogue shard maintainer deletes their repo, and work well in CI systems across multiple platforms.
Thanks! 🙇
Yeah, I think this should technically be fine. The shards cache just contains bare repositories, which should be cross-platform compatible. This of course depends on the resolver (such as git), but all current resolvers should work that way.
I don't see the point of this though. Why do you want to run a cached shards install when you can just have the dependencies already checked out in lib?
Without network requests you won't get anything new anyway.
The only reason I could think of where this might perhaps be useful is to seamlessly switch between different dependency versions during development.
> I don't see the point of this though. Why do you want to run a cached shards install when you can just have the dependencies already checked out in lib?
Even when I have everything in lib/ checked into version control, running shards install still indicates it's making network requests with Fetching https://github.com/kemalcr/kemal.git. So it's clear that in order to have completely repeatable builds that don't rely on the network at all, you need to have all three conditions met (AFAIK):
- Vendor (commit) the lib/ dir
- Vendor (commit) the .cache/shards/ dir
- Run shards install --frozen --local
Does this all make sense and would it be fairly safe to do @straight-shoota? Thanks again for helping out a new user here! 🙇
I don't get why you even want to run shards install in the first place. How about just not doing that? You already have your dependencies installed in lib/ and explicitly have no intention of changing anything about that. So why go through all of that just to get effectively a no-op?
I figure it should be fine to do, but I don't see any sense in it.
> I don't get why you even want to run shards install in the first place. How about just not doing that?
Locally (on my dev box), this would be fine. But in CI systems (like GitHub Actions), I'm not committing my binaries, so they need to be built during a deployment or for testing.
For example, during my lint CI job, I would need the ameba binary to exist, and the only way to make that exist without committing binaries would be to run shards install and then have ./bin/ameba ready to go for my CI job responsible for linting.
Vendoring is what I want here, because if the maintainers of ameba were to just delete the project, I wouldn't be able to run CI jobs for linting. Yeah, that isn't too bad, but if the same were to happen for kemal, then I would lose the ability to build or deploy my service (very bad).
So I'm trying to iron out the best way to fully bootstrap, install dependencies, test, lint, and then deploy my service with zero reliance on the network or 3rd party shards existing. If I vendor a dependency even a single time, I expect it to be usable in my project forever, regardless of whether the source repo still exists.
This is following the pattern of "who owns my availability?", and the answer is always you.
I'm trying to figure out how to build extremely reliable and fault tolerant systems and I feel so close with crystal but not fully there yet.
If your primary concern around all this is the small chance your dependencies are deleted upstream, it might just be easier to fork them and update your shard.yml to point to your own forks. Then you have a copy of the code regardless of what happens upstream, and can occasionally sync the mirror to get the latest changes and such.
For reference, here is the repository that I am working on which will be a "base" repo template for many of my projects going forward and hopefully for others to use as well with crystal
> If your primary concern around all this is the small chance your dependencies are deleted upstream, might just be easier to fork them and update your shard.yml to your own forks
Yep that is absolutely an option. I know many large organizations do this anyways with mission critical dependencies to avoid them bringing down their build systems. However, that requires a lot of maintenance and certainly isn't for everyone.
What I was ultimately hoping for with Crystal was the phenomenal dependency management that comes with Ruby. I mean, it's just outstanding and works so well. You can vendor all of your Ruby gems into the vendor/cache dir, and each dependency is just a fancy zip file that is all packed up. Running bundle update updates it for you and you can commit them with ease. New users simply run script/bootstrap and all their gems are installed on a per-project basis. The only time you ever reach out over the network is when doing bundle update to pull down new versions from RubyGems.
In my mind, this is the absolute dream of dependency management and I am really hoping I can nail down something similar here with crystal. ⭐
I understand the reasoning:
- Committing Shards' cache means committing a git mirror of each dependency, some of which may not even be needed anymore (oops).
- Committing the lib folder means that we must filter out anything non-portable manually (libraries, executables). Even if they have a proper .gitignore, we may not use Git but Mercurial. It also means that we must check which dependency builds something to write a bootstrap script, then maintain that script on each update... something that could be automated.
I suppose there could be a pair of commands (or one command with two subcommands) that would:
I was not asking for reasons for vendoring dependencies. I get that you want that.
I'm wondering why you need to run shards install when you already have the dependencies available and don't actually have to install them.
If I understand correctly, you rely on the postinstall hook of shards install to build ameba (and possibly other binaries?) for you.
That seems to be the only thing you need, not the full shards install.
It has been proposed to have a separate command to trigger only the postinstall action. But this entire hook situation isn't entirely satisfactory and is under discussion. So I wouldn't expect a quick implementation of such a command when the bigger picture isn't entirely clear.
However, it's also pretty easy to explicitly build the binaries you need.
For ameba, you can run this command in the root repository: make -C lib/ameba build BUILD_TARGET=$(pwd)/bin/ameba
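To make the lint use case concrete, a sketch of a CI lint step built around that make command. It assumes ameba's sources are vendored in lib/ameba; the function name `lint` and the skip-if-already-built check are illustrative assumptions, not part of shards or ameba.

```shell
#!/bin/sh
# Hypothetical CI lint step: build ameba from the vendored sources in lib/
# with the make invocation suggested above, but only when ./bin/ameba does not
# exist yet, then run it with any arguments passed to this function.
lint() {
  if [ ! -x bin/ameba ]; then
    mkdir -p bin
    make -C lib/ameba build BUILD_TARGET="$(pwd)/bin/ameba" || return 1
  fi
  bin/ameba "$@"
}
```

Usage in CI would be something like `lint src spec`; because the binary is built from lib/ameba, no network access is involved.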
Alternatively, it should be pretty straightforward to implement the behaviour of triggering the postinstall hooks as a script outside of shards itself.
Something like this should do (I haven't tested it, but it should explain the necessary steps).
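#! /bin/bash
# Assumes mikefarah yq v4, where `yq -e` exits non-zero for missing or null keys
O=${O:-"$(pwd)/bin"}
mkdir -p "$O"
for spec in lib/*/shard.yml; do
  dir=$(dirname "$spec")
  # Run the postinstall hook, if any, inside the dependency's directory
  if script=$(yq -e '.scripts.postinstall' "$spec" 2>/dev/null); then
    (cd "$dir" && sh -c "$script")
  fi
  # Link each declared executable into $O
  yq -e '.executables[]' "$spec" 2>/dev/null | while read -r exe; do ln -sf "$(pwd)/$dir/bin/$exe" "$O/$exe"; done
done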
@ysbaddaden shards install --skip-postinstall --skip-executables should check out the raw repository contents, which means a pristine state that you can check in with no system-specific artifacts. Any changes afterwards to these paths could be ignored and cleaned (until the next time you update the dependencies).
Pro: that would just work today (nice).
Con: Upgrading dependencies becomes tedious. You can't just upgrade, try things out, and commit when it's working. You must destroy your lib folder, reinstall the shards with the above command, commit, then rebuild everything, update your gitignore if needed, ...
You also have to copy (& maintain) the script to each and every project where you want to vendor your dependencies. Shards doesn't do it (yet?) but postinstall scripts may have to be built in a specific order, which a script won't be able to notice.
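The upgrade steps listed in the Con above could be scripted roughly as follows. This is a hypothetical helper (`refresh_vendored_lib` is not part of shards); it only stages the result so it can be reviewed before committing, and rebuilding any artifacts remains a separate step.

```shell
#!/bin/sh
# Hypothetical helper for the upgrade steps described above: wipe lib/,
# re-create it from raw sources only (no postinstall hooks, no executables),
# then stage it together with shard.lock. Run it from the project root.
# Pass "update" to upgrade dependencies; the default is "install".
refresh_vendored_lib() {
  subcommand="${1:-install}"
  case "$subcommand" in
    install|update) ;;
    *) echo "usage: refresh_vendored_lib [install|update]" >&2; return 2 ;;
  esac
  [ -f shard.yml ] || { echo "shard.yml not found; run from the project root" >&2; return 1; }
  rm -rf lib
  shards "$subcommand" --skip-postinstall --skip-executables || return 1
  # Stage the pristine sources (including deletions) and the lock file.
  git add -A -- lib shard.lock
}
```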
Prior art: the -no-vendor and -vendor-only flags of Go's dep: https://pkg.go.dev/github.com/golang/dep/cmd/dep
This seems like it might be overcomplicating things. If the dependencies have proper ignore patterns set up and maybe a recipe to clean up the build directory, upgrading should be pretty simple. There might be some bumps when using different version control systems, but that shouldn't be too much of a problem. The ignore file syntax is usually pretty similar and it's just a matter of making the patterns accessible. Many IDEs are perfectly capable of understanding different ignore file formats. So I think this should be easy to manage.
I wouldn't worry too much about postinstalls. Their purpose can easily be factored out of the shards install process. IMO it's a better approach to build dependencies in your build system anyway.
@straight-shoota To me, what you're proposing is an overcomplicated solution to a problem that other package managers have been handling for years... I cannot imagine telling people that they have to cook up a Makefile recipe along with the custom process just to vendor the dependencies.
> I'm wondering why you need to run shards install when you already have the dependencies available and don't actually have to install them.
> If I understand correctly, you rely on the postinstall hook of shards install to build ameba (and possibly other binaries?) for you. That seems to be the only thing you need, not the full shards install
Yep, so in a perfect world, I wouldn't really need to run shards install, but... I need things like ameba (or, in the future, other binaries that get built) to work with my project. In my crystal-base-template project, I have a helper script called script/bootstrap which is intended to be run by users when they first clone the repo. This script sets up everything they would need to start contributing to the project. During this script, I run shards install with some custom flags. Just after doing a shards install, the script calls out to another script that I wrote called script/postinstall, which does the exact logic you were talking about, where I trigger some "postinstall hook logic outside of shards itself" as seen here. Currently, it only does this for ameba, as I'm new to Crystal and haven't needed this postinstall hook logic for any other dependencies outside of the basic linter for Crystal.
The thing that I am still struggling with the most is that even when doing hacky things, writing scripts, setting custom shards env vars, etc., I still cannot get shards to behave in a way that doesn't make requests over the network. In short, I haven't found any way yet to make shards/crystal completely isolated from network calls in a truly vendored fashion.
Note the "fetching" line in the screenshot above
This behavior can be easily replicated by cloning my crystal-base-template repo and running script/bootstrap
I know many other languages have vendoring totally dialed in, and what I'm catching here is that it's either like 95% implemented in crystal/shards or I'm doing something slightly wrong.
@straight-shoota I half agree with you; there shouldn't be much need to care about the postinstall hook.
I wholeheartedly agree that the postinstall hook is hardly portable (Windows :disappointed:), but maybe the interpreter will help in the future, or maybe we should advocate to call $CRYSTAL run?
I believe Ameba shouldn't compile itself in a postinstall hook. In fact, I'm convinced it mustn't be installed as a dependency: it should be distributed and installed just like any other external tool, just like we install crystal, shards, watchexec or nodejs.
But I'm not convinced for shards that bind an external library and either vendor the library (hardly packaged) or need a C translation layer (because C++), or generate lib definitions from C headers using libclang, ... IMO this is an internal detail to each shard, and that was the initial purpose for the postinstall hook.
Pushing the responsibility to build something to users of this or that shard wouldn't be nice, and even worse for a nested dependency.
@Sija https://github.com/crystal-lang/shards/issues/611#issuecomment-2070112449
> I cannot imagine telling people that they have to cook up a Makefile recipe along with the custom process just to vendor the dependencies.
No, you shouldn't need to set up a build system just for this purpose. But you'll probably need it anyway (more likely with increasing level of project complexity), at which point it's no extra step because it's already there.
@ysbaddaden https://github.com/crystal-lang/shards/issues/611#issuecomment-2070264061
> But I'm not convinced for shards that bind an external library and either vendor the library (hardly packaged) or need a C translation layer (because C++), or generate lib definitions from C headers using libclang, ... IMO this is an internal detail to each shard, and that was the initial purpose for the postinstall hook.
In an ideal world, maybe. But if you build C libraries, it's quite likely you'll need a way to configure them, and make this available as part of your own build step. Maybe simple shim libs or the like can get away with a basic default configuration.
But as soon as you need to deviate the smallest bit from the default, an implicit build step based on a postinstall hook isn't going to cut it (unless shards gets blown up to handle such config options).
If you use a build system with explicit build dependencies however, this is a relatively easy task.
@straight-shoota this might be asking a lot of you and if you just don't have the time I totally understand.
I'm wondering if it would be possible for you to check out this repo -> https://github.com/GrantBirki/crystal-base-template and either open a PR demonstrating how I could tweak something to vendor shards, or comment on what I'm doing wrong here that is preventing a fully network isolated vendoring strategy.
Once I finally nail down a solution (if possible), I would be more than willing to open a PR somewhere to document this type of vendoring strategy in detail to help out other folks!
If larger organizations, enterprises, or critical services are looking to adopt crystal, having a rock-solid vendoring strategy is 100% going to be a requirement. I would be happy to help out with making sure this is a paved path for the next person to come along 🙇
@GrantBirki This template doesn't have many dependencies, with this scope it's quite simple.
As @ysbaddaden mentioned in https://github.com/crystal-lang/shards/issues/611#issuecomment-2070264061, ameba is a development tool, not a library. It should be installed as a tool in the development environment and not as a source dependency via shards.
With this change, shards has no dependencies to install at all, so the source code is naturally 100% self-contained.
Of course this is only a blank template. If you want to make any use of it, you'll need some shard dependencies and at some point they might need some platform-specific build artifacts. What exactly to do about them depends on the individual circumstances. But I think a good general strategy looks like this:
- Run shards install and shards update always with --skip-postinstall --skip-executables to prevent hooks from running. This ensures that shards only puts the raw sources into lib/.
- … (e.g. make -C lib/foobar libfoo). If they don't, maybe you can help with that (at least make a request).
- … lib/ clean. (Out of tree for the dependencies, it can be in tree for the main project)

@straight-shoota I have added a simple dependency and I'm now working on trying to get vendoring working properly. I still can't quite figure out where "cached shards" are installed. The command that I am running (script/update) sets the cache dir to .cache/shards/ in my project directory. However, it looks like those files are all just git metadata?
So when I run a subsequent script/bootstrap, it works on my local machine but not in my build system (GitHub Actions).
.cache/shards doesn't seem to be checked into the repository, thus the CI runner cannot find it.
However, for the setup which I'm suggesting, you don't need to vendor the entire shards cache. Just ./lib.
As I understand it, the dependency you added and all of its transitive dependencies are pure Crystal sources. They produce no platform-specific artifacts, so no additional steps are necessary.
You can simply install them via shards install and commit ./lib. No need to cache the shard repositories, or handle any postinstall or cleanup tasks.
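If lib/ is committed this way, a CI job could additionally verify that nothing in lib/ has drifted from what is committed. A minimal sketch, assuming git is the VCS; the function name `check_vendored_lib` is hypothetical.

```shell
#!/bin/sh
# Hypothetical CI guard for the "just commit ./lib" setup: fail when lib/ has
# uncommitted changes or untracked files (e.g. build artifacts left behind by a
# hook), so the build only ever uses the sources that are actually committed.
check_vendored_lib() {
  changes=$(git status --porcelain -- lib) || return 2
  if [ -n "$changes" ]; then
    echo "lib/ differs from the committed vendored sources:" >&2
    printf '%s\n' "$changes" >&2
    return 1
  fi
}
```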
So if I don't commit the .cache/shards dir, my CI system fails with:
$ script/bootstrap
I: Resolving dependencies
E: Locked version 0.1.0+git.commit.eb37b8129dbcf3638550cf8fce705aa9533598fd for octokit was not found in git: https://github.com/grantbirki/octokit.cr.git.
But even if I do commit the .cache/shards dir, it still fails with the same error.
How in the heck do I get my CI system to just run script/bootstrap without needing the network?
reference PR where I'm working: https://github.com/GrantBirki/crystal-base-template/pull/2
I feel like I'm going nuts lol :joy:
So in CI it fails with or without the SHARDS_CACHE_PATH being committed. In both cases, I have lib/ committed.
So there has got to be a third component somewhere. If I vendor SHARDS_CACHE_PATH and lib/, it should have everything that CI needs to run my script/bootstrap script, which is really just running shards install --skip-postinstall --skip-executables --local --frozen under the hood.
Perhaps there is another hidden directory somewhere that shards install looks at which I am missing? Maybe there are some files or git information cached somewhere else? Maybe these items are in a different place for me because I'm using crenv?
Hm, yeah I tested this locally. There seems to be something going wrong in shards install. It appears to be removing some of the cached repositories 😕
I staged .cache/shards, then ran script/bootstrap, and git status shows this:
AD .cache/shards/github.com/crystal-ameba/ameba.git/HEAD
AD .cache/shards/github.com/crystal-ameba/ameba.git/config
AD .cache/shards/github.com/crystal-ameba/ameba.git/hooks/post-rewrite
AD .cache/shards/github.com/crystal-ameba/ameba.git/hooks/pre-rebase
AD .cache/shards/github.com/crystal-ameba/ameba.git/objects/pack/pack-b8c5dc20a55568868038dacf844dffa567e01f9c.idx
AD .cache/shards/github.com/crystal-ameba/ameba.git/objects/pack/pack-b8c5dc20a55568868038dacf844dffa567e01f9c.pack
AD .cache/shards/github.com/crystal-ameba/ameba.git/objects/pack/pack-b8c5dc20a55568868038dacf844dffa567e01f9c.rev
AD .cache/shards/github.com/crystal-ameba/ameba.git/packed-refs
A .cache/shards/github.com/crystal-lang/json_mapping.cr.git/FETCH_HEAD
A .cache/shards/github.com/crystal-lang/json_mapping.cr.git/HEAD
A .cache/shards/github.com/crystal-lang/json_mapping.cr.git/config
A .cache/shards/github.com/crystal-lang/json_mapping.cr.git/hooks/post-rewrite
A .cache/shards/github.com/crystal-lang/json_mapping.cr.git/hooks/pre-rebase
A .cache/shards/github.com/crystal-lang/json_mapping.cr.git/objects/pack/pack-8e20be7d11be78befb0da89dbf3da3879b9d683f.idx
A .cache/shards/github.com/crystal-lang/json_mapping.cr.git/objects/pack/pack-8e20be7d11be78befb0da89dbf3da3879b9d683f.pack
A .cache/shards/github.com/crystal-lang/json_mapping.cr.git/objects/pack/pack-8e20be7d11be78befb0da89dbf3da3879b9d683f.rev
A .cache/shards/github.com/crystal-lang/json_mapping.cr.git/packed-refs
AD .cache/shards/github.com/grantbirki/octokit.cr.git/HEAD
AD .cache/shards/github.com/grantbirki/octokit.cr.git/config
AD .cache/shards/github.com/grantbirki/octokit.cr.git/hooks/post-rewrite
AD .cache/shards/github.com/grantbirki/octokit.cr.git/hooks/pre-rebase
AD .cache/shards/github.com/grantbirki/octokit.cr.git/objects/pack/pack-95f1b08886858f2fdc457e5da3090ba3799df9f2.idx
AD .cache/shards/github.com/grantbirki/octokit.cr.git/objects/pack/pack-95f1b08886858f2fdc457e5da3090ba3799df9f2.pack
AD .cache/shards/github.com/grantbirki/octokit.cr.git/objects/pack/pack-95f1b08886858f2fdc457e5da3090ba3799df9f2.rev
AD .cache/shards/github.com/grantbirki/octokit.cr.git/packed-refs
A .cache/shards/github.com/icyleaf/halite.git/FETCH_HEAD
A .cache/shards/github.com/icyleaf/halite.git/HEAD
A .cache/shards/github.com/icyleaf/halite.git/config
A .cache/shards/github.com/icyleaf/halite.git/hooks/post-rewrite
A .cache/shards/github.com/icyleaf/halite.git/hooks/pre-rebase
A .cache/shards/github.com/icyleaf/halite.git/objects/pack/pack-6ebf1f8efc251b8840037e7b890715fb41e64f6e.idx
A .cache/shards/github.com/icyleaf/halite.git/objects/pack/pack-6ebf1f8efc251b8840037e7b890715fb41e64f6e.pack
A .cache/shards/github.com/icyleaf/halite.git/objects/pack/pack-6ebf1f8efc251b8840037e7b890715fb41e64f6e.rev
A .cache/shards/github.com/icyleaf/halite.git/packed-refs
The repos for octokit.cr.git and ameba.git have vanished for some reason.
And yes, running shards with the --verbose flag shows this is executing somewhere:
Resolving dependencies
git ls-remote --get-url origin
git ls-remote --get-url origin
rm -rf /app/.cache/shards/github.com/grantbirki/octokit.cr.git'
rm -rf /app/.cache/shards/github.com/crystal-ameba/ameba.git'
Locked version 0.1.0+git.commit.eb37b8129dbcf3638550cf8fce705aa9533598fd for octokit was not found in git: https://github.com/grantbirki/octokit.cr.git.
Well done, shards. First delete the repos and then complain that they don't exist 🤦
Looks like GitResolver#origin_url is broken. It returns the remote URL of the root repo (https://github.com/GrantBirki/crystal-base-template) instead of the origin of the dependency.
Ah ha! This is great news! That is totally the issue, I bet, since the true git repos are being nuked then. I see the exact same thing locally and in CI with the --verbose flag.
Great find! :heart:
So what do we do now to fix this issue?
A common design pattern for production applications is the ability to "vendor" dependencies. This means committing them to version control (git). You cannot have truly reliable builds without vendoring dependencies because a GitHub repository could be deleted at any point in time.
If a repo is deleted that contains a shard, then I can no longer pull that dependency to build an executable and thus my entire build process has fallen apart.
How can shards be used to vendor and check in all dependencies?
I know that one could do something like this:
but this doesn't do exactly what I'm after. It does a great job at caching locally, but when I run this in my GitHub Actions CI flow, it breaks.
I come from the land of Ruby, where vendoring dependencies (gems) is quite common and is done with ease through bundler. Here is a dead simple Ruby project that is open source where I vendor my dependencies with bundler.
I'm hoping the same can be accomplished with Crystal and Shards! 🙇