CHTJonas opened this issue 3 years ago
This is a great idea! The most salient point for me is that as we grow and work on more things, we want to maintain a global style sheet and set of assets that would benefit from being in proper version control, which this solves. Let's give this a few more days for folks to comment, then I'll approve it.
I wrote a very basic prototype in Go last night, mostly as an academic exercise. It achieves what's listed above but could likely be improved further, e.g. by selectively caching the output of git show abbrev_commit:file_path. It does have the disadvantage that it is an entirely custom application server which, whilst very simple, will require some upkeep and maintenance going forwards. asset-server.zip
Instead, @doismellburning suggested in #hackday that the future git repo with our custom (non-vendored) assets in could contain a Makefile that runs something like cp -r * /public/societies/srcf-web/public_html/assets/$(git rev-parse --short HEAD)/ inside the repo. This is probably a much simpler and more accessible solution, and could likely be extended in the future were we to start using something like Webpack to bundle our assets.
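That copy step might boil down to a small shell function along these lines (the function name and paths are my own illustration of the suggestion, not anything that exists yet):

```shell
#!/bin/sh
# Sketch of the suggested deploy step: copy the repo's working tree into a
# directory named after the current abbreviated commit hash, so every commit
# gets its own immutable URL prefix. Paths and names are illustrative.
deploy_assets() {
    repo="$1"     # path to the asset git repo
    docroot="$2"  # e.g. /public/societies/srcf-web/public_html
    commit="$(git -C "$repo" rev-parse --short HEAD)"
    dest="$docroot/assets/$commit"
    mkdir -p "$dest"
    # Copy everything except the .git directory, preserving subdirectories.
    (cd "$repo" && find . -path ./.git -prune -o -type f -exec cp --parents {} "$dest/" \;)
}
```

Run from a Makefile target (or a post-receive hook) this stays at "a few lines of shell" complexity.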
The drawback of those approaches is that your URLs change whenever you make a git commit, even if the file in question doesn't change. Maybe you could do some content-addressable storage thing and use a hash of the file instead? It's a bit trickier to keep all the hashes you care about in order, but hopefully it's still at the complexity of "bash script" rather than "custom application server".
(I can't help but also wonder if either of the downsides of the existing approach are really hurting in practice... are we being stung by caching behaviour? Is "changes immediately visible everywhere" actually bad?)
My thinking was that content would be preserved approximately forever (disk space is cheap) and so old URLs would 'never' expire or have their content removed. As a worked example: if srcf.js is introduced in commit abc1230 and a child commit xyz7890 is made which doesn't affect srcf.js, then there would be two duplicate copies of it, at https://assets.srcf.net/assets/abc1230/srcf.js and https://assets.srcf.net/assets/xyz7890/srcf.js. This is obviously non-ideal but not a deal-breaker IMO, and could likely be resolved with some crafty use of rsync in hardlink/differential mode, or some bash scripting.
That all being said, I'm not at all opposed to using content-addressable storage. Are you envisaging something like the following?
#!/bin/bash
set -eu

cd /path/to/asset/git/repo
STORE="/path/to/htdocs"

# Walk every file in the repo, skipping the .git directory itself.
find . -path ./.git -prune -o -type f -print | while IFS= read -r FULL_PATH; do
    DIR_NAME="$(dirname "$FULL_PATH")"
    BASE_NAME="$(basename "$FULL_PATH")"
    EXTENSION="${BASE_NAME##*.}"
    DIR_PATH="$STORE/$DIR_NAME/$BASE_NAME"
    # Name each published copy after the SHA-256 of its contents.
    HASH="$(sha256sum "$FULL_PATH" | awk '{ print $1 }')"
    mkdir -p "$DIR_PATH"
    cp "$FULL_PATH" "$DIR_PATH/$HASH.$EXTENSION"
done
Browser caching behaviour is less important (although currently we don't send a Cache-Control header for resources underneath https://www.srcf.net/_srcf/, so bets are somewhat off). On the other hand, we have recently been bitten by changes to the main www site adversely affecting the control panel. As Timeout and LBT grow in complexity, and new projects spring up, I see more sharing of common assets being involved, at which point this becomes more important.
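If URLs do become immutable (commit- or content-addressed), the missing Cache-Control header becomes easy to fix, since the content behind a given URL can never change. A sketch of what that might look like in Apache, assuming mod_headers is enabled (the /assets/ location is illustrative):

```apache
# Everything under /assets/ is addressed by commit or content hash, so it
# can safely be cached for a year and marked immutable.
<Location "/assets/">
    Header set Cache-Control "public, max-age=31536000, immutable"
</Location>
```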
Yeah, something like that (although I was thinking of preserving the original filename and using the hash as a directory name). But you might also want a way to mass-update your hrefs and srcs when there's a new file that you want to opt into using.
Craftiness with rsync hardlinks also seems like a good solution, but at that stage I don't know which approach involves the least craft :)
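The mass-update step could plausibly stay at "bash script" complexity too, e.g. a sed rewrite over each template that references the old hash (the function and hash values here are purely illustrative):

```shell
#!/bin/sh
# Sketch: point references to one asset at a new hash directory, in place.
# Arguments: template file, asset filename, old hash, new hash.
# Note: regex metacharacters in the asset name are not escaped, which is
# fine for a sketch but worth fixing in a real script.
update_asset_hash() {
    file="$1"; asset="$2"; old="$3"; new="$4"
    sed -i "s|assets/$old/$asset|assets/$new/$asset|g" "$file"
}
```

Combined with grep -rl "assets/$old/" to find which files to touch, the whole opt-in pipeline stays in shell.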
(Of course I don't have a horse in this race per se, just suggesting ideas).
This might be relevant to the discussion on caching: https://www.stefanjudis.com/notes/say-goodbye-to-resource-caching-across-sites-and-domains/. In summary, Chrome and Safari now partition the HTTP cache, using the top-level site a resource is loaded from as part of the asset's cache key. That is to say, if www.facebook.com sources https://code.jquery.com/jquery-3.5.1.slim.min.js then that will be cached separately to www.srcf.net sourcing the same file at the same URL.
Project/idea summary
Provide a way for SRCF web assets to be included in a versioned manner, preferably at unique URLs that can be cached infinitely. This will need to account for the fact that some assets will be custom and likely stored in one of the SRCF git repos while others will be vendored copies of well-known and open source libraries. Additionally some assets will be static files like images or video.
For example, the current SRCF stylesheet applied on top of Bootstrap might be found at https://assets.srcf.net/:git_abbrev_commit/srcf-bs.css while a vendored copy of jQuery might be https://assets.srcf.net/vendor/jquery/3.4.1/jquery.min.js.
Motivation
Our current setup stores assets as files accessible using URLs underneath https://www.srcf.net/_srcf/. This works acceptably with the caveat that changes made to those files become immediately visible everywhere that includes them. Browser caching behaviour also comes into effect here as changes made to those files might take a while to 'go live' on users' browsers.
Whilst our existing method has served us well for a while, my opinion is that the lack of versioning will become increasingly annoying as more and more projects start to include the same set of shared assets.
Alternatives considered
A git repo containing the assets that is used as a submodule by any project that wants to include them. This potentially has a major caching disadvantage for users' browsers, since each project would then include assets locally rather than from a shared location. There are also storage implications to versioning large files in git.