Open matthewtiscareno opened 1 year ago
I'm aware of the problem with rsync displaying directories. I tried a variety of ways to suppress them without also suppressing directories that really did change (e.g. added or deleted), but I didn't find anything that was both accurate and efficient.
Yes, I think the problem is that Jekyll regenerates all the files each time, thus destroying the timestamps.
One solution to both of these problems is to use rsync's --checksum option, which ignores timestamps and compares file checksums instead. This does work as expected, but it's also much slower because of the large number of non-Jekyll files in our directory structure.
I think the right solution is to wait until Debby separates the Jekyll files from the non-Jekyll files and then switch to using the --checksum option. It will be much faster because it will have much less data to compare. The non-Jekyll files can be synced using their timestamps, since they aren't constantly regenerated.
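To illustrate the difference between the two comparison modes, here is a minimal sketch. The function names and directory arguments are made up for illustration and are not taken from the deploy scripts; the flags are standard rsync options (-r recursive, -n dry run, -v list transferred files, -c compare checksums).

```shell
#!/bin/sh
# Hypothetical helpers, not part of the actual deploy scripts.

# Dry run using rsync's default quick check (size + modification time).
# A file that was regenerated with identical contents still gets listed,
# because its timestamp changed.
list_by_timestamp() {
    rsync -rnv "$1/" "$2/"
}

# Dry run comparing checksums instead. Regenerated but identical files
# are skipped, at the cost of reading every file on both sides.
list_by_checksum() {
    rsync -rnvc "$1/" "$2/"
}
```

Once the Jekyll output lives in its own tree, something like list_by_checksum could be pointed at just that tree, while the rest keeps using the timestamp check.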
BTW, this is why I chose to use diff for the initial comparison instead of rsync -n. I could make the diff output clean because it only compares the file contents. It's fast because it operates on the local disk, comparing the Jekyll-generated _site directory to the current /var/www/documents. The rsync operations with --checksum are slow because they all go to the RAID, which is vastly slower to read from. Checking only the timestamps is much faster, which is why that's what we do right now.
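For reference, a content-only comparison of two local trees can be sketched as below. This is not the actual deploy script; the function name is hypothetical, and only the standard diff options -r (recurse) and -q (report which files differ, not how) are assumed.

```shell
#!/bin/sh
# Hypothetical helper, not the actual deploy script.

# List files that differ between a freshly built tree and the deployed tree.
# diff compares contents only, so files that were regenerated with identical
# contents (new timestamps, same bytes) produce no output.
preview_changes() {
    status=0
    diff -rq "$1" "$2" || status=$?
    # diff exits 0 for "no differences" and 1 for "differences found";
    # both are fine here. Anything higher means diff itself failed.
    [ "$status" -le 1 ]
}
```

With the paths mentioned above, the call would look like: preview_changes _site /var/www/documents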
Is there no way to avoid having Jekyll regenerate every file?
It turns out the --incremental option works with static builds as well as while running the Jekyll server. I updated the deploy script to use it when building the website. With this modification, the rsync to staging and the rsync to the RAID are clean (no extraneous directories or files). The rsync from the RAID to the production servers still shows a few files, mainly in the roses directory. I don't know why. But it's a lot nicer than it was before. We can leave this issue open and I'll look into it some other time.
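A build step along these lines might look like the sketch below. The function name and the JEKYLL override variable are assumptions for illustration; the real deploy script may invoke Jekyll differently. Note that the Jekyll documentation describes incremental regeneration as experimental.

```shell
#!/bin/sh
# Hypothetical build step; the real deploy script may differ.
# JEKYLL is an assumption of this sketch so the command can be swapped,
# e.g. JEKYLL="bundle exec jekyll".
build_site() {
    # --incremental asks Jekyll to rebuild only pages whose sources changed,
    # so untouched output files keep their old timestamps.
    # JEKYLL is left unquoted on purpose so a multi-word command splits.
    ${JEKYLL:-jekyll} build --incremental "$@"
}
```

Extra arguments are passed through, so build_site --destination _site would run jekyll build --incremental --destination _site.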
Great!!!
When I run one of the scripts in the deploy directory, it first asks me to approve the results of a diff command that consist only of the files that are actually going to be changed, which is good. However, once I approve the change, the resulting rsync command lists every directory in the file structure as well as the files that are changed. Furthermore, the final two rsync commands list not only every directory but every file in the file structure.
Perhaps this is happening because Jekyll has regenerated all files and/or has reset timestamps on all directories, rather than only changing those files whose source code has changed?
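One possible way to hide the directory noise described here is to filter rsync's itemized output, sketched below. It is untested against this setup, the function name is hypothetical, and it assumes rsync is run with -i (--itemize-changes), where a directory whose only change is its modification time shows up as ".d..t" followed by dots. It only cleans up the display; it does not stop rsync from touching those directories.

```shell
#!/bin/sh
# Hypothetical filter, not part of the actual deploy scripts.
# Reads rsync --itemize-changes (-i) output on stdin and drops lines for
# directories whose only change is the modification time (".d..t......").
# New directories ("cd+++++++++") and deletions ("*deleting") pass through.
hide_touched_dirs() {
    # grep -v exits 1 when nothing is left to print; that is not an error here.
    grep -v '^\.d\.\.t\.* ' || true
}
```

It would be used at the end of a pipeline, for example: rsync -ain _site/ /var/www/documents/ | hide_touched_dirs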