epi-project / brane

Programmable Orchestration of Applications and Networking
Apache License 2.0
9 stars 7 forks source link

fix(ctl): Add CACHEDIR.TAG to target dir on create #95

Open DanielVoogsgerd opened 5 days ago

DanielVoogsgerd commented 5 days ago

My backups have quadrupled in size, because if you download an image using branectl before you ever compile anything in brane, cargo will never annotate its target directory with a CACHEDIR.TAG (and rightfully so). However, this does mean that most backup solutions will gladly backup all your nice debug builds. :smiling_face_with_tear:.

Lets fix that, we do add a cachetag to the download directory itself if the target directory is not the one usually used by cargo.

Lut99 commented 4 days ago

So, I agree with this on principle :)

However, you actually placed this in the wrong file (or at least, a less efficient file). Where you placed it is actually only called when executing branectl download ..., not when calling make.py.

I've added an implemention in make.py and pushed it to your branch! So feel free to check it out. I am putting it in target/make_cache, though, instead of target, because the script shouldn't assume its output directory is always nested. But if I've written it correctly (knocks on wood), anything the script does that does not involve calling cargo will not escape that directory and should therefore not cause more to be backup'ed than necessary. But let me know if that's true.

Also, I did end up removing your lines, because it seemed like you were missing a dependency (cachedir?) and because it's not guaranteed that wherever the user wants to download the images, it's supposed to be a cachedir. So I'd be hesitant to make this decision for the user.

Let me know what you think! And if you like it, I'll merge.

Lut99 commented 4 days ago

By the way ~ since we don't generate CHANGELOG.md automatically, can you add updates to it in the PRs you make? To keep it in sync :)

(I'll take care of merging it, no worries)

DanielVoogsgerd commented 4 days ago

Okay, this is a little more complicated and annoying I'm afraid. The problem is not really that the images might not be marked as artifacts. The problem is that out of safety, cargo will not place its CACHEDIR.TAG if the target directory already exists, as this could have quite severe consequences if the target directory is set so some important directory, e.g. ~/.

I've added an implemention in make.py and pushed it to your branch! So feel free to check it out. I am putting it in target/make_cache, though, instead of target, because the script shouldn't assume its output directory is always nested. But if I've written it correctly (knocks on wood), anything the script does that does not involve calling cargo will not escape that directory and should therefore not cause more to be backup'ed than necessary. But let me know if that's true.

This is where the thing about cargo starts be annoying, because the directory will be created by make.py, but only the make_cache directory will be marked as cache. So if I run this before I ever do a Cargo build, all cargo's artifacts will end up being backupped.

However, you actually placed this in the wrong file (or at least, a less efficient file). Where you placed it is actually only called when executing branectl download ..., not when calling make.py.

Ah that makes sense, however, since one could call both and both are capable of creating the target directory, I suggest we keep both. In general, I would wager to say that you can always pair code that could create the target directory

Also, I did end up removing your lines, because it seemed like you were missing a dependency (cachedir?)

Ah oops that is my bad, I could re-add it if you want.

[...] and because it's not guaranteed that wherever the user wants to download the images, it's supposed to be a cachedir. So I'd be hesitant to make this decision for the user.

I think we kinda already are making this decision for the user, because if one first runs any cargo command that creates a target directory, that directory will be marked by Cargo itself. So a subsequent download which defaults to a subdirectory of it will be marked as well. I did try to be a bit vigilant with respect to this problem, which is why I check if the output path starts with target and only then I would add the cachedir tag to the parent directory. In the second commit I also add it for the image directory itself if it has a different path, I am fine dropping that part (basically the else clause), but you could also make the argument that putting a tag on it is warranted as they are just large downloaded artifacts that can be redownloaded quite easily.

DanielVoogsgerd commented 4 days ago

By the way ~ since we don't generate CHANGELOG.md automatically, can you add updates to it in the PRs you make? To keep it in sync :)

(I'll take care of merging it, no worries)

Ah sorry, yeah I really should, I totally forgot. I will try to add them for past changes as well.

Lut99 commented 4 days ago

Ah sorry, yeah I really should, I totally forgot. I will try to add them for past changes as well.

No worries! I already did a run for post changes (see develop... or main. I can't quite recall if I already switched my local repo at that point, oops). But you can double-check to see if anything's missing.

Now, to the issue at hand...

Okay, this is a little more complicated and annoying I'm afraid. The problem is not really that the images might not be marked as artifacts. The problem is that out of safety, cargo will not place its CACHEDIR.TAG if the target directory already exists, as this could have quite severe consequences if the target directory is set so some important directory, e.g. ~/.

Good point. I assumed Cargo would fix it but yeah, makes sense.

I think we kinda already are making this decision for the user, because if one first runs any cargo command that creates a target directory, that directory will be marked by Cargo itself. So a subsequent download which defaults to a subdirectory of it will be marked as well. I did try to be a bit vigilant with respect to this problem, which is why I check if the output path starts with target and only then I would add the cachedir tag to the parent directory. In the second commit I also add it for the image directory itself if it has a different path, I am fine dropping that part (basically the else clause), but you could also make the argument that putting a tag on it is warranted as they are just large downloaded artifacts that can be redownloaded quite easily.

I guess we're in the same bind as Cargo here. Either way, it is indeed unjustified to add CACHEDIR.TAG to existing directories, but maybe adding it to new ones is a good default. In that case, I don't think we need to give target special treatment.

We can probably generalize the implementation in both make.py and branectl sufficiently well by defining that we place it in the highest newly created folder when it creates folders. This way, when we tell make.py to use a cache folder target/make_cache and both are new, it will be placed in target; else, if target already exists, we place it in make_cache instead.

I think this will do for most use-cases. Thankfully, CACHEDIR.TAG is not hidden on most systems (he said, really hoping that's the case), so at worst the user might be surprised it's there but can figure out what's going on.

I'll give this another go!

Lut99 commented 4 days ago

OK, how about this?

DanielVoogsgerd commented 2 days ago

LGTM, :)

EDIT: You could change those URLs to https, if you want.