buchgr / bazel-remote

A remote cache for Bazel
https://bazel.build
Apache License 2.0

[FeatureRequest] not save too large blobs downloaded from remote server #729

Closed gdh1995 closed 5 months ago

gdh1995 commented 6 months ago

Thanks for your work! Here's a small idea about how bazel-remote works as a proxy.

Background

Expect

So, I want bazel-remote to download a matched large blob from the remote server but not save it into the local cache space (under --dir).

I've noticed --max_proxy_blob_size, but I don't want to make it small. On the contrary, I prefer to keep it as large as possible.

mostynb commented 5 months ago

Hi, I think this proposed feature would require significant modifications to some relatively complex parts of bazel-remote's code, and since the use-case is very specific I am not sure that it's worth the risk. Could you avoid this problem by using a larger disk cache size?

gdh1995 commented 5 months ago

Um, I have to write code in a few different Docker environments, so I have run up to 4 bazel-remote processes to isolate different versions of compiling cache. That is why I prefer a "limited" max-cache-size (10GB now) for every bazel-remote cache folder.

I'm curious why you think this feature requires "significant modifications". A tiny change may be enough, as long as:

  1. when the configuration item is provided, just do not save any larger files into the cache folder
  2. instead of actively deleting old large files, we can just wait for the GC algorithm (LRU?) to evict them
  3. if a large blob in the local cache is matched, still return it and update its last-access time
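
For illustration only, the three points above could be sketched in Go as a toy in-memory LRU. The names `cache`, `maxStored`, and `fetch` are hypothetical, and nothing here is bazel-remote's actual code or configuration; in particular, the sketch keeps blobs in memory rather than in files under --dir.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached blob, tracked in LRU order.
type entry struct {
	key  string
	data []byte
}

// cache is a hypothetical stand-in for one local cache folder.
type cache struct {
	maxTotal  int                             // total bytes the cache may hold
	maxStored int                             // hypothetical option: larger blobs are never stored
	total     int                             // bytes currently held
	order     *list.List                      // front = most recently used
	items     map[string]*list.Element        // key -> element in order
	fetch     func(key string) ([]byte, bool) // hypothetical proxy backend
}

func newCache(maxTotal, maxStored int, fetch func(string) ([]byte, bool)) *cache {
	return &cache{
		maxTotal:  maxTotal,
		maxStored: maxStored,
		order:     list.New(),
		items:     make(map[string]*list.Element),
		fetch:     fetch,
	}
}

// Get serves a blob from the local cache, falling back to the backend.
func (c *cache) Get(key string) ([]byte, bool) {
	if el, ok := c.items[key]; ok {
		// Point 3: a local hit is served whatever its size, and its
		// last-access position is refreshed.
		c.order.MoveToFront(el)
		return el.Value.(*entry).data, true
	}
	data, ok := c.fetch(key)
	if !ok {
		return nil, false
	}
	// Point 1: an oversized blob is returned to the caller but not stored.
	if len(data) > c.maxStored {
		return data, true
	}
	c.store(key, data)
	return data, true
}

// store adds a blob and then applies ordinary LRU eviction.
func (c *cache) store(key string, data []byte) {
	c.items[key] = c.order.PushFront(&entry{key: key, data: data})
	c.total += len(data)
	// Point 2: nothing is purged actively; old entries (large or not)
	// leave only through the normal eviction loop.
	for c.total > c.maxTotal && c.order.Len() > 0 {
		oldest := c.order.Back()
		e := oldest.Value.(*entry)
		c.order.Remove(oldest)
		delete(c.items, e.key)
		c.total -= len(e.data)
	}
}

// Has reports whether a key is currently stored locally.
func (c *cache) Has(key string) bool {
	_, ok := c.items[key]
	return ok
}

func main() {
	backend := map[string][]byte{
		"small": make([]byte, 10),
		"large": make([]byte, 500),
	}
	fetch := func(k string) ([]byte, bool) { d, ok := backend[k]; return d, ok }

	c := newCache(1000, 100, fetch)
	c.Get("small")
	c.Get("large")
	fmt.Println(c.Has("small"), c.Has("large"))
}
```

In this toy, both blobs are returned to the caller, but only the small one stays in the local cache afterwards.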

ulrfa commented 5 months ago

I think it would require "significant modifications", because the current proxy implementation propagates files by first storing them to disk, not by streaming/buffering them in memory.

I have run up to 4 bazel-remote processes to isolate different versions of compiling cache.

Is the purpose of the isolation to avoid getting incorrect cache hits? If so, have you considered other ways to ensure that? Preferably by expressing complete dependencies on the Bazel side, but if that is not feasible, you could consider isolation by using bazel-remote's --enable_ac_key_instance_mangling flag.

gdh1995 commented 5 months ago

Sorry I forgot to mention my idea:

And thanks for the enable_ac_key_instance_mangling parameter. Unfortunately, my office uses different proxy backends (--http_proxy.url) to store the 4 isolated caches, so 4 system processes are still necessary for me :(

mostynb commented 5 months ago

I think it would require "significant modifications", because the current proxy implementation propagates files by first storing them to disk, not by streaming/buffering them in memory.

Correct. This is complicated code that I do not want to modify often.

I also think that the configuration options required for this feature to work well would be difficult to describe. If it is important to you, this feature might best be kept in a local fork.

gdh1995 commented 5 months ago

Sorry, but I know little about Go, so I haven't understood where this process ("implementation propagates files by first storing them to disk, not by streaming/buffering them in memory") occurs. If a fork is the way to go, I'll have to wait a few months; I'm too busy at work to learn other things :(

mostynb commented 5 months ago

If you do get around to this, feel free to share a link here. I can't promise that it will be accepted, but you never know.

I will close this feature request in the meantime.