Drillster / drone-volume-cache

Drone plugin for caching to a locally mounted volume
MIT License

cache very slow with much files #11

Closed: janstuemmel closed this issue 3 years ago

janstuemmel commented 7 years ago

Restoring the cache (which means copying files from A to B) is very slow when there are many files; in my setup it takes almost 5 minutes. What about tar.gz'ing the files, copying the archive, and untarring it?

on my (very slow) testing server:

mjwwit commented 7 years ago

This sounds like a very good idea! Can you run some more tests with cache rebuilding as well? Maybe we can even store the cache as archives. I'd really like to know how rsync handles diffing with tar files (when you're rebuilding the cache).

janstuemmel commented 7 years ago

I had a discussion with the Drone team on Gitter. There is a Drone caching library for writing caching plugins (https://github.com/drone/drone-cache-lib), and there are already some implementations where the cache is stored on SFTP or S3, which is better for multiple test runners on different servers.

But for my tiny projects I just needed a simple caching plugin, so I wrote my own, inspired by drone-volume-cache: https://github.com/janstuemmel/drone-cache. It's simpler than this one, but it archives the cache as tar.gz.

mjwwit wrote: "I'd really like to know how rsync handles diffing with tar files (when you're rebuilding the cache)."

You could use the tar -df archive.tar.gz folder command to generate the diff; it exits with a failure status if the folder structure in the archive differs from the directory.
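For readers who want to see what such a comparison involves, here is a minimal Python sketch. It is not how tar or this plugin does it: the function name archive_differs is hypothetical, and it only compares relative paths and file sizes of regular files, which is an assumption made to keep the example short.

```python
import os
import posixpath
import tarfile


def archive_differs(archive_path, directory):
    """Return True if the archive and the directory disagree.

    Assumption: comparing relative paths and sizes of regular files is
    enough to decide that the cache needs rebuilding. Contents that change
    without changing size are not detected by this sketch.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        # Normalize names so "./a.txt" and "a.txt" compare equal.
        archived = {
            posixpath.normpath(member.name): member.size
            for member in tar.getmembers()
            if member.isfile()
        }

    on_disk = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, directory)
            on_disk[rel_path.replace(os.sep, "/")] = os.path.getsize(full_path)

    return archived != on_disk
```

A rebuild step could call archive_differs first and skip writing a new archive when it returns False.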

mjwwit commented 7 years ago

This plugin is intended for these simple "single agent" situations. I'm really curious as to how the archiving performs when rebuilding cache. Is it as fast (or maybe even faster) than rebuilding the expanded folder using rsync? If it's as fast or faster during the rebuilding phase I'm willing to implement the use of archive caching for this plugin so you can enjoy the best of both worlds.

janstuemmel commented 7 years ago

I ran some tests where my node_modules folder was cached by drone-volume-cache using rsync, and it took about 5 min to restore. With tar.gz'ing the files, it now takes about 1.5 min. It's a very slow single-core machine.

I can write a test script for you today, if you want.

mjwwit commented 7 years ago

I read the restoring phase will really benefit from the archived cache. I'm curious how the rebuilding phase is impacted though.

janstuemmel commented 7 years ago

I'm going down from a 7 min build phase to 2 min, including restoring, testing, and rebuilding.

mjwwit commented 7 years ago

That's a huge difference! I'll start working on the archiving feature asap to see if I can replicate your results.

mjwwit commented 6 years ago

I've been trying to get this working, but I'm having trouble combining it with my cache expiry feature. It may be harder to implement this than I initially thought, so it will take me considerably longer to build this feature. If you're up to it, it'd be awesome to get a PR for this.

janstuemmel commented 6 years ago

What's the problem exactly? I think just checking the tar.gz file date should be enough?

mjwwit commented 6 years ago

The cache TTL is checked for each file/folder in the cache, not the entire cache. It wouldn't be as efficient when checking a tar archive. A new version of the archive will be created on each cache rebuild, which is then rsync'ed to the mounted volume. If any file within the cache changed, the entire archive would have changed, resetting the TTL. The only time the expiry feature would work is if nothing in the cache changes for the entire duration of the TTL.
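The difference between the two expiry checks can be sketched in Python. The plugin's actual expiry code is not shown in this thread, so the modification-time comparison and the function names expired_entries and archive_expired are assumptions for illustration only.

```python
import os
import time


def expired_entries(cache_dir, ttl_seconds, now=None):
    """Per-entry check: list top-level cache entries older than the TTL.

    Assumption: an entry's age is taken from its modification time.
    """
    now = time.time() if now is None else now
    expired = []
    for entry in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, entry)
        if now - os.path.getmtime(path) > ttl_seconds:
            expired.append(entry)
    return expired


def archive_expired(archive_path, ttl_seconds, now=None):
    """Whole-archive check: one timestamp covers every file inside.

    Rewriting the archive on each rebuild refreshes this timestamp, so the
    check only fires if the archive is left untouched for the full TTL.
    """
    now = time.time() if now is None else now
    return now - os.path.getmtime(archive_path) > ttl_seconds
```

With separate entries, one stale folder can expire while the others stay. With a single archive, the answer is all or nothing.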

merajnouredini commented 5 years ago

We have the same issue here. Are there any updates?

mjwwit commented 5 years ago

If you need this fast you can use a custom post-restore and pre-rebuild step where you extract and archive your cache. This will however mess with your cache expiration. I'm not experiencing this issue myself, which is why it's not on my priority list to build. I'd love to get a PR for this though!
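As a rough illustration of such a pair of steps, the following Python sketch packs a directory into a tar.gz before the rebuild and unpacks it after the restore. The function names pack_cache and unpack_cache and all paths are hypothetical; how the steps are wired into a pipeline is not covered here.

```python
import os
import tarfile


def pack_cache(source_dir, archive_path):
    """Pre-rebuild step: bundle everything under source_dir into one tar.gz."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(os.listdir(source_dir)):
            # Store paths relative to source_dir, e.g. "node_modules/...".
            tar.add(os.path.join(source_dir, entry), arcname=entry)


def unpack_cache(archive_path, dest_dir):
    """Post-restore step: expand the archive back into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions provide a safer extraction filter.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

The cache plugin then only has to move a single archive file, at the cost of the expiry problem described above.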

merajnouredini commented 5 years ago

Thanks for your response. I understand; we have this issue only in our AngularJS Grunt builds, and in other cases the plugin works well. I'll give it a try to see what I can do.

mvdan commented 5 years ago

I'd suggest using https://github.com/drone-plugins/drone-volume-cache/ instead. It uses tar archives, so it only transfers a single file back and forth. The overhead of lots of tiny files is much smaller in that plugin.

It's not very well documented, but it works in a very similar way to this plugin.

merajnouredini commented 5 years ago

@mvdan I tried your suggestion, but the cache step is still too slow. Is there any specific configuration that I have missed?

mvdan commented 5 years ago

In our case, restoring and saving a cache containing Go and JS stuff went from ~15s to ~5s. That's with the cache stored directly on the host's disk, so the transfer speed is fast. Of course, you'll never get the cache plugin to be very fast if there are a lot of files to cache, or if the total size is large. If you want to dig further into it, you can always debug and profile.
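One way to start profiling is to time both restore strategies on a synthetic cache. The Python sketch below is only a local measurement harness: the function name time_restore and the file counts are hypothetical, it does not use either plugin, and it does not model a mounted volume or a slow disk, so its numbers will not match the ones reported in this thread.

```python
import os
import shutil
import tarfile
import tempfile
import time


def time_restore(num_files=200, size=64):
    """Return (copy_seconds, untar_seconds) for a cache of many small files."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "cache")
        os.makedirs(src)
        for i in range(num_files):
            with open(os.path.join(src, f"f{i}.txt"), "wb") as f:
                f.write(b"x" * size)

        # Build the archive up front; only the restore side is timed.
        archive = os.path.join(work, "cache.tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(src, arcname="cache")

        # Strategy 1: restore by copying every file individually.
        start = time.perf_counter()
        shutil.copytree(src, os.path.join(work, "restored_copy"))
        copy_seconds = time.perf_counter() - start

        # Strategy 2: restore by extracting a single archive.
        start = time.perf_counter()
        dest = os.path.join(work, "restored_tar")
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
        untar_seconds = time.perf_counter() - start

    return copy_seconds, untar_seconds
```

Running it with file counts and sizes close to the real cache, on the actual build agent, gives a first indication of where the time goes.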