derrickchoi / s3fs

Automatically exported from code.google.com/p/s3fs
GNU General Public License v2.0

If using a local file cache, it can fill up the disk #159

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
I personally do not use the local file cache. My main use model of s3fs is for 
off-site storage rather than using it as a dynamic nfs-like file system.

If one chooses the latter use model, then the local file cache can be a 
performance win. The downside is that it is possible to end up with a "mirror" 
of your bucket on the local file system. Depending on usage, the cache 
directory can become quite large or even fill up the local file system.

There are many ways to mitigate this: quota, a cron job, ... I actually keep my 
/tmp directory on a different virtual partition, so filling it up doesn't bring 
down the system, but not all users are so meticulous.
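
As a concrete example of the cron-job route, a nightly entry along these lines would age out stale cache files; the path and the 7-day threshold here are just placeholders for whatever `use_cache` points at:

```
# Hypothetical crontab entry: at 03:00, delete cached files not accessed
# in the last 7 days. Adjust the path to the directory given to -o use_cache.
0 3 * * * find /tmp/s3fs-cache -type f -atime +7 -delete
```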

I can hear it now, "s3fs crashed my system by filling up the disk!"  My 
response to that is "we only provided you with the rope, you hung yourself with 
it".

Are things fine the way they are? ...or should s3fs implement something to help 
users avoid hurting themselves when using the non-default use_cache setting?

Original issue reported on code.google.com by dmoore4...@gmail.com on 12 Feb 2011 at 5:11

GoogleCodeExporter commented 8 years ago
I'm with you on the "we only provided you with the rope, you hung yourself with 
it". I think things are fine the way they are. I can see how a user could fill 
up the /tmp directory (especially when dealing with elastic storage), but then 
again, if I run `cat /dev/urandom > /tmp/ouch` it will also fill up the disk 
without warning.

Original comment by ben.lema...@gmail.com on 16 Feb 2011 at 10:15

GoogleCodeExporter commented 8 years ago

Original comment by dmoore4...@gmail.com on 26 Feb 2011 at 6:55

GoogleCodeExporter commented 8 years ago
I'd like the capability to set a limit on the cache from inside the program, as 
s3ql does, for example. My VPS doesn't have a separate /tmp, and the way I use 
s3fs I'd benefit a lot in bandwidth savings from having, say, a 200 MB cache 
for my 5 GB bucket. Due to the nature of the files I host, a few files get 
accessed very often at any given time, while most of them do not.

If this is not going to be implemented, at least someone could provide 
instructions on how to use an external system to do the job; the suggestion to 
'use quotas' is not enough for most users, I imagine. But even then, there are 
other problems. Say quotas are used and I have a cache folder that can grow to 
200 MB. What happens when it gets full, and how will s3fs react? Will it stop 
caching later files and serve them directly? Will it rotate the cache, removing 
older files, if it gets error messages from the OS that it can't save to the 
folder?

The periodic cleanup solution is easier to implement, but it's not the optimal 
one if, for example, the files that get accessed a lot are large (and thus 
should remain in the cache) and the script keeps deleting them.

Another idea is a script that checks the cache directory and, if it's near (or 
beyond) a certain size limit, deletes the X 'oldest' files whose sizes sum up 
to a certain percentage of the total cache size. But such a script is not the 
easiest thing to write, I imagine, and if effort goes into that, it might as 
well get implemented as a feature in s3fs anyway.
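
As a very rough sketch of that idea (the cache path and the 200 MB limit are just assumptions), a shell script could walk the cache by access time and delete the least-recently-used files until the directory is back under the limit:

```
#!/bin/sh
# Hypothetical cleanup sketch: trim an s3fs cache directory back under a size
# limit by removing least-recently-accessed files first.
CACHE_DIR=/tmp/s3fs-cache   # adjust to the -o use_cache directory
LIMIT_KB=204800             # 200 MB

used=$(du -sk "$CACHE_DIR" | cut -f1)
[ "$used" -le "$LIMIT_KB" ] && exit 0

# GNU find: print "<atime> <size-in-KB> <path>", oldest access first.
find "$CACHE_DIR" -type f -printf '%A@ %k %p\n' | sort -n |
while read -r atime kb path; do
    rm -f -- "$path"
    used=$((used - kb))
    [ "$used" -le "$LIMIT_KB" ] && break
done
```

Ordering by access time keeps frequently read files in the cache, though it assumes the cache filesystem isn't mounted with noatime.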

Original comment by johnx...@gmail.com on 26 Jun 2011 at 2:01

GoogleCodeExporter commented 8 years ago
It would be nice for simplicity's sake; adding the extra steps of creating a 
virtual partition can suck for some users. Not a necessity, but a great option.

By the way, what does happen when it gets full? Say there's a 50 GB virtual 
partition and you have 100 GB of data on S3... does it error out or loop 
trying to resync?
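
For what it's worth, the "virtual partition" approach can be sketched with a loopback file; the path and the 200 MB size below are placeholders. Once this small filesystem fills, writes into the cache fail with ENOSPC instead of exhausting the root filesystem (how s3fs then surfaces that error is the open question above).

```
# Hypothetical sketch: a size-capped cache partition backed by a loopback file.
dd if=/dev/zero of=/var/cache/s3fs.img bs=1M count=200   # 200 MB backing file
mkfs.ext4 -F /var/cache/s3fs.img                         # -F: it's a file, not a block device
mkdir -p /var/cache/s3fs
mount -o loop /var/cache/s3fs.img /var/cache/s3fs
# then, for example: s3fs mybucket /mnt/s3 -o use_cache=/var/cache/s3fs
```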

Original comment by rgilbert...@gmail.com on 13 Jun 2012 at 6:00