Feh / nocache

minimize caching effects
BSD 2-Clause "Simplified" License

turn off the cache for a directory #28

Open beroal opened 8 years ago

beroal commented 8 years ago

This is rather a support request. Can I disable the cache only for specific directories? The benefit is as follows: a program transmits big files over a network and stores the hosts' metadata in a metadata file. It would be better to leave caching enabled for the metadata file.

Feh commented 8 years ago

A more generic solution would be to introduce a method to either include (“whitelist”) or exclude (“blacklist”) certain glob patterns. The problem here is that the only way you can configure nocache’s behavior is by specifying environment variables, so this approach would mean you have to: think of two good variable names, parse their contents into an array or a list and then match every open() call against each expression. Feel free to implement this if you need it; I’ll take a look at it, but I currently don’t have the time to do it myself.
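A minimal sketch of the steps listed above (read a variable, split it into patterns, match a path against each one), not code from nocache. The variable name NOCACHE_EXCLUDE_GLOBS, the colon separator, and both function names are assumptions made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `path` matches any glob in the
 * colon-separated `list`, 0 otherwise. A NULL or empty list matches nothing. */
int matches_any_glob(const char *list, const char *path)
{
    assert(path != NULL);
    if (list == NULL || *list == '\0')
        return 0;

    char *copy = strdup(list);          /* strtok_r modifies its input */
    if (copy == NULL)
        return 0;

    int found = 0;
    char *save = NULL;
    for (char *pat = strtok_r(copy, ":", &save); pat != NULL;
         pat = strtok_r(NULL, ":", &save)) {
        /* flags == 0: '*' may also match '/', so "dir/*" covers a subtree */
        if (fnmatch(pat, path, 0) == 0) {
            found = 1;
            break;
        }
    }
    free(copy);
    return found;
}

/* Hypothetical policy: bypass the cache unless the path is excluded.
 * NOCACHE_EXCLUDE_GLOBS is an invented variable name. */
int should_bypass_cache(const char *path)
{
    return !matches_any_glob(getenv("NOCACHE_EXCLUDE_GLOBS"), path);
}
```

An open() wrapper could call should_bypass_cache() with the path it receives. Splitting the list on every call keeps the sketch short; parsing it once at load time would avoid repeated work.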

beroal commented 8 years ago

The problem here is that the only way you can configure nocache’s behavior is by specifying environment variables

Why so? There are commands which accept options and a command, for example, "nice", "sudo", "env", "time", "xargs".

Feh commented 8 years ago

True; but the functionality of nocache is achieved by the wrapper shell script setting the LD_PRELOAD env variable. The initializer of nocache.so is only called from the specified executable and has no access to command line arguments. See for example how the -n option is implemented.
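As a rough illustration of that constraint, a preloaded library can pick up its settings in a constructor that runs before the wrapped program's main(). This is only a sketch assuming GCC or Clang on a glibc-style system; NOCACHE_EXAMPLE_LIMIT and all function names are invented for the example and are not nocache's real identifiers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical setting; NOCACHE_EXAMPLE_LIMIT is an invented variable name. */
static long example_limit = 1;

/* Parse a positive integer from the environment, falling back to `fallback`
 * when the variable is unset, empty, or malformed. */
long env_long_or(const char *name, long fallback)
{
    assert(name != NULL);
    const char *s = getenv(name);
    if (s == NULL || *s == '\0')
        return fallback;
    char *end = NULL;
    long v = strtol(s, &end, 10);
    return (*end == '\0' && v > 0) ? v : fallback;
}

/* Runs when the shared object is loaded into the wrapped process. The
 * wrapper's own command line options are not visible at this point, so the
 * environment is the channel used to hand settings over. */
static void __attribute__((constructor)) example_init(void)
{
    example_limit = env_long_or("NOCACHE_EXAMPLE_LIMIT", 1);
}

long example_get_limit(void)
{
    return example_limit;
}
```

A wrapper script would then translate an option into the variable before starting the command, for example by exporting NOCACHE_EXAMPLE_LIMIT=4 alongside LD_PRELOAD.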

beroal commented 8 years ago

A more generic solution would be to introduce a method to either include (“whitelist”) or exclude (“blacklist”) certain glob patterns.

A wildcard never matches the pathname separator, so how do I specify all descendants of a directory by a glob pattern?

Feh commented 8 years ago

Use fnmatch(3) without FNM_PATHNAME.

$ cat fnmatch.c
#include <fnmatch.h>
int main(int argc, char *argv[]) {
        if (argc < 2)
                return 2;
        /* flags == 0: FNM_PATHNAME is not set, so '*' also matches '/' */
        return fnmatch("foo/*", argv[1], 0);
}
$ gcc -Wall -o fnmatch fnmatch.c
$ ./fnmatch foo/bar/baz && echo it matches
it matches

beroal commented 8 years ago

Well, I implemented this feature request in my fork. I decided to use POSIX Extended Regular Expressions because they are more straightforward and more powerful than glob patterns. What do you think?

beroal commented 8 years ago

Because the library remembers which pages (i.e., 4K blocks of the file) were already in the file system cache when the file was opened, these will not be marked as "don't need", because other applications might need them, although they are not actively used (think: hot standby).

I don't understand this. Do you think that the OS uses the last suggestion instead of joining suggestions from all processes?

Feh commented 8 years ago

I’ve added some comments to your commit https://github.com/beroal/nocache/commit/c3956d384d04837dc33dc1756dd7e73754aae919

Do you think that the OS uses the last suggestion instead of joining suggestions from all processes?

The reality is a bit more complicated, but in principle, yes. If process A reads file X completely it’s in the FS cache; if B now maps X and does an fadvise with “don’t need” on the file descriptor, the contents are evicted from the cache; subsequent reads of A from X will require going back to the storage medium to retrieve data.

In other words: Without this mechanism, you might evict files that are in active use, thereby impacting other processes.
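The mechanism described here (remember which pages were cached, later drop only the others) can be sketched roughly as follows. This assumes Linux with glibc; the two function names are hypothetical, and the code is an illustration of the idea rather than nocache's actual implementation.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a malloc'd array with one byte per page of the
 * first `size` bytes of `fd`; bit 0 of a byte is set if that page is resident
 * in the page cache. Returns NULL on error or when size == 0. */
unsigned char *snapshot_residency(int fd, size_t size, size_t *npages)
{
    assert(npages != NULL);
    long ps = sysconf(_SC_PAGESIZE);
    *npages = 0;
    if (size == 0 || ps <= 0)
        return NULL;

    size_t n = (size + (size_t)ps - 1) / (size_t)ps;
    unsigned char *vec = calloc(n, 1);
    /* PROT_NONE is enough: the mapping is only inspected, never read */
    void *map = mmap(NULL, size, PROT_NONE, MAP_SHARED, fd, 0);
    if (vec == NULL || map == MAP_FAILED || mincore(map, size, vec) != 0) {
        if (map != MAP_FAILED)
            munmap(map, size);
        free(vec);
        return NULL;
    }
    munmap(map, size);
    *npages = n;
    return vec;
}

/* Advise the kernel to drop only the pages that were NOT resident in the
 * snapshot, leaving alone pages that were already cached beforehand.
 * Returns 0 on success or the first posix_fadvise() error number. */
int drop_pages_not_in_snapshot(int fd, const unsigned char *before,
                               size_t npages)
{
    assert(before != NULL || npages == 0);
    long ps = sysconf(_SC_PAGESIZE);
    for (size_t i = 0; i < npages; i++) {
        if (before[i] & 1)
            continue;               /* cached before we touched it: keep */
        int rc = posix_fadvise(fd, (off_t)i * ps, ps, POSIX_FADV_DONTNEED);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

The snapshot would be taken when the file is opened and the drop performed when it is closed. Issuing one posix_fadvise() call per page is simple but chatty; merging adjacent pages into ranges would reduce the number of system calls.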

beroal commented 8 years ago

Then the Linux kernel is kind of stupid.

beroal commented 8 years ago

Regarding maybe_store_pageinfo: all my additions contain "cond" or "pattern", since I group code by keywords. The other suggestions are implemented.

beroal commented 8 years ago

Documentation: the cache is disabled for a file iff (I and not E), where I holds iff the file name matches the pattern in the environment variable NOCACHE_PATTERN_INCLUDE (default: true) and E holds iff the file name matches the pattern in the environment variable NOCACHE_PATTERN_EXCLUDE (default: false). Both variables are treated as POSIX Extended Regular Expressions.
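The rule stated above can be sketched with regcomp(3) and regexec(3) as follows. This is not the code from the fork: the function names are hypothetical, and falling back to the default when a pattern fails to compile is an assumption of this sketch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `path` matches the POSIX ERE `pattern`, 0 if it does not, and
 * `if_unset` when `pattern` is NULL or fails to compile (an assumption). */
int ere_matches(const char *pattern, const char *path, int if_unset)
{
    assert(path != NULL);
    if (pattern == NULL)
        return if_unset;

    regex_t re;
    /* regcomp() returns an int: 0 on success, an error code otherwise */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return if_unset;
    int matched = (regexec(&re, path, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}

/* Disable caching iff (I and not E). */
int cache_disabled_for(const char *path, const char *include_re,
                       const char *exclude_re)
{
    int included = ere_matches(include_re, path, 1);  /* I defaults to true  */
    int excluded = ere_matches(exclude_re, path, 0);  /* E defaults to false */
    return included && !excluded;
}
```

A caller would pass getenv("NOCACHE_PATTERN_INCLUDE") and getenv("NOCACHE_PATTERN_EXCLUDE") as the two patterns. Compiling both expressions once at load time, instead of on every call, would be the more efficient design.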

Feh commented 8 years ago

I left some comments on https://github.com/beroal/nocache/commit/1e6061c9879b21f1d22607cc5d783f5abdf20a3f again.

Then the Linux kernel is kind of stupid.

Yes, and you’re welcome to improve it. The code is in mm/fadvise.c. Beware though that good and robust cache invalidation is one of the harder problems in programming.

Documentation.

Can you please add command line options to the nocache shell wrapper and add documentation to the Readme?

beroal commented 8 years ago

I’d make explicit what you expect, i.e. if(regcomp(…) != NULL)

Look at the return type of regcomp: it returns an int (0 on success), not a pointer.

beroal commented 8 years ago

Can you please add command line options to the nocache shell wrapper and add documentation to the Readme?

Sorry, I don't know the Bash programming language and I'm happy with that. ;-)