RedTea / gaedav

Automatically exported from code.google.com/p/gaedav
GNU Lesser General Public License v2.1

Suggestion: combine cache_dir and cache_file, or cache MISS for performance too? #5

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
At the moment, directory browsing takes a lot of time because each entry is
checked several times: first by isdir() or isfile(), then again by getdir()
or getfile(), and so on.

In addition, we already know that a path which is a dir can't also be a
file, and vice versa.

But with the current code, each file in a directory causes 4 cache HITs,
4 cache MISSes and 4 DB retrievals:

D 04-12 06:40PM 34.943 Path.normalize: encoding str '//mydir/robots.txt' to unicode.
I 04-12 06:40PM 34.948 Cache MISS: u'dir:/mydir/robots.txt'
D 04-12 06:40PM 34.948 Retrieving with path = u'/mydir/robots.txt'
D 04-12 06:40PM 34.960 Path.normalize: encoding str '//mydir/robots.txt' to unicode.
I 04-12 06:40PM 34.975 Cache HIT: u'file:/mydir/robots.txt'
D 04-12 06:40PM 34.975 Path.normalize: encoding str '//mydir/robots.txt' to unicode.
I 04-12 06:40PM 34.981 Cache HIT: u'file:/mydir/robots.txt'
D 04-12 06:40PM 34.981 Path.normalize: encoding str '//mydir/robots.txt' to unicode.
I 04-12 06:40PM 34.987 Cache MISS: u'dir:/mydir/robots.txt'
D 04-12 06:40PM 34.987 Retrieving with path = u'/mydir/robots.txt'
D 04-12 06:40PM 34.999 Path.normalize: encoding str '//mydir/robots.txt' to unicode.
I 04-12 06:40PM 35.007 Cache HIT: u'file:/mydir/robots.txt'
D 04-12 06:40PM 35.008 Path.normalize: encoding str '//mydir/robots.txt' to unicode.
I 04-12 06:40PM 35.014 Cache MISS: u'dir:/mydir/robots.txt'
D 04-12 06:40PM 35.014 Retrieving with path = u'/mydir/robots.txt'
D 04-12 06:40PM 35.028 Path.normalize: encoding str '//mydir/robots.txt' to unicode.
I 04-12 06:40PM 35.039 Cache HIT: u'file:/mydir/robots.txt'
D 04-12 06:40PM 35.039 Path.normalize: encoding str '//mydir/robots.txt' to unicode.
I 04-12 06:40PM 35.044 Cache MISS: u'dir:/mydir/robots.txt'
D 04-12 06:40PM 35.045 Retrieving with path = u'/mydir/robots.txt'

So perhaps you could combine cache_dir and cache_file, and check the type
of the returned object in getdir() and getfile()? Or cache the MISS
results too?
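
To make the suggestion concrete, here is a rough sketch of what a combined
cache with cached MISSes could look like. It is not taken from the gaedav
source: CombinedCache, FakeMemcache, the 'path:' key prefix and the
load_from_db callable are all made-up names, and the in-memory class only
stands in for a memcache-like client with get() and set().

```python
# Hypothetical sketch: one cache entry per path, tagged with its kind.
# None of these names come from gaedav; they are for illustration only.

KIND_DIR = 'dir'
KIND_FILE = 'file'
KIND_MISSING = 'missing'  # negative entry: the DB was asked and had nothing


class FakeMemcache(object):
    """In-memory stand-in for a memcache-like client (get/set only)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class CombinedCache(object):
    def __init__(self, cache, load_from_db):
        self.cache = cache
        # Assumed callable: path -> (kind, obj), or None if nothing exists.
        self.load_from_db = load_from_db
        self.db_reads = 0

    def lookup(self, path):
        """Return (kind, obj); kind is KIND_MISSING if the path is absent."""
        key = 'path:' + path
        entry = self.cache.get(key)
        if entry is None:
            self.db_reads += 1
            found = self.load_from_db(path)
            # Store the MISS as well, so the DB is not queried again.
            entry = found if found is not None else (KIND_MISSING, None)
            self.cache.set(key, entry)
        return entry

    def isdir(self, path):
        return self.lookup(path)[0] == KIND_DIR

    def isfile(self, path):
        return self.lookup(path)[0] == KIND_FILE

    def getdir(self, path):
        kind, obj = self.lookup(path)
        return obj if kind == KIND_DIR else None

    def getfile(self, path):
        kind, obj = self.lookup(path)
        return obj if kind == KIND_FILE else None
```

With this layout, isdir(), isfile(), getdir() and getfile() on the same path
share one cache entry, so only the first call reaches the DB. The catch is
that the entry (including a cached MISS) would have to be deleted or
overwritten whenever a file or dir is created, removed or renamed at that
path.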

Another improvement for directory browsing might be to make better use of
the dir_set and file_set you already get when retrieving a directory. But
since I don't know what's in there, I'm not sure how this would fit in with
the rest of the code :-)

Original issue reported on code.google.com by mikes...@gmail.com on 13 Apr 2009 at 9:23

GoogleCodeExporter commented 9 years ago
Also, use a local cache_dict variable inside NamespacedCache() to avoid
calling memcache.get() several times for the same path in the same request.

Original comment by mikes...@gmail.com on 13 Apr 2009 at 10:33
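
A rough sketch of the request-local idea, under stated assumptions: I have
not checked how NamespacedCache() is actually structured, so
RequestLocalCache and DictBackend below are hypothetical names, and the
backend is just any object with get(key) and set(key, value) standing in
for memcache.

```python
class DictBackend(object):
    """In-memory stand-in for a memcache-like client (illustration only)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class RequestLocalCache(object):
    """Remembers backend results for the lifetime of one request."""

    _ABSENT = object()  # distinguishes "not asked yet" from a cached None

    def __init__(self, backend):
        self.backend = backend
        self.cache_dict = {}    # should be created fresh for each request
        self.backend_gets = 0   # counter, only here to show the effect

    def get(self, key):
        value = self.cache_dict.get(key, self._ABSENT)
        if value is self._ABSENT:
            self.backend_gets += 1
            value = self.backend.get(key)
            # A None result (cache MISS) is remembered too.
            self.cache_dict[key] = value
        return value

    def set(self, key, value):
        self.cache_dict[key] = value
        self.backend.set(key, value)
```

The local dict has to be created per request (or cleared at the start of
each one); if it lived at module level it would survive across requests on
the same instance and could serve stale entries.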