Open GoogleCodeExporter opened 9 years ago
That is a shortcoming in the api, in that you can't get a directory listing and
be able to distinguish files from directories in one call.
However when it comes to ftpfs there is a directory caching mechanism so that
even though there are two calls to listdir, it should only make a single
request to the server.
I'm afraid your patch would likely make things slower in most cases. It calls
isdir and isfile for each file in a directory listing -- on some
implementations (including ftpfs with cache hinting set to false) that would
cause a request per file...
Original comment by willmcgugan
on 20 May 2011 at 1:36
But in the existing FSls code, it calls listdir twice, once with dirs_only set
to True, and once with files_only set to True. Which means inside
_listdir_helper there's also going to be multiple calls to isdir and isfile...
so I'm fairly sure that my patch is always an improvement.
Can you provide a concrete example of where my patch is slower than the
existing code?
And the second line in FTPFS.listdir() says self.clear_dircache(path), so each
listdir call *does* make a separate server request, in spite of the caching
(but isdir and isfile do use the cache).
Original comment by gc...@loowis.durge.org
on 20 May 2011 at 2:03
...maybe there should be a --no-colours option to fsls, so that it doesn't
always have to distinguish between files and directories? ;)
Original comment by gc...@loowis.durge.org
on 20 May 2011 at 2:13
I don't recall why that clear_dircache call was there. May have been a fix for
some issue, I'll take a look at that. It's possible that mechanism is broken.
What was intended though is that if you have done cache_hint(True) on the ftpfs
then two calls two listdir won't cause two requests to the server.
Bear in mind that the commands are intended to run on many filesystems. And in
the general case 2 calls to listdir is better than 1 call to listdir and n
calls to isfile/isdir.
Original comment by willmcgugan
on 20 May 2011 at 2:47
Forgot to add, that calls to isdir and isfile, inside a call to listdir will
definitely be cached. But when called directly they won't be cached if
cache_hint is False...
Original comment by willmcgugan
on 20 May 2011 at 3:30
Maybe you're confusing FTPFS.listdir (which does clear the cache) with
FTPFS._readdir (which doesn't clear the cache) ?
I found a bug in my fsls.py fix in the way it handled the --full option. Fixed
in the revised attached patch.
> And in the general case 2 calls to listdir is better than 1 call to listdir
and n calls to isfile/isdir.
And as I already pointed out, the 2 calls to listdir in FSls.do_run make use of
the dirs_only and files_only flags, each of which will cause _listdir_helper to
itself call isfile and isdir n times - so AFAICT the current code makes double
the number of calls to isfile/isdir than my patched code, regardless of any
caching.
Unless I'm overlooking something? :-S
Original comment by gc...@loowis.durge.org
on 20 May 2011 at 3:37
Attachments:
It looks like the clear_dircache in listdir isn't required. Feel free to
comment that out until I push a change.
You're right that listdir_helper does call isfile/isdir, but in that context
they can work with cached directory information. And the implementation is also
free to do something more efficient.
With your patch though, isfile/isdir are called directly and may not be able to
use cached directory information. For a network filesystem, that could mean
many more requests to the server
Have a look at the ftperrors decorator, particularly the call to
_enter_dircache and _leave_dircache
I'm not being combative -- patches are always welcome. It's just that I don't
want to accept a small optimization for one filesystem that slows down other
filesystems.
If you could confirm that the patch doesn't negatively effect the performance
of rpcfs, sftpfs and s3fs then I'd consider it...
Original comment by willmcgugan
on 20 May 2011 at 4:02
Okay, I've come up with an alternative implementation which addresses your
concerns :)
How does this new patch grab you?
Original comment by gc...@loowis.durge.org
on 21 May 2011 at 4:32
Attachments:
While investigating a separate problem, I've just found that in order for my
most recent patch to work "properly", I also need to add a listdir2 method to
WrapFS (if I don't, then trying to "fsls" a FTP subdirectory means that the
SubFS calls base.listdir2 and bypasses my ftpfs.listdir2).
The WrapFS implementations of e.g. hassyspath and isfile are really easy to
follow, but the WrapFS implementation of listdir goes completely over my head,
so I don't know how to implement WrapFS.listdir2 :-(
Help please? ;)
Alternatively, is there any way to make the BaseFS / WrapFS / SubFS interaction
less brittle?
Original comment by gc...@loowis.durge.org
on 30 May 2011 at 7:02
Hmmm, I'm not sure if it's "correct", but this implementation seems to work...
Original comment by gc...@loowis.durge.org
on 30 May 2011 at 7:18
Attachments:
I'm not too keen on the 'listdir2' method. The concept is fine, but appending a
2 to a method name will only lead to confusion. People will assume it
supersedes 'listdir' and deprecates it.
Also bear in mind that the classes in 'fs.expose' will all have to be updated
so they can use the new method.
Could you hold off on committing this one. I'd like to get 0.4 out without any
API changes. Which I plan to release as soon as I can figure out why serving
sftp is breaking on Py2.7
Since this modifies the base API, we should really hammer this out on the
mailing list. Just to be sure we've exhausted all possibilities.
Original comment by willmcgugan
on 30 May 2011 at 10:17
(sorry I haven't been following this issue, so I might be out of the loop a bit)
Does FTPFS provide an efficient listdirinfo() implementation? If so, you can
use the following idiom to distinguish between files and directories with a
single call to listdir::
for (path,info) in myfs.listdirinfo("/some/path",absolute=True)
if fs.util.isdir(myfs,path,info):
print "it's a dir"
else:
print "it's a file"
This assumes that listdirinfo() makes a single call to the server, and that the
info dicts provide an "st_mode" member with the appropriate bits for files and
directories.
Original comment by rfkel...@gmail.com
on 30 May 2011 at 10:57
Okay, discussion moved to the mailing list... :)
Original comment by gc...@loowis.durge.org
on 31 May 2011 at 1:03
For anyone playing catch-up:
http://groups.google.com/group/pyfilesystem-discussion/browse_thread/thread/a602
f7b9c3110af2
Original comment by gc...@loowis.durge.org
on 20 Sep 2011 at 5:48
Original issue reported on code.google.com by
gc...@loowis.durge.org
on 20 May 2011 at 1:15Attachments: