4udak / pyftpdlib

Automatically exported from code.google.com/p/pyftpdlib

slow .read on file blocks mainloop for too long #197

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a custom AbstractedFS with an `open` method that returns a file-like 
object whose `read` method takes a long time to return (for example, a 
wrapped `httplib.HTTPResponse` object).

The problem should also be present with the default configuration when the 
served files are located on a very slow file system (for example, files mapped 
from another computer onto the local file system).

What is the expected output? What do you see instead?
Expected: file objects returned by `AbstractedFS.open` should first be checked 
in some way to see whether a non-blocking `.read` can be performed.
Instead: `.read` is called even when it will block the entire FTP server.

Please use labels and text to provide additional information.
I am serving HTTP content from another server over FTP by streaming it.
In doing so, pyftpdlib becomes sluggish and unusable for the client.

This is not only a problem with `read`; `listdir`, `stat`, etc. may also block 
for too long.

Original issue reported on code.google.com by irae.hue...@gmail.com on 22 Dec 2011 at 1:22

GoogleCodeExporter commented 9 years ago
Yes, this is a well known issue.
Unfortunately there's no easy/generic fix for a number of reasons.
Internally we use asyncore which does not provide anything to do that.
Also, httplib.HTTPResponse is not supposed to be used in async environments.

I wouldn't even know what to recommend exactly as it's a problem which is hard 
to resolve and there's no easy or standard way to deal with it.
I already bumped into it, and I solved it by using a mix of multiple 
threads/processes and pyftpdlib.ftpserver.CallLater, but it's pretty hackish 
(in fact I don't think it's worth showing the code).

Maybe the quickest solution would be to spawn a thread/process for 
every connected client and use a separate socket map for each 
(http://hg.python.org/cpython/file/b36cb4602e21/Lib/asyncore.py#l66).
That way you would use multiple event loops in multiple threads/processes and 
any dispatcher subclass can then be free to block as long as it wants.
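
A minimal sketch of the "one event loop per client thread" idea described above. It is not pyftpdlib code: it uses the standard `selectors` module in place of asyncore socket maps (asyncore has been removed from recent Python versions), and `serve_client`, `demo_threads`, and the `read_delay` parameter are hypothetical names chosen for illustration. Each thread owns a private selector, so a slow handler stalls only its own client.

```python
import selectors
import socket
import threading
import time


def serve_client(conn, read_delay):
    """Run a private event loop for a single client in the calling thread."""
    sel = selectors.DefaultSelector()  # this thread's own "socket map"
    sel.register(conn, selectors.EVENT_READ)
    try:
        while True:
            for key, _events in sel.select(timeout=1.0):
                data = key.fileobj.recv(1024)
                if not data:  # peer closed the connection
                    return
                time.sleep(read_delay)  # stand-in for a slow, blocking read()
                key.fileobj.sendall(data.upper())
    finally:
        sel.close()
        conn.close()


def demo_threads():
    """Serve one slow and one fast client, each in its own loop and thread."""
    slow_client, slow_server = socket.socketpair()
    fast_client, fast_server = socket.socketpair()
    threads = [
        threading.Thread(target=serve_client, args=(slow_server, 0.5)),
        threading.Thread(target=serve_client, args=(fast_server, 0.0)),
    ]
    for thread in threads:
        thread.start()

    started = time.monotonic()
    slow_client.sendall(b"slow")
    fast_client.sendall(b"fast")
    fast_reply = fast_client.recv(1024)
    fast_elapsed = time.monotonic() - started
    slow_reply = slow_client.recv(1024)

    # Closing our ends makes each server loop see EOF and exit.
    slow_client.close()
    fast_client.close()
    for thread in threads:
        thread.join()
    return fast_reply, fast_elapsed, slow_reply
```

In this sketch the fast client gets its reply without waiting for the slow handler's delay. The cost is one thread per connection, and any state shared between threads would need its own locking.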

It's something which must be developed from scratch though, tested, etc...
Maybe I can provide a proof of concept once I find some time.

Original comment by g.rodola on 22 Dec 2011 at 2:34

GoogleCodeExporter commented 9 years ago
pyftpdlib is a really amazing and well written FTP server with excellent 
customization possibilities. But this is a serious issue: any file system 
access should be considered blocking.

Can you show me your hackish workaround anyway? :-)

Original comment by irae.hue...@gmail.com on 22 Dec 2011 at 5:52

GoogleCodeExporter commented 9 years ago
Well, there are actually two different problems here:

#1 - file read() / write(), which takes place in the data channel
#2 - all other fs-related calls (listdir(), rename(), cwd(), mkdir(), etc...), 
which take place in the control channel

Files (#1) can somehow be integrated in the event loop without using multiple 
threads/processes but only if they provide a readable/writable() method; the 
idea is to call read()/write() only when the file is actually ready to be read 
or written. 
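
A minimal sketch of the readiness check described above, assuming a Unix-like system and a file-like object backed by a pipe or socket descriptor. `ReadyCheckedFile` and `pump` are hypothetical names, not part of pyftpdlib. Note that `select` always reports regular on-disk files as ready, so this approach does not help with a slow local or network-mounted file system.

```python
import os
import select


class ReadyCheckedFile:
    """Hypothetical wrapper that adds readable() to an OS-level descriptor."""

    def __init__(self, fd):
        self.fd = fd

    def readable(self):
        # Zero timeout: ask the OS whether a read would return immediately.
        ready, _, _ = select.select([self.fd], [], [], 0)
        return bool(ready)

    def read(self, size):
        return os.read(self.fd, size)


def pump(fileobj, size=65536):
    """Return a chunk if one is available right now, else None (never blocks)."""
    if not fileobj.readable():
        return None  # the event loop can move on to other clients
    return fileobj.read(size)
```

With a pipe created by `os.pipe()`, `pump` returns `None` while the pipe is empty and returns the pending bytes once the writing end has sent something.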

Other fs calls (#2) cannot be integrated as described above as they are 
blocking by nature (think about os.listdir()), therefore the only way to deal 
with them is to make the call into separate process or thread.
There's a FAQ for this: 
http://code.google.com/p/pyftpdlib/wiki/FAQ#How_can_I_run_long-running_tasks_without_blocking_the_server?
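
A minimal sketch of moving a blocking fs call into a separate thread, using the standard `concurrent.futures` module. It is not the FAQ's code: `slow_listdir`, `list_without_blocking`, and `demo_offload` are hypothetical names, and the polling `while` loop only stands in for the server's event loop, which would check the future between serving other clients.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def slow_listdir(path, delay=0.2):
    """Hypothetical stand-in for os.listdir() on a slow file system."""
    time.sleep(delay)
    return sorted(os.listdir(path))


def list_without_blocking(executor, path):
    """Offload the listing and keep 'ticking' until the result is ready."""
    future = executor.submit(slow_listdir, path)
    ticks = 0
    while not future.done():
        ticks += 1        # a real event loop would serve other clients here
        time.sleep(0.01)  # stand-in for one short loop iteration
    return future.result(), ticks


def demo_offload():
    """List a temporary directory through the worker thread."""
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "a.txt"), "w").close()
        with ThreadPoolExecutor(max_workers=2) as executor:
            return list_without_blocking(executor, tmp)
```

The loop keeps iterating while the worker thread is blocked in the listing, which is the property the main loop needs. The result still has to be handed back to the loop's thread safely before it is sent to the client.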

This is the general idea. 
The two problems are very different and finding a general and clean solution is 
far from easy.
There are frameworks out there, such as Twisted, which provide some facilities 
to deal with threads/processes within the async loop, but they do not guarantee 
thread safety, which IMO suggests how hard this subject is:
http://twistedmatrix.com/documents/current/core/howto/threading.html

As for your specific problem, httplib.HTTPResponse is simply not designed to 
work with async apps/libs and cannot be integrated with asyncore.
You would have the exact same problems in other environments (twisted, tornado, 
etc...), where you would use *their* non-blocking HTTP clients (asyncore does 
not have one).
I still think the quickest solution is to use different threads/processes per 
event-loop.
I'll try to write down some code, but I cannot tell when exactly.

> can you show me your hackish workaround anyway

For what it's worth, it's attached.

Original comment by g.rodola on 22 Dec 2011 at 7:14

Attachments:

GoogleCodeExporter commented 9 years ago
Update - you might want to take a look at this asyncore HTTP client, kindly 
provided by Josiah Carlson:
https://gist.github.com/1519999

Looking back at this, I think I'm going to close this issue after all, as I 
think it's not something which can or even should be dealt with by pyftpdlib. I 
mean, it's an async lib, and as such it should be used in a certain way. 
Integrating it with blocking libs such as httplib is not the way it is meant to 
be used. 

If you want to discuss further, please feel free to post on the mailing list.

Original comment by g.rodola on 30 Dec 2011 at 5:03

GoogleCodeExporter commented 9 years ago
Merging this one into issue 212. I have some interesting news about it.

Original comment by g.rodola on 2 Aug 2012 at 7:56