Open GoogleCodeExporter opened 8 years ago
Hi there.
I solved this problem by inspecting the code and subclass FTPHandler myself.
I'll paste the code for anyone who needs it.
#code starts
from asynchat import async_chat
class EncodedProducer:
def __init__(self, producer):
self.producer = producer
def more(self):
return self.producer.more().decode("utf8").encode(encoding)
class EncodedHandler(FTPHandler):
def push(self, s):
async_chat.push(self, s.encode(encoding))
def push_dtp_data(self, data, isproducer=False, file=None, cmd=None):
if file==None:
if isproducer:
data=EncodedProducer(data)
else:
data=data.decode("utf8").encode(encoding)
FTPHandler.push_dtp_data(self, data, isproducer, file, cmd)
def decode(self, bytes):
return bytes.decode(encoding, self.unicode_errors)
#code ends
encoding stands for the target encoding you wish to transform to.
Using EncodedHandler instead of FTPHandler would help solve this problem
Original comment by huangkan...@gmail.com
on 1 May 2013 at 6:08
The problem here is that (AFAIK) the FTP protocol has absolutely *no* means of
specifying or querying which character-encoding is in use - you just have to
'hope' that the client and server are using the same encoding :-(
RFC2640 ( https://tools.ietf.org/html/rfc2640 ) specifies that the character
encoding SHOULD be UTF-8, and pyftpdlib is now Unicode / RFC2640 compliant.
http://code.google.com/p/pyftpdlib/issues/list?can=1&q=unicode
So I guess your code serves as an example of how you could _force_ a different
encoding if you can't use UTF-8, but IMHO it shouldn't be built into the
library... of course Giampaolo may disagree ;-)
Original comment by gc...@loowis.durge.org
on 1 May 2013 at 11:54
Yes Andrew is right. I didn't make the server encoding configurable exactly for
this reason: as per RFC guideline client and server have no way to agree on a
specific encoding, therefore I thought it was better to just stick with UTF-8
as dictated by RFC and be done with it.
If on one hand this is "the right thing to do", on the other hand perhaps there
are cases where changing the default server encoding in order to support
misbehaving clients might be desirable (note: at the cost of 'breaking'
compliant ones). If this is the case I'd like to hear more about the scenario
the OP is facing (in detail the FTP client used and what happens by using
UTF-8).
That said, the code shown above changes the encoding of the control connection
(and that might be "right") but also applies an encoding for the data exchanged
through the data connection, and that is something which should be done only
for the listing commands (LIST, MLSD, etc), not when transmitting files.
What you want to do instead is override AbstractedFS's format_list() and
format_mlsx() methods and leave FTPHandler.push_dtp_data alone:
class CustomFS(AbstractedFS):
def format_list(self, *args, **kwargs):
generator = AbstractedFS.format_mlst(self, *args, **kwargs)
for item in generator:
yield item.decode("utf8").encode(YOUR_ENCODING)
# same for format_mlsx()
If we decide to make server encoding configurable we can avoid to go through
all these troubles, but I'd like to hear OP's scenario first in order to figure
out if it's actually worth the effort.
Original comment by g.rodola
on 1 May 2013 at 11:31
Just a quick note - Giampaolo's defintely right that you don't want to mess
about with the 'encoding' for the actual file data (would give corrupted
files), but presumably _if_ the encoding of filenames for the LIST and MLSD
commands is being altered, then the encoding of the filenames for the
STOR/RETR/DELE/etc. commands would need to be altered too?
Original comment by gc...@loowis.durge.org
on 2 May 2013 at 12:29
Yes, but apparently he did that already by overriding FTPHandler.decode().
Original comment by g.rodola
on 2 May 2013 at 12:38
Hi there.
Thanks for all your responses.
Well, I understood that these "functions" would not be included in the library
source cause it is not really a function that should be considered...
But let's talk about the codes I pasted, hmmmmm.... I know it may sound like a
dirty hack, but shouldn't push_dtp_data always be called with a non-None file
argument if it is transmitting files? So would it be nice to distinguish list
commands from file data by checking the file argument in push_dtp_data? Does
this method have any kind of limitations ?
Cheers.
Original comment by huangkan...@gmail.com
on 2 May 2013 at 1:05
push_dtp_data() is called with a "producer" argument also for listing commands,
not only for files-related ones. That aside, a 'cmd' argument is also passed,
so you might want to inspect that.
Original comment by g.rodola
on 2 May 2013 at 1:09
Original issue reported on code.google.com by
huangkan...@gmail.com
on 1 May 2013 at 3:05