giampaolo / pyftpdlib

Extremely fast and scalable Python FTP server library
MIT License

Transparent filename encoding transform? #257

Open giampaolo opened 10 years ago

giampaolo commented 10 years ago

From huangkan...@gmail.com on May 01, 2013 05:05:59

Hi there.

I've been working with your library for quite a while, and it was just so 
simple yet worked like a charm.

I'm here to suggest an enhancement. A common problem with FTP servers is 
filename encoding: when the client's filename encoding differs from the 
server's, a lot of things break. This is quite common here, because the default 
Chinese filename encoding on Windows is GBK while on Linux it is UTF-8. So I 
would like to add a transparent encoding transform to my FTP server, and I 
think that would need some support in the library source...
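
To make the mismatch concrete, here is a minimal Python 3 sketch using only the standard library. The filename is an arbitrary example, not one taken from this report. It shows the same UTF-8 bytes being misread by a side that assumes GBK, and the transcoding step a transparent transform would have to perform.

```python
# Example filename (an assumption, for illustration only).
name = "报告.txt"

# Bytes as a UTF-8 server would send them in a listing.
utf8_bytes = name.encode("utf-8")

# A client that assumes GBK decodes the same bytes into different characters.
misread = utf8_bytes.decode("gbk", errors="replace")

# The transform: decode with the server's encoding, re-encode for the client.
gbk_bytes = utf8_bytes.decode("utf-8").encode("gbk")
seen_by_client = gbk_bytes.decode("gbk")

print(repr(misread))         # garbled text, not the original name
print(repr(seen_by_client))  # the original name
```

The same conversion has to run in the opposite direction for filenames the client sends in commands.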

Also, if this function can simply be implemented by subclassing FTPHandler or 
something, please let me know.

Cheers.

Original issue: http://code.google.com/p/pyftpdlib/issues/detail?id=257

giampaolo commented 10 years ago

From huangkan...@gmail.com on April 30, 2013 23:08:23

Hi there.

I solved this problem by inspecting the code and subclassing FTPHandler myself.

I'll paste the code for anyone who needs it.

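#code starts
from asynchat import async_chat

from pyftpdlib.handlers import FTPHandler

# Target encoding for the client side; "gbk" is only an example value.
encoding = "gbk"

class EncodedProducer:
    """Wrap a producer and re-encode each UTF-8 chunk to the target encoding."""
    def __init__(self, producer):
        self.producer = producer
    def more(self):
        return self.producer.more().decode("utf8").encode(encoding)

class EncodedHandler(FTPHandler):

    def push(self, s):
        # Replies on the control connection go out in the target encoding.
        async_chat.push(self, s.encode(encoding))

    def push_dtp_data(self, data, isproducer=False, file=None, cmd=None):
        # Only re-encode when no file object is passed (assumed to be listing data).
        if file is None:
            if isproducer:
                data = EncodedProducer(data)
            else:
                data = data.decode("utf8").encode(encoding)

        FTPHandler.push_dtp_data(self, data, isproducer, file, cmd)

    def decode(self, bytes):
        # Incoming command lines are decoded with the target encoding.
        return bytes.decode(encoding, self.unicode_errors)
#code ends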

encoding stands for the target encoding you wish to transform to.
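
One caveat with re-encoding chunk by chunk: calling .decode("utf8") on each chunk separately fails if a chunk ever ends in the middle of a multi-byte character. Whether that can happen depends on how the wrapped producer cuts its chunks, which is not verified here. Below is a minimal sketch of a more defensive wrapper built on the standard library's incremental decoder. ListProducer and TranscodingProducer are hypothetical names; only the more() method is taken from the code above, and returning empty bytes when exhausted follows the asynchat producer convention.

```python
import codecs


class ListProducer:
    """Stand-in producer (hypothetical): hands out prepared byte chunks."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def more(self):
        return self._chunks.pop(0) if self._chunks else b""


class TranscodingProducer:
    """Re-encode a byte stream without breaking multi-byte characters."""

    def __init__(self, producer, target_encoding, source_encoding="utf-8"):
        self._producer = producer
        self._decoder = codecs.getincrementaldecoder(source_encoding)()
        self._target = target_encoding

    def more(self):
        while True:
            chunk = self._producer.more()
            if not chunk:
                # Source exhausted: flush the decoder (raises on a truncated character).
                return self._decoder.decode(b"", final=True).encode(self._target)
            text = self._decoder.decode(chunk)
            if text:
                return text.encode(self._target)
            # The chunk held only part of a character; fetch more before returning.


# Demo: the first chunk ends inside the first character of the line.
line = "报告.txt\r\n".encode("utf-8")
producer = TranscodingProducer(ListProducer([line[:1], line[1:]]), "gbk")
collected = []
while True:
    piece = producer.more()
    if not piece:
        break
    collected.append(piece)
result = b"".join(collected)
```

The loop inside more() matters: returning empty bytes for a partial character would signal the end of the stream too early.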

Using EncodedHandler instead of FTPHandler would help solve this problem.

giampaolo commented 10 years ago

From gc...@loowis.durge.org on May 01, 2013 04:54:03

The problem here is that (AFAIK) the FTP protocol has absolutely *no* means of 
specifying or querying which character-encoding is in use - you just have to 
'hope' that the client and server are using the same encoding :-( RFC2640 ( 
https://tools.ietf.org/html/rfc2640 ) specifies that the character encoding 
SHOULD be UTF-8, and pyftpdlib is now Unicode / RFC2640 compliant. 
https://code.google.com/p/pyftpdlib/issues/list?can=1&q=unicode So I guess your 
code serves as an example of how you could _force_ a different encoding if you 
can't use UTF-8, but IMHO it shouldn't be built into the library... of course 
Giampaolo may disagree ;-)

giampaolo commented 10 years ago

From g.rodola on May 01, 2013 16:31:41

Yes Andrew is right. I didn't make the server encoding configurable exactly for 
this reason: as per RFC guideline client and server have no way to agree on a 
specific encoding, therefore I thought it was better to just stick with UTF-8 
as dictated by RFC and be done with it.

If on one hand this is "the right thing to do", on the other hand perhaps there 
are cases where changing the default server encoding in order to support 
misbehaving clients might be desirable (note: at the cost of 'breaking' 
compliant ones). If this is the case I'd like to hear more about the scenario 
the OP is facing (in detail the FTP client used and what happens by using UTF-8).

That said, the code shown above changes the encoding of the control connection 
(and that might be "right") but also applies an encoding for the data exchanged 
through the data connection, and that is something which should be done only 
for the listing commands (LIST, MLSD, etc), not when transmitting files. 
What you want to do instead is override AbstractedFS's format_list() and 
format_mlsx() methods and leave FTPHandler.push_dtp_data alone:

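from pyftpdlib.filesystems import AbstractedFS

class CustomFS(AbstractedFS):

    def format_list(self, *args, **kwargs):
        # Re-encode each listing line produced by the parent implementation.
        generator = AbstractedFS.format_list(self, *args, **kwargs)
        for item in generator:
            yield item.decode("utf8").encode(YOUR_ENCODING)

    # same for format_mlsx()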
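
The same wrapping can be factored into a small helper that works on any iterator of UTF-8 encoded listing lines, so that format_list() and format_mlsx() overrides could share it. This is a sketch under assumptions: transcode_lines is a hypothetical name, and the errors policy is a choice added here because a target encoding such as GBK cannot represent every character that UTF-8 can.

```python
def transcode_lines(lines, target_encoding, source_encoding="utf-8", errors="replace"):
    """Yield each bytes line re-encoded from source_encoding to target_encoding."""
    for line in lines:
        text = line.decode(source_encoding)
        # With errors="replace", characters missing from the target become "?".
        yield text.encode(target_encoding, errors)


# Demo with two example listing lines; the second name contains an emoji,
# which GBK cannot represent.
listing = [
    "报告.txt\r\n".encode("utf-8"),
    "\U0001F600.txt\r\n".encode("utf-8"),
]
converted = list(transcode_lines(listing, "gbk"))
```

A lossy policy like "replace" keeps the listing readable, but the replaced name no longer matches the file on disk, so a later RETR of that name would not find it.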

If we decide to make the server encoding configurable we can avoid going 
through all this trouble, but I'd like to hear the OP's scenario first in order 
to figure out whether it's actually worth the effort.

giampaolo commented 10 years ago

From gc...@loowis.durge.org on May 01, 2013 17:29:16

Just a quick note - Giampaolo's definitely right that you don't want to mess 
about with the 'encoding' for the actual file data (would give corrupted 
files), but presumably _if_ the encoding of filenames for the LIST and MLSD 
commands is being altered, then the encoding of the filenames for the 
STOR/RETR/DELE/etc. commands would need to be altered too?

giampaolo commented 10 years ago

From g.rodola on May 01, 2013 17:38:30

Yes, but apparently he did that already by overriding FTPHandler.decode().

giampaolo commented 10 years ago

From huangkan...@gmail.com on May 01, 2013 18:05:30

Hi there.

Thanks for all your responses.

Well, I understand that this "function" will not be included in the library 
source because it is not really something the library should handle...

But let's talk about the code I pasted, hmmmmm.... I know it may sound like a 
dirty hack, but shouldn't push_dtp_data always be called with a non-None file 
argument if it is transmitting files? So would it be reasonable to distinguish 
list commands from file data by checking the file argument in push_dtp_data? 
Does this method have any kind of limitations?

Cheers.

giampaolo commented 10 years ago

From g.rodola on May 01, 2013 18:09:43

push_dtp_data() is called with a "producer" argument also for listing commands, 
not only for files-related ones. That aside, a 'cmd' argument is also passed, 
so you might want to inspect that.
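
A sketch of the check suggested here: decide from cmd, rather than from file alone, whether data headed for the data connection is a listing. The helper names are hypothetical, and the assumption that cmd arrives as a command name such as "LIST" is not confirmed by this thread. LIST, NLST and MLSD are the standard FTP commands that return listings over the data connection.

```python
# Standard FTP commands whose replies are listings sent over the data connection.
LISTING_COMMANDS = frozenset({"LIST", "NLST", "MLSD"})


def is_listing(cmd, file=None):
    """Return True when the payload should be treated as a directory listing."""
    if file is not None:
        # A file object means file contents; never re-encode those.
        return False
    return cmd is not None and cmd.upper() in LISTING_COMMANDS


def maybe_transcode(data, cmd, encoding, file=None):
    """Re-encode UTF-8 listing bytes; pass everything else through untouched."""
    if is_listing(cmd, file):
        return data.decode("utf-8").encode(encoding)
    return data


listing_out = maybe_transcode("报告.txt\r\n".encode("utf-8"), "LIST", "gbk")
file_out = maybe_transcode(b"\xff\x00raw file bytes", "RETR", "gbk")
```

Checking both arguments keeps file contents byte-for-byte intact even if a listing command is ever issued together with a file object.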

tomsux commented 7 years ago

Using the EncodedHandler code quoted above (giampaolo commented on 29 May 2014, from huangkan...@gmail.com on April 30, 2013 23:08:23), I got this error:

handler = EncodedHandler()
TypeError: __init__() missing 2 required positional arguments: 'conn' and 'server'

giampaolo commented 7 years ago

What does this have to do with the original issue?
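
For readers who hit the same TypeError: the message is consistent with the handler class being instantiated directly. In this kind of design the server receives the handler class and creates one instance per connection, supplying conn and server itself. The sketch below uses hypothetical stand-in classes, not pyftpdlib's real ones, purely to show the pattern.

```python
class Handler:
    """Hypothetical stand-in for a per-connection handler."""

    def __init__(self, conn, server):
        self.conn = conn
        self.server = server


class Server:
    """Hypothetical stand-in: takes the handler *class*, not an instance."""

    def __init__(self, handler_class):
        self.handler_class = handler_class

    def accept(self, conn):
        # The server builds the handler and passes the required arguments.
        return self.handler_class(conn, self)


class EncodedHandler(Handler):
    encoding = "gb18030"  # class attribute, set before the server starts


# Instantiating the handler by hand reproduces the kind of error reported above.
try:
    EncodedHandler()
    direct_call_failed = False
except TypeError:
    direct_call_failed = True

# Passing the class lets the server construct handlers itself.
server = Server(EncodedHandler)
handler = server.accept(conn="fake-connection")
```

With pyftpdlib the equivalent step is to pass the EncodedHandler class itself as the handler argument of FTPServer instead of calling EncodedHandler().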

alextooter commented 5 years ago

This code works for me. When using this library to write a simple FTP server on macOS with Chinese filenames, use EncodedHandler in your code and set encoding="GB18030".