giampaolo / pyftpdlib

Extremely fast and scalable Python FTP server library
MIT License

feature required to connect to hdfs #517

Closed · ebrahim-abbasi closed 4 years ago

ebrahim-abbasi commented 4 years ago

We want to use pyftpdlib on top of HDFS for large files. We are wondering whether there is a plan to implement this feature in the near future.

Thanks in advance.

giampaolo commented 4 years ago

Nope, sorry.

ebrahim-abbasi commented 4 years ago

Dear @giampaolo, thanks for your response. Would you please let me know how I can add this feature (files to be modified, rules to be followed, ...)? And if I add this feature, is there a possibility to publish it here? Best

giampaolo commented 4 years ago

Take a look at the AbstractedFS class. You should write a custom HDFS filesystem class on top of it and re-implement all the methods. I don't know HDFS, but in pseudo-code you'll likely end up doing something like this:

import os

from pyftpdlib.authorizers import DummyAuthorizer
from pyftpdlib.handlers import FTPHandler
from pyftpdlib.servers import FTPServer
from pyftpdlib.filesystems import AbstractedFS

class HDFSFileSystem(AbstractedFS):

    # re-implement all methods so that they operate on HDFS
    # instead of the local filesystem

    def mkdir(self, path):
        ...

    def rmdir(self, path):
        ...

    def open(self, path, mode):
        ...

authorizer = DummyAuthorizer()
# 'elradfmwMT' grants this user the full set of read/write permissions
authorizer.add_user('user', '12345', os.getcwd(), perm='elradfmwMT')
handler = FTPHandler
handler.authorizer = authorizer
# tell the handler to use the custom filesystem class
handler.abstracted_fs = HDFSFileSystem
server = FTPServer(('', 2121), handler)
server.serve_forever()
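To make that pseudo-code slightly more concrete, here is a minimal, untested sketch of what a couple of those methods could look like against pydoop's HDFS client (which comes up later in this thread). The hdfsConn attribute and the mode handling are assumptions for illustration; create_directory(), delete() and open_file() are taken from pydoop's documented API:

import pydoop.hdfs

from pyftpdlib.filesystems import AbstractedFS

class HDFSFileSystem(AbstractedFS):

    # assumption: a pydoop connection attached from the outside, e.g.
    # HDFSFileSystem.hdfsConn = pydoop.hdfs.hdfs(host='default', port=0)
    hdfsConn = None

    def mkdir(self, path):
        # create the directory on HDFS instead of the local disk
        self.hdfsConn.create_directory(path)

    def rmdir(self, path):
        # pydoop's delete() handles both files and directories
        self.hdfsConn.delete(path)

    def open(self, filename, mode):
        # pyftpdlib passes binary modes such as 'rb'/'wb'; pydoop's
        # open_file() is assumed here to want plain 'r'/'w'/'a'
        return self.hdfsConn.open_file(filename, mode.replace('b', ''))
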
ebrahim-abbasi commented 4 years ago

Big Thanks.

giampaolo commented 4 years ago

You are welcome. Good luck!

ebrahim-abbasi commented 4 years ago

Dear @giampaolo, I used pydoop to implement the HDFSFileSystem class. Then, using the Apache Hadoop client, I connect to an HDFS instance on a remote computer. I attached my implementation (hdfsfilesystem.txt). Then I used it in the following code:

import pydoop.hdfs

from pyftpdlib.authorizers import DummyAuthorizer
from pyftpdlib.handlers import FTPHandler
from pyftpdlib.servers import FTPServer

from hdfsfilesystem import HDFSFileSystem  # the attached implementation

def main():
    authorizer = DummyAuthorizer()
    authorizer.add_user('admin', 'admin', '.', perm='elradfmwMT')

    fs = pydoop.hdfs.hdfs(host='default', port=0, user='admin')
    fs.set_working_directory(path='test')

    handler = FTPHandler
    handler.authorizer = authorizer
    handler.abstracted_fs = HDFSFileSystem
    handler.abstracted_fs.hdfsConn = fs
    server = FTPServer(('', 21), handler)
    server.serve_forever()

if __name__ == '__main__':
    main()

When I try to connect to this from an FTP client (I am using FileZilla), I get the following error:

Error: Failed to parse returned path.
Error: Failed to retrieve directory listing

I think I need more configuration to integrate my code with pyftpdlib. Would you please take a look at my code and let me know where I am incorrect? Thanks in advance

hdfsfilesystem.txt
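
One plausible culprit for FileZilla's "Failed to parse returned path" is the path-translation pair ftp2fs()/fs2ftp() on the custom filesystem: pyftpdlib builds its replies, including the one to PWD, from the virtual UNIX-style absolute paths these return. A hedged sketch of such a mapping, where the '/user/admin' HDFS prefix is purely illustrative:

import posixpath

from pyftpdlib.filesystems import AbstractedFS

class HDFSFileSystem(AbstractedFS):

    # illustrative only: the HDFS directory acting as this
    # user's FTP root
    HDFS_ROOT = '/user/admin'

    def ftp2fs(self, ftppath):
        # map a virtual FTP path (UNIX-style, absolute) onto the
        # real HDFS path
        return posixpath.join(self.HDFS_ROOT, ftppath.lstrip('/'))

    def fs2ftp(self, fspath):
        # inverse mapping: strip the HDFS prefix so that replies
        # such as PWD's contain clean, absolute virtual paths
        assert fspath.startswith(self.HDFS_ROOT)
        return '/' + fspath[len(self.HDFS_ROOT):].lstrip('/')

If fs2ftp() ever returns a relative path, clients such as FileZilla may refuse to parse the reply and abort the directory listing.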

giampaolo commented 4 years ago

Try to paste the actual traceback.
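
If nothing useful shows up on the client side, pyftpdlib logs through the standard logging module, so turning on DEBUG logging before serve_forever() should surface the error on the server console:

import logging

# DEBUG makes pyftpdlib log every command/response exchanged
# with the client, along with any server-side traceback
logging.basicConfig(level=logging.DEBUG)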