aio-libs / aioftp

ftp client/server for asyncio (http://aioftp.readthedocs.org)
Apache License 2.0

BUG: client.list() returns empty list for windows nt ftp server #104

Open Andrei-Pozolotin opened 4 years ago

Andrei-Pozolotin commented 4 years ago
  1. for example,

    • ftp://ftp.nasdaqtrader.com
  2. for the following snippet: client.list() returns empty list for windows nt ftp server

    
    import aioftp
    import asyncio
    from urllib.parse import urlparse

    async def ftp_list(remote_url: str):
        remote_bag = urlparse(remote_url)
        ftp_host = remote_bag.hostname
        ftp_port = remote_bag.port or aioftp.DEFAULT_PORT
        ftp_user = remote_bag.username or "anonymous"
        ftp_pass = remote_bag.password or "anonymous@anonymous.host"
        session = aioftp.ClientSession(
            host=ftp_host,
            port=ftp_port,
            user=ftp_user,
            password=ftp_pass,
        )
        async with session as client:
            entry_list = await client.list(path="/")
            print(entry_list)
            for path, info in entry_list:
                print(path, info)

    remote_url = "ftp://ftp.nasdaqtrader.com"
    asyncio.run(ftp_list(remote_url))

3. server `help` shows no `MLSD` or `LIST` assumed by `aioftp`:

    ftp> help
    Commands may be abbreviated.  Commands are:

    !           dir         macdef      proxy       site
    $           disconnect  mdelete     sendport    size
    account     epsv4       mdir        put         status
    append      form        mget        pwd         struct
    ascii       get         mkdir       quit        system
    bell        glob        mls         quote       sunique
    binary      hash        mode        recv        tenex
    bye         help        modtime     reget       trace
    case        idle        mput        rstatus     type
    cd          image       newer       rhelp       user
    cdup        ipany       nmap        rename      umask
    chmod       ipv4        nlist       reset       verbose
    close       ipv6        ntrans      restart     ?
    cr          lcd         open        rmdir
    delete      lpwd        passive     runique
    debug       ls          prompt      send


4. actual list shown in web browser:
Name Size Date Modified
aspnet_client/   9/12/12, 7:06:00 AM
atsactivity/   9/12/12, 3:52:00 AM
ClosingCross/   1/29/08, 2:08:00 AM
Downloads/   1/29/08, 2:08:00 AM
ETFData/   1/29/08, 2:08:00 AM
MonthlyShareVolume/   1/29/08, 2:08:00 AM
OpeningCross/   1/29/08, 2:08:00 AM
OrderExecutionQuality/   6/30/10, 8:29:00 AM
OrderExecutionQualityBX/   6/30/10, 8:29:00 AM
OrderExecutionQualityPSX/   11/30/10, 8:44:00 AM
phlx/   9/23/08, 2:34:00 PM
SymbolDirectory/   9/12/12, 4:22:00 AM


5. similar code / url works fine when using `ftplib`
https://docs.python.org/3/library/ftplib.html
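
For comparison, a minimal sketch of the `ftplib` equivalent, assuming the server accepts anonymous login (the function name `list_with_ftplib` is illustrative only). `FTP.retrlines()` returns the raw listing lines without trying to parse them, which is probably why it is not affected by the listing format:

```python
import ftplib
from typing import List
from urllib.parse import urlparse


def list_with_ftplib(remote_url: str, remote_path: str = "/") -> List[str]:
    """Return the raw LIST lines for remote_path, without parsing them."""
    remote_bag = urlparse(remote_url)
    lines: List[str] = []
    # Used as a context manager, FTP sends QUIT and closes the socket on exit.
    with ftplib.FTP() as ftp:
        ftp.connect(host=remote_bag.hostname, port=remote_bag.port or ftplib.FTP_PORT)
        ftp.login(
            user=remote_bag.username or "anonymous",
            passwd=remote_bag.password or "anonymous@anonymous.host",
        )
        # retrlines() calls the callback once per line, with the CRLF stripped.
        ftp.retrlines(f"LIST {remote_path}", callback=lines.append)
    return lines


# Example call (needs network access, so it is left commented out):
# for line in list_with_ftplib("ftp://ftp.nasdaqtrader.com"):
#     print(line)
```
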
pohmelie commented 4 years ago

First of all, this server does not support MLSx commands. You can confirm this by calling `logging.basicConfig(level=logging.DEBUG)` before your code:

DEBUG:asyncio:Using selector: EpollSelector
INFO:aioftp.client:220
INFO:aioftp.client:USER anonymous
INFO:aioftp.client:331 Anonymous access allowed, send identity (e-mail name) as password.
INFO:aioftp.client:PASS anonymous@anonymous.host
INFO:aioftp.client:230 User logged in.
INFO:aioftp.client:TYPE I
INFO:aioftp.client:200 Type set to I.
INFO:aioftp.client:EPSV
INFO:aioftp.client:229 Entering Extended Passive Mode (|||37882|)
INFO:aioftp.client:MLSD /
INFO:aioftp.client:500 'MLSD /': command not understood.
INFO:aioftp.client:TYPE I
INFO:aioftp.client:200 Type set to I.
INFO:aioftp.client:EPSV
INFO:aioftp.client:229 Entering Extended Passive Mode (|||37883|)
INFO:aioftp.client:LIST /
INFO:aioftp.client:125 Data connection already open; Transfer starting.
...
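
A minimal sketch of the logging setup that produces output like the above. The logger name `aioftp.client` is taken from the log prefix; `force=True` (Python 3.8+) is an addition here, so that the call still takes effect if logging was already configured elsewhere:

```python
import logging

# Configure the root logger before running any aioftp code, so that the
# client's command/response records are printed.
# force=True replaces handlers installed earlier (requires Python 3.8+).
logging.basicConfig(level=logging.DEBUG, force=True)

# "aioftp.client" is the logger name visible in the log prefix above;
# it has no level of its own, so it inherits DEBUG from the root logger.
client_log = logging.getLogger("aioftp.client")
print(client_log.isEnabledFor(logging.INFO))
```
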

Then you can see (via extra logging or Wireshark) that the server does send actual data with the files:

INFO:aioftp.client:125 Data connection already open; Transfer starting.
b'09-12-12  12:06PM       <DIR>          aspnet_client\r\n'
b'09-12-12  08:52AM       <DIR>          atsactivity\r\n'
b'01-29-08  08:08AM       <DIR>          ClosingCross\r\n'
b'01-29-08  08:08AM       <DIR>          Downloads\r\n'
b'01-29-08  08:08AM       <DIR>          ETFData\r\n'
b'01-29-08  08:08AM       <DIR>          MonthlyShareVolume\r\n'
b'01-29-08  08:08AM       <DIR>          OpeningCross\r\n'
b'06-30-10  01:29PM       <DIR>          OrderExecutionQuality\r\n'
b'06-30-10  01:29PM       <DIR>          OrderExecutionQualityBX\r\n'
b'11-30-10  02:44PM       <DIR>          OrderExecutionQualityPSX\r\n'
b'09-23-08  07:34PM       <DIR>          phlx\r\n'
b'09-12-12  09:22AM       <DIR>          SymbolDirectory\r\n'
b''
INFO:aioftp.client:226 Transfer complete.

But the problem is in the parsing part. I'm not a fan of the LIST command, since it has no strict format; it is meant for humans. This has been discussed a lot in the issues, and each time it blows up I get more confirmation that this command should not be used at all. If, and only if, @jw4js has the time and energy to invest in updating the LIST parsing routine, then this will be fixed. Historically, the idea behind aioftp was not to use LIST at all. Sorry for that, but legacy bites.
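
To illustrate why LIST is awkward to parse, here is a rough sketch that classifies a listing line as DOS-style (the kind this server returns) or Unix-style. The regular expressions and the function name `guess_list_style` are assumptions made for illustration, not aioftp's parsing code, and the Unix-style sample line is made up:

```python
import re

# DOS/IIS style, as in the log above: "09-12-12  12:06PM  <DIR>  name"
DOS_LINE = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}[AP]M)\s+"
    r"(?P<size><DIR>|\d+)\s+(?P<name>.+)$"
)
# Unix "ls -l" style: a type character followed by nine permission characters.
UNIX_LINE = re.compile(r"^[-bcdlps][-rwxsStT]{9}\s+")


def guess_list_style(raw_line: bytes) -> str:
    """Return "dos", "unix", or "unknown" for one raw LIST line."""
    text = raw_line.decode("utf-8", errors="replace").rstrip("\r\n")
    if DOS_LINE.match(text):
        return "dos"
    if UNIX_LINE.match(text):
        return "unix"
    return "unknown"


# One line from the log above, and one invented Unix-style line.
print(guess_list_style(b"09-12-12  12:06PM       <DIR>          aspnet_client\r\n"))
print(guess_list_style(b"drwxr-xr-x   2 ftp      ftp          4096 Sep 12  2012 aspnet_client\r\n"))
```

A parser written for one style sees the other as unparseable, and nothing in the protocol tells the client which style to expect.
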

pohmelie commented 4 years ago

I've just released version 0.14.0, so you now have the option to supply your own parsing routine. https://aioftp.readthedocs.io/client_api.html#aioftp.Client

Andrei-Pozolotin commented 4 years ago

@pohmelie Nikita:

  1. thank you so much for the fix, it works (see below)

  2. may I suggest few other corrections to the project:

  1. sample code to verify the fix:
    
    import re
    import os
    import time
    import aioftp
    import asyncio
    import pathlib
    from urllib.parse import urlparse
    from typing import Tuple, Mapping
    from datetime import datetime

    this_dir = os.path.dirname(__file__)
    temp_dir = f"{this_dir}/tempdir"

    def ftp_std_stamp(stamp: str) -> str:
        "convert remote stamp into aioftp format"
        return datetime.strptime(stamp, "%m-%d-%y%I:%M%p").strftime("%Y%m%d%H%M%S")

    def ftp_line_parser(list_line: bytes) -> Tuple[pathlib.Path, Mapping]:
        r"""
        parse ftp list lines such as:
        b'12-30-19  03:00AM       <DIR>          regnms\r\n'
        b'12-30-19  02:03PM               694484 psxtraded.txt\r\n'
        """
        list_text = list_line.decode()
        # splitting on whitespace yields: date, time, size or <DIR>, name, ''
        term_list = re.split(r"\s+", list_text)
        assert len(term_list) == 5, f"no list_line: {list_line}"
        has_file = term_list[2].isdigit()
        line_modify = ftp_std_stamp(term_list[0] + term_list[1])
        line_type = "file" if has_file else "dir"
        line_size = int(term_list[2]) if has_file else 0
        line_path = pathlib.Path(term_list[3])
        line_info = dict(
            modify=line_modify,
            type=line_type,
            size=line_size,
        )
        return (line_path, line_info)

    async def ftp_win_nt_list(remote_url: str, remote_path: str) -> None:
        "verify parse_list_line_custom"
        remote_bag = urlparse(remote_url)
        ftp_host = remote_bag.hostname
        ftp_port = remote_bag.port or aioftp.DEFAULT_PORT
        ftp_user = remote_bag.username or "anonymous"
        ftp_pass = remote_bag.password or "anonymous@anonymous.host"
        session = aioftp.ClientSession(
            host=ftp_host,
            port=ftp_port,
            user=ftp_user,
            password=ftp_pass,
            parse_list_line_custom=ftp_line_parser,
        )
        async with session as client:
            entry_list = await client.list(path=remote_path)
            assert len(entry_list) > 0
            for path, info in entry_list:
                print(path, info)
                await ftp_file_download(client, path, info)

    async def ftp_file_download(client, path, info) -> None:
        # only fetch small files to keep the check quick
        if info['type'] == "file" and info['size'] <= 1024:
            print(f"ftp_file_download: {path}")
            file_src = path
            file_dst = f"{temp_dir}/{path}-{time.time()}"
            assert not os.path.exists(file_dst)
            await client.download(source=file_src, destination=file_dst, write_into=True)
            assert os.path.exists(file_dst)
            assert info['size'] == os.path.getsize(file_dst)
            # assert info['modify'] == os.path.getmtime(file_dst) # TODO

    remote_url = "ftp://ftp.nasdaqtrader.com"
    remote_path = "/SymbolDirectory"
    asyncio.run(ftp_win_nt_list(remote_url, remote_path))

pohmelie commented 4 years ago

> may I suggest few other corrections to the project

Feel free to make a pull request. I fixed the typo about the double "type".

> please use time.time standard utc float timestamp representation for info['modify']

Not sure if I got you right, but all MLSx facts are strings. What's more, there is a pretty strict description of the modify field, and it is not a UTC timestamp: https://tools.ietf.org/html/rfc3659#section-2.3
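
As a sketch of the conversion being discussed: RFC 3659 section 2.3 defines time values as `YYYYMMDDHHMMSS` with an optional fractional part, expressed in UTC. The helper names below are hypothetical and not part of aioftp:

```python
from datetime import datetime, timezone


def modify_fact_to_timestamp(fact: str) -> float:
    """Convert an RFC 3659 time-val such as "20120912120600" to a POSIX timestamp."""
    whole, _, fraction = fact.partition(".")
    # The fact carries no zone marker; RFC 3659 says it is always UTC.
    moment = datetime.strptime(whole, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    seconds = moment.timestamp()
    if fraction:
        seconds += float("0." + fraction)
    return seconds


def timestamp_to_modify_fact(seconds: float) -> str:
    """Convert a POSIX timestamp to a whole-second RFC 3659 time-val."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y%m%d%H%M%S")


stamp = modify_fact_to_timestamp("20120912120600")
print(stamp, timestamp_to_modify_fact(stamp))
```

Keeping the fact as a string inside the library and converting only at the edges keeps both representations available to the caller.
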

> please synchronize file modification time stamp upon transfer, so the following works

This is a good point. Not sure if it is a major issue (since no one uses file modification/creation time at all), but I agree with you. Feel free to make a PR.
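
One way a caller could do this after a download, sketched under the assumption that `info['modify']` holds a `YYYYMMDDHHMMSS` string in UTC (the function name `apply_remote_mtime` is hypothetical and not part of aioftp):

```python
import calendar
import os
import tempfile
import time


def apply_remote_mtime(local_path: str, modify_fact: str) -> None:
    """Set local_path's modification time from a UTC "YYYYMMDDHHMMSS" string."""
    # timegm() interprets the struct_time as UTC, unlike time.mktime().
    mtime = calendar.timegm(time.strptime(modify_fact[:14], "%Y%m%d%H%M%S"))
    # Keep the current access time and replace only the modification time.
    atime = os.stat(local_path).st_atime
    os.utime(local_path, times=(atime, mtime))


# Usage on a throwaway file standing in for a downloaded one.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
apply_remote_mtime(demo_path, "20120912120600")
print(os.path.getmtime(demo_path))
os.remove(demo_path)
```

Note that a timestamp parsed from a LIST line, as opposed to an MLSx fact, is usually in the server's local time, so it may be off by the server's UTC offset.
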

> please rename pohmelie -> Nikita_Melentev

This is irrelevant to aioftp.