fsspec / filesystem_spec

A specification that python filesystems should adhere to.
BSD 3-Clause "New" or "Revised" License

FTP not working with older Windows FTP Servers #447

Open awm33 opened 4 years ago

awm33 commented 4 years ago

First off, thank you for creating this incredible library!

I'm trying to use fsspec to load data from a couple of older FTP servers. They don't support the MLSD FTP command, and the _mlsd2 function seems to assume an older UNIX-based FTP server.

I suppose one way to improve it would be to use regular expressions in _mlsd2 to "sniff" which listing format the server returns.

Examples:

ftp.onetgov.net
fsspec URL: "ftp://ftp.onetgov.net/divisions/Infomap/pub/GIS_Downloads/FTP Shapefiles/Address Range.zip"
Python FTP dir output:

09-27-20  08:55AM             56006477 Address Points.zip
09-27-20  08:55AM             12246905 Address Range.zip
04-01-20  01:46PM               189059 Adopted_Comp_Plan.zip
09-27-20  08:55AM                70271 Airport Noise Contours.zip
04-01-20  01:46PM                26766 AMA.zip
09-27-20  08:55AM               823689 Benchmarks 88 Datum.zip
09-27-20  08:55AM                 6939 Boat Ramps.zip
09-27-20  08:55AM               231407 Brownfield Areas.zip
05-31-12  02:59PM             11940688 CENSUS_BLOCKS_2010.zip
05-31-12  02:59PM              1078784 CENSUS_BLOCK_GROUPS_2010.zip
05-31-12  02:59PM               806391 CENSUS_TRACTS_2010.zip
04-01-20  01:46PM                24693 CIP_Roadways.zip
09-27-20  08:55AM              6547203 Code Enforcement Officer Zones.zip
09-27-20  08:55AM                 8551 Colleges and Universities.zip

sdrftp03.dor.state.fl.us
fsspec URL: "ftp://sdrftp03.dor.state.fl.us/Tax Roll Data Files/2020 Final NAL - SDF Files/Citrus 19 Final NAL 2020.zip"
Python FTP dir output:

10-05-20  11:43AM             24041148 Brevard 15 Final NAL 2020.zip
10-05-20  11:43AM              1347717 Brevard 15 Final SDF 2020.zip
09-22-20  11:54AM             11041829 Citrus 19 Final NAL 2020.zip
09-22-20  11:55AM               513181 Citrus 19 Final SDF 2020.zip
09-27-20  08:04PM             27358206 Duval 26 Final NAL 2020.zip
09-27-20  08:04PM              1385637 Duval 26 Final SDF 2020.zip
10-01-20  02:45PM              2216030 Gadsden 30 Final NAL 2020.zip
10-01-20  02:45PM                66989 Gadsden 30 Final SDF 2020.zip
10-05-20  05:14PM              1153184 Gilchrist 31 Final NAL 2020.zip
10-05-20  05:14PM                43427 Gilchrist 31 Final SDF 2020.zip
10-02-20  04:43PM              1292859 Gulf 33 Final NAL 2020.zip
10-02-20  04:43PM                66287 Gulf 33 Final SDF 2020.zip
10-05-20  09:39AM              2596017 Hendry 36 Final NAL 2020.zip
10-05-20  09:39AM               128792 Hendry 36 Final SDF 2020.zip
09-30-20  01:20PM              9283336 Hernando 37 Final NAL 2020.zip
09-30-20  01:20PM               534406 Hernando 37 Final SDF 2020.zip
10-01-20  01:54PM              8251812 Indian River 41 Final NAL 2020.zip
10-01-20  01:52PM               390588 Indian River 41 Final SDF 2020.zip
09-29-20  10:25AM              6002346 Monroe 54 Final NAL 2020.zip
09-29-20  10:26AM               225385 Monroe 54 Final SDF 2020.zip
09-23-20  11:09AM             33846865 Orange 58 Final NAL 2020.zip
09-23-20  11:11AM              1556983 Orange 58 Final SDF 2020.zip
10-02-20  10:17AM             11778702 Osceola 59 Final NAL 2020.zip
10-02-20  10:17AM               698317 Osceola 59 Final SDF 2020.zip
10-06-20  01:57PM             41969477 Palm Beach 60 Final NAL 2020.zip
10-06-20  01:57PM              2440714 Palm Beach 60 Final SDF 2020.zip
10-06-20  12:08PM             21268245 Sarasota 68 Final NAL 2020.zip
10-06-20  12:07PM               973079 Sarasota 68 Final SDF 2020.zip
10-01-20  06:58PM             17934070 Seminole 69 Final NAL 2020.zip
10-01-20  06:58PM               948225 Seminole 69 Final SDF 2020.zip
10-02-20  10:31AM              6449364 Walton 76 Final NAL 2020.zip
10-02-20  10:31AM               366020 Walton 76 Final SDF 2020.zip

martindurant commented 4 years ago

I would be very happy to see a solution that works for a wider selection of FTP servers (though I don't suppose we would ever catch them all). The trouble is that we only test against the Python implementation, so we can't actually put this into CI unless there's a Docker image or other executable version of such a server that we can run.

awm33 commented 4 years ago

@martindurant I have a working solution. I use a regex in _mlsd2 to detect Windows FTP server style lines and default to Unix otherwise. I realize e2e testing would be ideal, but this really boils down to text processing in _mlsd2, so perhaps a unit test on _mlsd2 would suffice?
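
For illustration, here is a minimal sketch of that approach. It is not the actual patch (which is not shown in this thread) and not fsspec's real _mlsd2: the names WINDOWS_LINE and parse_list_line are hypothetical, the info keys only loosely follow MLSD-style facts, and the `<DIR>` marker for directories is an assumption based on typical IIS listings, since the examples above contain only files.

```python
import re

# Windows (IIS/DOS-style) LIST line: date, time, then <DIR> or a byte count,
# then the name (which may contain spaces).
WINDOWS_LINE = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{2,4})\s+"
    r"(?P<time>\d{1,2}:\d{2}(?:AM|PM)?)\s+"
    r"(?P<size><DIR>|\d+)\s+"
    r"(?P<name>.+)$"
)


def parse_list_line(line):
    """Return (name, info) for one LIST line, or None if it cannot be parsed.

    Hypothetical helper: tries the Windows format first, then falls back
    to a Unix "ls -l" style split.
    """
    line = line.strip()
    match = WINDOWS_LINE.match(line)
    if match:
        is_dir = match["size"] == "<DIR>"
        info = {
            "modify": f"{match['date']} {match['time']}",
            "size": 0 if is_dir else int(match["size"]),
            "type": "dir" if is_dir else "file",
        }
        return match["name"], info

    # Default: Unix style, e.g.
    # -rw-r--r-- 1 owner group 1234 Jan 01 12:00 name with spaces
    parts = line.split(None, 8)  # at most 9 fields, so spaces in the name survive
    if len(parts) < 9:
        return None  # e.g. a "total 12" header line
    info = {
        "modify": " ".join(parts[5:8]),
        "size": int(parts[4]),
        "type": "dir" if parts[0].startswith("d") else "file",
        "unix.mode": parts[0],
        "unix.owner": parts[2],
        "unix.group": parts[3],
    }
    return parts[8], info
```

Because this is pure text processing, a unit test can feed it lines copied from the listings above, for example `parse_list_line("09-27-20  08:55AM             12246905 Address Range.zip")`, without needing a live Windows FTP server in CI.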

martindurant commented 4 years ago

Yes, that sounds fine; please do submit it as a PR.

martindurant commented 4 years ago

Are you working on this, @awm33 ?

martindurant commented 3 years ago

(ping)