gesiscss / orc

This repository was superseded by https://github.com/gesiscss/orc2 - Open Research Computing
https://notebooks.gesis.org/
MIT License

Allow FTP downloading #49

Open acocac opened 2 years ago

acocac commented 2 years ago

I recently started using GESIS binder to showcase notebooks with geographical spatio-temporal analysis. Is there a firewall that blocks downloading files from FTP sources?

For instance, when I run wget against the following FTP source: !wget ftp://ftp.zew.de/pub/zew-docs/dp/dp13046.pdf

The command hangs at the connection step and never responds:

--2021-12-09 17:19:51--  ftp://ftp.zew.de/pub/zew-docs/dp/dp13046.pdf
           => ‘dp13046.pdf’
Resolving ftp.zew.de (ftp.zew.de)... 193.196.11.224
Connecting to ftp.zew.de (ftp.zew.de)|193.196.11.224|:21... 

The same problem occurs with Python's urllib: urllib.request.urlretrieve('ftp://ftp.zew.de/pub/zew-docs/dp/dp13046.pdf', 'dp13046.pdf')

I would appreciate guidance on how to fetch data from FTP sources within GESIS binder environments.

MridulS commented 2 years ago

Hi @acocac, thanks for flagging this.

Yes, port 21 (FTP) is blocked on GESIS binder. We try to be as conservative with outgoing connections as possible.

Would it be possible to download the PDF file and "package" it with the repository?
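
To tell a blocked port apart from a slow server from inside a session, one option is to probe the port with an explicit timeout instead of letting wget or urllib hang. The following is a minimal sketch, not something GESIS provides: can_connect is a hypothetical helper name, and the 5-second default is an arbitrary assumption.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; the name and default are assumptions.
    """
    try:
        # create_connection resolves the host and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False
```

For example, can_connect("ftp.zew.de", 21) would be expected to return False after the timeout in an environment where outgoing port 21 is blocked, rather than hanging indefinitely.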

acocac commented 2 years ago

@MridulS thanks for the reply. The PDF file was only an illustrative example. In practice, there are some datasets (100-500 MB) that I would prefer to fetch from the original source, so I wondered whether there is a workaround to allow port 21 in certain cases. However, if GESIS binder is conservative by design, I'll try to mirror the datasets in FAIR data portals where the data provider allows it.
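
For the mirroring route, a dataset of that size could be streamed to disk in chunks rather than loaded into memory at once. This is a minimal sketch under stated assumptions: the mirror serves the file over a protocol that is reachable from the Binder environment (for instance HTTPS), fetch is a hypothetical helper name, and the chunk size and timeout are arbitrary choices. It returns a SHA-256 digest so the copy can be compared against a checksum published by the data provider, if one exists.

```python
import hashlib
import urllib.request


def fetch(url: str, dest: str, timeout: float = 30.0, chunk_size: int = 1 << 20) -> str:
    """Stream `url` to `dest` and return the SHA-256 hex digest of the bytes written.

    Hypothetical helper for illustration; defaults are assumptions.
    """
    digest = hashlib.sha256()
    # `timeout` applies to blocking socket operations, not to the whole transfer.
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break  # end of stream
            out.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()
```

A call would look like fetch(mirror_url, "dataset.bin"), where mirror_url stands for whatever address the chosen data portal provides.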