frictionlessdata / forum

🗣 Frictionless Data Forum esp for "How do I" type questions
https://frictionlessdata.io/

Support for remote sources over the FTP protocol #23

Closed gustavorps closed 4 years ago

gustavorps commented 5 years ago

When I try to save the descriptor to disk, I receive the following message:

package.save('datapackage.zip')
Traceback (most recent call last):
  File "/home/gustavorps/.miniconda3/lib/python3.7/site-packages/datapackage/package.py", line 273, in save
    z.write(path, path_inside_dp)
  File "/home/gustavorps/.miniconda3/lib/python3.7/zipfile.py", line 1710, in write
    zinfo = ZipInfo.from_file(filename, arcname)
  File "/home/gustavorps/.miniconda3/lib/python3.7/zipfile.py", line 506, in from_file
    st = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory: '/home/gustavorps/workspace/code/datasus-datapackage/ftp:/ftp.datasus.gov.br/dissemin/publicos/SIHSUS/200801_/dados/RDES1901.dbc'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gustavorps/.miniconda3/lib/python3.7/site-packages/datapackage/package.py", line 277, in save
    six.raise_from(exceptions.DataPackageException(exception), exception)
  File "<string>", line 3, in raise_from
tableschema.exceptions.DataPackageException: [Errno 2] No such file or directory: '/home/gustavorps/workspace/code/datasus-datapackage/ftp:/ftp.datasus.gov.br/dissemin/publicos/SIHSUS/200801_/dados/RDES1901.dbc'

My guess is that it does not support URIs that use the FTP protocol.
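For illustration, the path in the traceback is what you get when an ftp:// URL is treated as a relative file name and joined onto a local base directory. The sketch below reproduces that string with the standard library only. It is an assumption about the mechanism, not the actual datapackage-py code, and the variable names are hypothetical.

```python
import posixpath

# Hypothetical stand-ins for the package base path and the resource path.
base_path = "/home/gustavorps/workspace/code/datasus-datapackage"
source = "ftp://ftp.datasus.gov.br/dissemin/publicos/SIHSUS/200801_/dados/RDES1901.dbc"

# Joining and normalizing treats the URL as a relative file name;
# normpath also collapses the "//" after the scheme into a single "/".
local_guess = posixpath.normpath(posixpath.join(base_path, source))
print(local_guess)
```

The result contains "ftp:/" with a single slash, the same shape as the path reported in the FileNotFoundError.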

https://github.com/frictionlessdata/datapackage-py/blob/29a9e34a924a4187d9587549123a7d262cacdbf7/datapackage/resource.py#L209-L215

Here is my suggested implementation. As soon as I have time, I will open a pull request.

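from io import BytesIO
from urllib.parse import urlparse
import ftplib

source = 'ftp://ftp.datasus.gov.br/dissemin/publicos/SINASC/NOV/DNRES/DNES2016.dbc'
url = urlparse(source)
# Split the URL path into the remote directory and the file name
path, filename = url.path.rsplit('/', 1)

# Anonymous login; hostname (unlike netloc) excludes any port or credentials
ftp = ftplib.FTP()
ftp.connect(url.hostname)
ftp.login()
ftp.cwd(path)

# Download the file into an in-memory buffer
filelike = BytesIO()
ftp.retrbinary('RETR ' + filename, filelike.write)
ftp.quit()
filelike.seek(0)  # rewind so the buffer can be read from the start

roll commented 5 years ago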

@GustavoRPS Hi, could you please elaborate?

AFAIK the underlying tabulator library supports FTP. Doesn't it work for datapackage?

gustavorps commented 5 years ago

I made an update, @roll.

There are two issues here:

  1. Support for downloading remote resources from an FTP server
  2. DBC / DBF format support (but that belongs in a separate issue)

So basically, this issue is about supporting the download of resources that are hosted on an FTP server.
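As a rough sketch of the first point: Python's urllib.request opens ftp:// URLs with the same call it uses for http:// and https://, so a loader could read a remote resource into memory without handling ftplib directly. The function name fetch_bytes and the timeout default are hypothetical, and this is not how datapackage-py is implemented.

```python
from io import BytesIO
from urllib.request import urlopen


def fetch_bytes(source, timeout=30):
    """Read a URL into memory and return a BytesIO positioned at the start.

    urlopen dispatches on the URL scheme, so an ftp:// source goes through
    the same code path as an http:// or https:// one.
    """
    with urlopen(source, timeout=timeout) as response:
        return BytesIO(response.read())
```

For example, fetch_bytes('ftp://ftp.datasus.gov.br/dissemin/publicos/SINASC/NOV/DNRES/DNES2016.dbc') would attempt an anonymous FTP download of the file mentioned above. The whole file is held in memory, which may not suit very large resources.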

roll commented 5 years ago

Yeah. It seems it loads the files without using tabulator.

I think the problem is here - https://github.com/frictionlessdata/datapackage-py/blob/29a9e34a924a4187d9587549123a7d262cacdbf7/datapackage/resource.py#L478

Another question is that, according to the specs, the current behavior is a correct implementation:

URLs MUST be fully qualified. MUST be using either http or https scheme. (Absence of a scheme indicates MUST be a POSIX path)

http://frictionlessdata.io/specs/data-resource/
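To make the quoted rule concrete, here is a minimal sketch of how a path could be classified under it: no scheme means a POSIX path, http or https means a URL, and anything else is outside the rule. The function name classify_path, the return labels, and the allowed_schemes parameter are hypothetical and not part of any Frictionless library.

```python
from urllib.parse import urlparse


def classify_path(path, allowed_schemes=("http", "https")):
    """Classify a resource path following the rule quoted above."""
    scheme = urlparse(path).scheme.lower()
    if not scheme:
        # No scheme: the rule says this must be a POSIX path.
        return "posix-path"
    if scheme in allowed_schemes:
        return "url"
    # Any other scheme (for example ftp) is not covered by the rule.
    return "unsupported-scheme"
```

With the defaults, classify_path('ftp://ftp.datasus.gov.br/file.dbc') falls into the unsupported branch. Passing allowed_schemes=('http', 'https', 'ftp') shows what accepting FTP would look like.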

roll commented 5 years ago

But anyway I think support for FTP should be OK and it's not against the specs.

@akariv WDYT?

roll commented 4 years ago

@gustavorps I have moved it, for now, to a general project-level discussion.

It's also related to https://github.com/frictionlessdata/specs/issues/664

Once it's cleared by the specs, we can implement it for datapackage-py.

rufuspollock commented 4 years ago

FIXED. We have a resolution and an open specs issue about adding support for FTP.