ResearchObject / ro-crate-py

Python library for RO-Crate
https://pypi.org/project/rocrate/
Apache License 2.0

Adding a file to a new crate fails when the source is an FTP URI: AttributeError: '_io.BufferedReader' object has no attribute 'getheader' #103

Closed lrodrin closed 2 years ago

lrodrin commented 2 years ago

Adding a file to a new crate and then writing the crate fails when the source is an FTP URI: AttributeError: '_io.BufferedReader' object has no attribute 'getheader'

See the example below:

from rocrate.rocrate import ROCrate

crate = ROCrate()
input_uri = "ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz"
crate.add_file(source=input_uri, fetch_remote=False)
crate.write_zip("./test/crate.zip")
------------------------------------------------------
Traceback (most recent call last):
  File "/Users/laurarodrigueznavas/PycharmProjects/ro-crate-py/test/test_laura.py", line 6, in <module>
    crate.write_zip("./test/crate.zip")
  File "/Users/laurarodrigueznavas/PycharmProjects/ro-crate-py/rocrate/rocrate.py", line 486, in write_zip
    self.write(tmp_dir)
  File "/Users/laurarodrigueznavas/PycharmProjects/ro-crate-py/rocrate/rocrate.py", line 476, in write
    writable_entity.write(base_path)
  File "/Users/laurarodrigueznavas/PycharmProjects/ro-crate-py/rocrate/model/file.py", line 50, in write
    'contentSize': response.getheader('Content-Length'),
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/tempfile.py", line 469, in __getattr__
    a = getattr(file, name)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/tempfile.py", line 469, in __getattr__
    a = getattr(file, name)
AttributeError: '_io.BufferedReader' object has no attribute 'getheader'
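
A possible reading of this traceback, not confirmed anywhere in the thread: `getheader` is a method of HTTP responses, while for `ftp://` URLs `urllib.request.urlopen` returns a generic wrapper object that forwards unknown attributes to the underlying file and so has no `getheader`. Both kinds of response do expose a `headers` mapping. The sketch below builds a stand-in FTP-style response without touching the network; `content_length` and `fake_ftp_response` are hypothetical names used only for illustration, and this is not the fix adopted by the library.

```python
import io
from email.message import Message
from urllib.response import addinfourl


def content_length(response):
    """Return the Content-Length header value, or None if it is absent.

    Reads the `headers` attribute, which both HTTP responses and the
    generic wrapper returned for FTP URLs provide.
    """
    return response.headers.get("Content-Length")


# Stand-in for what urlopen() returns for an ftp:// URL (no network access).
headers = Message()
headers["Content-Length"] = "11"
fake_ftp_response = addinfourl(
    io.BytesIO(b"hello world"), headers, "ftp://example.org/file.txt"
)

# The wrapper has no getheader(), which matches the AttributeError above.
print(hasattr(fake_ftp_response, "getheader"))
# Reading from `headers` works regardless of the URL scheme.
print(content_length(fake_ftp_response))
```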

lrodrin commented 2 years ago

If the FTP URI has a security context, a different error is raised: urllib.error.URLError: <urlopen error ftp error: error_perm('550 bundle/b37: No such file or directory')>

See the example below:

from rocrate.rocrate import ROCrate

crate = ROCrate()
input_uri = "ftp://ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.gz"
crate.add_file(source=input_uri, fetch_remote=False)
crate.write_zip("./test/crate.zip")
------------------------------------------------------
Traceback (most recent call last):
  File "/Users/laurarodrigueznavas/PycharmProjects/ro-crate-py/test/test_laura.py", line 6, in <module>
    crate.write_zip("./test/crate.zip")
  File "/Users/laurarodrigueznavas/PycharmProjects/ro-crate-py/rocrate/rocrate.py", line 486, in write_zip
    self.write(tmp_dir)
  File "/Users/laurarodrigueznavas/PycharmProjects/ro-crate-py/rocrate/rocrate.py", line 476, in write
    writable_entity.write(base_path)
  File "/Users/laurarodrigueznavas/PycharmProjects/ro-crate-py/rocrate/model/file.py", line 49, in write
    with urllib.request.urlopen(self.source) as response:
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1554, in ftp_open
    raise exc.with_traceback(sys.exc_info()[2])
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1536, in ftp_open
    fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1557, in connect_ftp
    return ftpwrapper(user, passwd, host, port, dirs, timeout,
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 2378, in __init__
    self.init()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 2390, in init
    self.ftp.cwd(_target)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/ftplib.py", line 614, in cwd
    return self.voidcmd(cmd)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/ftplib.py", line 280, in voidcmd
    return self.voidresp()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/ftplib.py", line 253, in voidresp
    resp = self.getresp()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/ftplib.py", line 248, in getresp
    raise error_perm(resp)
urllib.error.URLError: <urlopen error ftp error: error_perm('550 bundle/b37: No such file or directory')>

lrodrin commented 2 years ago

Removing the URI validation avoids the error, but is it the correct way to do it?

from rocrate.rocrate import ROCrate

crate = ROCrate()
input_uri = "ftp://ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.gz"
crate.add_file(source=input_uri, fetch_remote=False, validate_url=False)
crate.write_zip("./test/crate.zip")

simleo commented 2 years ago

> Removing the URI validation avoids the error, but is it the correct way to do it?

I think it is. Library users can always validate the URL on their own and add extra information like contentSize or encodingFormat via properties if desired. The validate_url arg is a nice extra feature that does this for you in simple cases, but trying to cover all possible cases -- especially where authentication might be involved -- is out of the library's scope.
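
As a rough illustration of "validate the URL on your own", here is a minimal sketch using only the standard library. The helper names `check_remote_uri` and `file_properties`, the set of allowed schemes, and the size and media type values are all assumptions made for the example; none of them come from ro-crate-py.

```python
from urllib.parse import urlsplit

# Assumption: the schemes the caller is willing to reference from a crate.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def check_remote_uri(uri):
    """Return `uri` if it looks like a remote file URI, else raise ValueError.

    This is a purely syntactic check; it never contacts the server, so it
    cannot fail on authentication or permission errors.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.netloc or not parts.path.strip("/"):
        raise ValueError(f"missing host or path in {uri!r}")
    return uri


def file_properties(size_bytes=None, media_type=None):
    """Build the optional metadata mentioned above from caller-supplied values."""
    props = {}
    if size_bytes is not None:
        props["contentSize"] = str(size_bytes)
    if media_type is not None:
        props["encodingFormat"] = media_type
    return props


uri = check_remote_uri(
    "ftp://ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.gz"
)
# Placeholder values: the real size would come from the caller's own lookup.
props = file_properties(size_bytes=1024, media_type="application/gzip")
print(props)
```

The resulting dictionary could then be passed along with `validate_url=False`, for instance as `crate.add_file(source=uri, fetch_remote=False, validate_url=False, properties=props)`, assuming `add_file` accepts a `properties` argument as the comment above suggests.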

lrodrin commented 2 years ago

Thanks, @simleo. I will use validate_url=False for these cases.

We can close the issue.