galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Analyze 75 upload does not work in 19.01 version #7412

Closed foellmelanie closed 5 years ago

foellmelanie commented 5 years ago

Hi,

I tried to upload Analyze 75 files to usegalaxy.org and got the following error message:

Fatal error: Exit code 1 ()
Traceback (most recent call last):
  File "/cvmfs/main.galaxyproject.org/galaxy/tools/data_source/upload.py", line 329, in <module>
    __main__()
  File "/cvmfs/main.galaxyproject.org/galaxy/tools/data_source/upload.py", line 320, in __main__
    metadata.append(add_composite_file(dataset, registry, output_path, files_path))
  File "/cvmfs/main.galaxyproject.org/galaxy/tools/data_source/upload.py", line 243, in add_composite_file
    stage_file(name, composite_file_path, value.is_binary)
  File "/cvmfs/main.galaxyproject.org/galaxy/tools/data_source/upload.py", line 223, in stage_file
    sniff.convert_newlines(dp, tmp_dir=tmpdir, tmp_prefix=tmp_prefix)
  File "/cvmfs/main.galaxyproject.org/galaxy/lib/galaxy/datatypes/sniff.py", line 122, in convert_newlines
    for i, line in enumerate(io.open(fname, mode="U", encoding='utf-8')):
  File "/cvmfs/main.galaxyproject.org/venv/lib/python2.7/codecs.py", line 314, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte

Planemo with 18.09 works with Analyze 75 files, but Planemo with 19.01 does not: https://github.com/galaxyproteomics/tools-galaxyp/pull/350
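
For anyone hitting the same traceback, here is a minimal sketch for checking locally whether a file would trip a UTF-8 decode step like the one shown above. The helper name first_invalid_utf8_offset is hypothetical and not part of Galaxy. It reads the whole file into memory, so it is only meant for small test files. The offset it returns is relative to the whole file, which may differ from the position in a traceback if the file was decoded in chunks.

```python
def first_invalid_utf8_offset(path):
    """Hypothetical helper (not part of Galaxy).

    Return the byte offset of the first sequence in the file that is
    not valid UTF-8, or None if the whole file decodes cleanly.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first offending byte in data
        return exc.start
    return None
```

A return value of None suggests the file is UTF-8 text. Any integer means the file is either binary or text in another encoding.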

mvdbeek commented 5 years ago

Thanks for the report @foellmelanie. The datatype for analyze75 files says that the hdr file is not binary:

        """The header file. Provides information about dimensions, identification, and processing history."""
        self.add_composite_file(
            'hdr',
            description='The Analyze75 header file.',
            is_binary=False)

However, the test data in the failing tool is binary: https://github.com/galaxyproteomics/tools-galaxyp/blob/f127be2141cf22e269c85282d226eb16fe14a9c1/tools/cardinal/test-data/Analyze75.hdr

I assume the datatype is wrong and this hdr file can be binary? In that case we need to change the datatype. We are now stricter when converting universal newlines and require files to actually be text files when we do this. On top of the datatype fix, we might also want to ignore failed newline conversions.
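
As an illustration of the "ignore failed newline conversions" idea, here is a minimal Python 3 sketch. It is not Galaxy's sniff.convert_newlines, and the function name convert_newlines_lenient is hypothetical. It assumes the intended behavior is to leave a file untouched when it cannot be decoded as UTF-8, rather than failing the upload.

```python
import os
import tempfile


def convert_newlines_lenient(fname):
    """Hypothetical helper (not Galaxy's sniff.convert_newlines).

    Rewrite fname with "\n" line endings if it decodes as UTF-8.
    Return True if the file was converted, or False if it was left
    untouched because it is not valid UTF-8 text.
    """
    target_dir = os.path.dirname(os.path.abspath(fname))
    tmp_fd, tmp_name = tempfile.mkstemp(dir=target_dir)
    try:
        # newline=None on read translates "\r\n" and "\r" to "\n";
        # newline="\n" on write disables any further translation.
        with os.fdopen(tmp_fd, "w", encoding="utf-8", newline="\n") as dst, \
                open(fname, encoding="utf-8", newline=None) as src:
            for line in src:
                dst.write(line)
    except UnicodeDecodeError:
        # Not UTF-8 text: discard the partial copy, keep the original.
        os.remove(tmp_name)
        return False
    os.replace(tmp_name, fname)
    return True
```

With this approach a binary hdr file that is wrongly declared as text would pass through unchanged instead of raising UnicodeDecodeError. The trade-off is that a genuinely broken text file would also be accepted silently.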

foellmelanie commented 5 years ago

@mvdbeek thanks for resolving this so fast!

Unfortunately I have a similar problem with another composite datatype: 'imzml'.

The upload of some files has worked while others gave the following error:

Traceback (most recent call last):
  File "/cvmfs/main.galaxyproject.org/galaxy/tools/data_source/upload.py", line 329, in <module>
    __main__()
  File "/cvmfs/main.galaxyproject.org/galaxy/tools/data_source/upload.py", line 320, in __main__
    metadata.append(add_composite_file(dataset, registry, output_path, files_path))
  File "/cvmfs/main.galaxyproject.org/galaxy/tools/data_source/upload.py", line 243, in add_composite_file
    stage_file(name, composite_file_path, value.is_binary)
  File "/cvmfs/main.galaxyproject.org/galaxy/tools/data_source/upload.py", line 223, in stage_file
    sniff.convert_newlines(dp, tmp_dir=tmpdir, tmp_prefix=tmp_prefix)
  File "/cvmfs/main.galaxyproject.org/galaxy/lib/galaxy/datatypes/sniff.py", line 122, in convert_newlines
    for i, line in enumerate(io.open(fname, mode="U", encoding='utf-8')):
  File "/cvmfs/main.galaxyproject.org/venv/lib/python2.7/codecs.py", line 314, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb5 in position 1358: invalid start byte

bgruening commented 5 years ago

Should we create a new issue about this other datatype?

mvdbeek commented 5 years ago

I'd say in general yes, it makes it easier to track which bug was fixed when, and in which commit. I'll have a look.

mvdbeek commented 5 years ago

I'm a little confused by the imzml datatype. If I understand the specs (https://ms-imaging.org/wp/wp-content/uploads/2009/08/specifications_imzML1.1.0_RC1.pdf) correctly, the metadata file should be XML (so not binary, I guess). Can someone confirm whether this file is supposed to be text or binary?

bgruening commented 5 years ago

This is correct, or at least that is my understanding. https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/cardinal/test-data (imzml=xml + ibd=binary)

foellmelanie commented 5 years ago

It's correct. It's a bit confusing because the composite imzML file consists of an imzML subfile (XML) and an ibd subfile (binary).

mvdbeek commented 5 years ago

So any chance that I could get my hands on a file that fails the upload?

foellmelanie commented 5 years ago

Thanks for your help. This file fails:
https://github.com/galaxyproteomics/tools-galaxyp/blob/master/tools/cardinal/test-data/Example_Processed.imzML
https://github.com/galaxyproteomics/tools-galaxyp/blob/master/tools/cardinal/test-data/Example_Processed.ibd

mvdbeek commented 5 years ago

https://github.com/galaxyproteomics/tools-galaxyp/blob/master/tools/cardinal/test-data/Example_Processed.imzML is in "ISO-8859-1" encoding. If you run recode on it before uploading to Galaxy, it should work fine. We probably need some logic to handle non-default encodings, but I don't think it'll happen immediately.
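
If recode is not at hand, the same re-encoding step can be sketched in Python 3. The function name recode_to_utf8 and the file names in the usage note are hypothetical. The sketch assumes the source file really is ISO-8859-1, as stated above. In that encoding the byte 0xb5 from the traceback is the micro sign, which UTF-8 stores as two bytes.

```python
def recode_to_utf8(src_path, dst_path, source_encoding="iso-8859-1"):
    """Hypothetical helper: copy src_path to dst_path, re-encoded as UTF-8.

    newline="" keeps the original line endings untouched, so only the
    character encoding changes.
    """
    with open(src_path, encoding=source_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Usage would look like recode_to_utf8("Example_Processed.imzML", "Example_Processed.utf8.imzML"). If the XML declaration at the top of the file names the old encoding, it would need to be updated by hand as well, since this sketch does not touch the file contents beyond re-encoding them.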

foellmelanie commented 5 years ago

Thank you @mvdbeek for pointing this out. The weird thing is that this file has worked before in previous Galaxy versions.

mvdbeek commented 5 years ago

The imzml example should work on 19.05, which should be released pretty soon. Since that was a larger change, we won't backport it to 19.01. Many thanks for the report, @foellmelanie!