galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

"utf-8 codec can't decode byte 0x8b in position 1" after job completion #10140

Closed bgruening closed 4 years ago

bgruening commented 4 years ago

As far as I could figure out, this happens after job completion, and I saw some weird symbols on the stdout of this tool (Alevin). It happens only with 20.05, not with 19.09.

Traceback (most recent call last):
  File "/opt/galaxy/server/lib/galaxy/jobs/runners/__init__.py", line 540, in _finish_or_resubmit_job
    job_wrapper.finish(tool_stdout, tool_stderr, exit_code, check_output_detected_state=check_output_detected_state, job_stdout=job_stdout, job_stderr=job_stderr)
  File "/opt/galaxy/server/lib/galaxy/jobs/__init__.py", line 1687, in finish
    output_name, dataset, job, context, final_job_state, remote_metadata_directory
  File "/opt/galaxy/server/lib/galaxy/jobs/__init__.py", line 1544, in _finish_dataset
    dataset.datatype.set_meta(dataset, overwrite=False)
  File "/opt/galaxy/server/lib/galaxy/datatypes/tabular.py", line 1270, in set_meta
    for i, l in enumerate(dataset_fh):
  File "/usr/lib64/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

mvdbeek commented 4 years ago

If there is an invalid byte right at the start of the file, the output is probably binary. That would need to be declared on the datatype.
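
For anyone hitting the same error: the bytes 0x1f 0x8b are the gzip magic number, so 0x8b at position 1 suggests the file is gzip-compressed. Below is a minimal sketch that reproduces the error outside Galaxy. The helper name `looks_gzipped` and the sample payload are made up for illustration and are not part of Galaxy or Alevin.

```python
import gzip


def looks_gzipped(data: bytes) -> bool:
    """Return True if the data starts with the gzip magic number 0x1f 0x8b."""
    return data[:2] == b"\x1f\x8b"


# Hypothetical tabular payload, compressed the way a tool might write it.
payload = gzip.compress(b"gene\tcount\nA\t1\n")

error_message = None
error_position = None
try:
    # Reading gzip bytes as UTF-8 text fails on the second magic byte.
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    error_position = exc.start

print(looks_gzipped(payload))
print(error_message)
```

The first byte, 0x1f, is a valid ASCII control character, which is why the decoder only fails at position 1.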

mvdbeek commented 4 years ago

https://github.com/bgruening/galaxytools/blob/master/tools/salmon/alevin.xml#L193 All the outputs that are gzipped need to either be unpacked or declared as another datatype. Python 2 was a bit more lenient here, but ultimately that has always been wrong; those outputs, for instance, can't be processed by tools that work on txt files.
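
As a rough illustration of the "unpack" option, here is a sketch that decompresses a gzipped output into a plain file so it can be read as text. The helper `gunzip_to_text` and the file names are hypothetical. In the tool wrapper itself the same effect is more likely achieved with a decompression step in the command section, which is an assumption on my part and not something shown in this thread.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_to_text(src: Path, dest: Path) -> None:
    """Decompress a gzip file into a plain file (hypothetical helper)."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)


# Demo with made-up file names in a temporary directory.
workdir = Path(tempfile.mkdtemp())
packed = workdir / "quants.tsv.gz"
unpacked = workdir / "quants.tsv"
packed.write_bytes(gzip.compress(b"gene\tcount\nA\t1\n"))

gunzip_to_text(packed, unpacked)

# The unpacked file now decodes cleanly as UTF-8 text.
text = unpacked.read_text(encoding="utf-8")
print(text)
```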

bgruening commented 4 years ago

Uha, I see - thanks for the hint. ping @astrovsky01