thommohr closed 5 years ago
Thomas; Thank you for the report, and apologies for the issues. The latest release uses Python 3, which is stricter about string encodings, and it is complaining because your fastq file contains some non-UTF-8 characters. I pushed a speculative fix which should resolve this if that's really the cause. The other potential issue is that your input fastq is gzipped or otherwise compressed but the file extension does not reflect that, so bcbio doesn't know. If that's the case, then adjusting the file names to match the compression will hopefully get things working cleanly. Hope one of these two gets your analysis finished.
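To check the second possibility by hand, one option is to compare a file's leading "magic" bytes with its extension. The sketch below is only an illustration: `sniff_compression` and `extension_matches` are hypothetical helper names, not part of bcbio, and only gzip and bzip2 are covered.

```python
# Hypothetical helpers (not bcbio code): compare a file's magic bytes
# with the compression implied by its file name.
MAGIC = {
    b"\x1f\x8b": "gzip",   # gzip streams start with 0x1f 0x8b
    b"BZh": "bzip2",       # bzip2 streams start with the ASCII bytes "BZh"
}
EXTENSIONS = {"gzip": ".gz", "bzip2": ".bz2"}


def sniff_compression(path):
    """Return 'gzip', 'bzip2', or None based on the file's first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(3)
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return None


def extension_matches(path):
    """Return True if the extension agrees with the detected compression."""
    detected = sniff_compression(path)
    if detected is None:
        # Uncompressed content should not carry a compression extension.
        return not path.endswith(tuple(EXTENSIONS.values()))
    return path.endswith(EXTENSIONS[detected])
```

A gzipped file named `reads.fastq` would make `extension_matches` return False, which is the mismatch described above. A file named `reads.fastq.bz2` that really is bzip2 compressed would return True.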
Thanks for your quick reply. I upgraded with the -u development option, but that does not resolve the issue. The files are bzip2 compressed, with the extension .fastq.bz2. The pipeline had no problems with version 1.1.1, so the compression should be OK. How does one force bcbio to recognize these files?
Thomas; Thanks much for following up with the additional details. This helped isolate the issue, which wasn't really a Python 3 problem but rather Python 3 exposing that we shouldn't have been trying to do automated format detection on bzip2 input. I pushed a fix for this, so if you update one more time and retry I hope it will now work cleanly for you. Thank you again for the help debugging and please let us know if you run into any other issues.
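For readers hitting the same traceback, here is a minimal sketch of the idea behind that fix. It is not the actual bcbio patch: `open_fastq_text` and `first_quality_lines` are hypothetical names. The failure comes from reading compressed bytes as UTF-8 text, so the sketch either picks a decompressing opener from the extension or skips the check for bzip2 input.

```python
import bz2
import gzip
import itertools


def open_fastq_text(path):
    """Open a FASTQ file as text, choosing the opener from the extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def first_quality_lines(path, max_records=100):
    """Return quality strings from the first records, or None to skip."""
    if path.endswith(".bz2"):
        # Assumption mirroring the fix described above: no automated
        # format detection is attempted on bzip2 input.
        return None
    with open_fastq_text(path) as handle:
        lines = itertools.islice(handle, max_records * 4)
        # The fourth line of each FASTQ record (index 3) holds qualities.
        return [line.rstrip("\n") for i, line in enumerate(lines) if i % 4 == 3]
```

Passing `errors="replace"` keeps stray non-UTF-8 bytes from raising `UnicodeDecodeError`. That is acceptable for a quick format check but would not be appropriate when the exact bytes matter.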
Dear developers,
After an upgrade to 1.1.5a (the genome issue), a pipeline that worked in 1.1.1 now fails with the following error:
[2019-04-06T17:32Z] multiprocessing: organize_samples
[2019-04-06T17:32Z] Using input YAML configuration: /srv/workspace/rprojects/tmohr/MEDUNI/TUMOR/SKCM/SIBILIA-SKCM0010/project_hg38/config/project_hg38.yaml
[2019-04-06T17:32Z] Checking sample YAML configuration: /srv/workspace/rprojects/tmohr/MEDUNI/TUMOR/SKCM/SIBILIA-SKCM0010/project_hg38/config/project_hg38.yaml
Traceback (most recent call last):
  File "/opt/bcbio/bin/bcbio_nextgen.py", line 238, in <module>
    main(**kwargs)
  File "/opt/bcbio/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 45, in run_main
    fc_dir, run_info_yaml)
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 89, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 126, in variant2pipeline
    [x[0]["description"] for x in samples]]])
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(x) for x in items):
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 921, in __call__
    if self.dispatch_one_batch(iterator):
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 182, in apply_async
    result = ImmediateResult(func)
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 549, in __init__
    self.results = batch()
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/bcbio/utils.py", line 55, in wrapper
    return f(*args, **kwargs)
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/multitasks.py", line 424, in organize_samples
    return run_info.organize(*args)
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/run_info.py", line 61, in organize
    is_cwl=is_cwl, integrations=integrations)
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/run_info.py", line 1025, in _run_info_from_yaml
    _check_sample_config(run_details, run_info_yaml, config)
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/run_info.py", line 791, in _check_sample_config
    _check_quality_format(items)
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/run_info.py", line 670, in _check_quality_format
    fastq_format = _detect_fastq_format(fastq_file)
  File "/opt/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/run_info.py", line 629, in _detect_fastq_format
    for line in four:
  File "/opt/bcbio/anaconda/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 12-13: invalid continuation byte
Any ideas what is happening? Best, and thanks for the help, Thomas