bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

ASCII error in damage.py #2854

Closed by waemm 5 years ago

waemm commented 5 years ago

Hi guys,

I have a strange error that pops up inconsistently in the pipeline. It looks like it is crashing on a non-ASCII character that it is attempting to write.

The strange thing is that once the pipeline crashes, if I rerun it on a single node (I'm currently running bcbio using IPython on SGE), it seems to keep going with no problem. I'm not sure whether this means the pipeline continues with a partially written file, or whether it is related to some issue with a cluster node reading from a mounted drive. Any ideas would be greatly appreciated!

See example of errors below:

[2019-05-30T19:45Z] ip-172-34-10-227: tabix index CLL002_tumor-effects-annotated-annotated-gemini-priority-germline.vcf.gz
[2019-05-30T19:45Z] ip-172-34-10-227: Filter low frequency variants for DNA damage and strand bias
[2019-05-30T20:15Z] ip-172-34-10-232: bgzip CLL001_tumor-effects-annotated-annotated-gemini-priority-damage.vcf
[2019-05-30T20:15Z] ip-172-34-10-232: tabix index CLL001_tumor-effects-annotated-annotated-gemini-priority-damage.vcf.gz
[2019-05-30T20:15Z] ip-172-34-10-232: Unexpected error
Traceback (most recent call last):
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/ipythontasks.py", line 52, in _setup_logging
    yield config
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/ipythontasks.py", line 308, in postprocess_variants
    return ipython.zip_args(apply(variation.postprocess_variants, *args))
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/ipythontasks.py", line 80, in apply
    return object(*args, **kwargs)
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/variation.py", line 124, in postprocess_variants
    data, orig_items)
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/variation/damage.py", line 41, in run_filter
    data["vrn_file"] = _filter_to_info(raw_file, items[0])
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/variation/damage.py", line 62, in _filter_to_info
    out_handle.write(_rec_filter_to_info(line))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1778: ordinal not in range(128)
[2019-05-30T20:15Z] ip-172-34-10-232: Finalizing variant calls: CLL003_tumor, mutect2
[2019-05-30T20:15Z] ip-172-34-10-232: Calculating variation effects for CLL003_tumor, mutect2
[2019-05-30T20:15Z] ip-172-34-10-232: snpEff effects : CLL003_tumor
[2019-05-30T20:16Z] ip-172-34-10-103: bgzip CLL001_tumor-effects-annotated-annotated-gemini-priority-damage.vcf
[2019-05-30T20:16Z] ip-172-34-10-103: tabix index CLL001_tumor-effects-annotated-annotated-gemini-priority-damage.vcf.gz
[2019-05-30T20:16Z] ip-172-34-10-103: Unexpected error
Traceback (most recent call last):
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/ipythontasks.py", line 52, in _setup_logging
    yield config
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/ipythontasks.py", line 308, in postprocess_variants
    return ipython.zip_args(apply(variation.postprocess_variants, *args))
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/ipythontasks.py", line 80, in apply
    return object(*args, **kwargs)
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/variation.py", line 124, in postprocess_variants
    data, orig_items)
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/variation/damage.py", line 41, in run_filter
    data["vrn_file"] = _filter_to_info(raw_file, items[0])
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/variation/damage.py", line 62, in _filter_to_info
    out_handle.write(_rec_filter_to_info(line))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1917: ordinal not in range(128)
[2019-05-30T20:16Z] ip-172-34-10-103: Finalizing variant calls: CLL003_tumor, vardict
[2019-05-30T20:16Z] ip-172-34-10-103: Calculating variation effects for CLL003_tumor, vardict
[2019-05-30T20:16Z] ip-172-34-10-103: snpEff effects : CLL003_tumor
[2019-05-30T20:17Z] ip-172-34-10-244: bgzip CLL002_tumor-effects-annotated-annotated-gemini-priority-damage.vcf
[2019-05-30T20:17Z] ip-172-34-10-244: tabix index CLL002_tumor-effects-annotated-annotated-gemini-priority-damage.vcf.gz
[2019-05-30T20:17Z] ip-172-34-10-244: Unexpected error
Traceback (most recent call last):
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/ipythontasks.py", line 52, in _setup_logging
    yield config
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/ipythontasks.py", line 308, in postprocess_variants
    return ipython.zip_args(apply(variation.postprocess_variants, *args))
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/ipythontasks.py", line 80, in apply
    return object(*args, **kwargs)
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/variation.py", line 124, in postprocess_variants
    data, orig_items)
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/variation/damage.py", line 41, in run_filter
    data["vrn_file"] = _filter_to_info(raw_file, items[0])
  File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/bcbio/variation/damage.py", line 62, in _filter_to_info
    out_handle.write(_rec_filter_to_info(line))
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 2450: ordinal not in range(128)
[2019-05-30T20:17Z] ip-172-34-10-244: Finalizing variant calls: CLL005_tumor, mutect2
[2019-05-30T20:17Z] ip-172-34-10-244: Calculating variation effects for CLL005_tumor, mutect2
[2019-05-30T20:17Z] ip-172-34-10-244: snpEff effects : CLL005_tumor
[2019-05-30T20:18Z] ip-172-34-10-232: tabix index CLL003_tumor-effects.vcf.gz
[2019-05-30T20:18Z] ip-172-34-10-232: Annotate VCF file: CLL003_tumor, mutect2
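For reference, the same class of error can be reproduced outside bcbio. The sketch below is not bcbio code: the VCF-like line, the `write_line` helper, and the file name are hypothetical, and the only point is that writing a string containing `'\xe9'` (é) through an ASCII-encoded text handle raises `UnicodeEncodeError`, while the same write through a UTF-8 handle succeeds.

```python
import os
import tempfile

# Hypothetical VCF-like record whose INFO field contains a non-ASCII character (é).
line = "chr1\t100\t.\tA\tT\t.\tPASS\tNOTE=caf\xe9\n"


def write_line(path, text, encoding):
    """Write text to path through a text handle with an explicit encoding."""
    with open(path, "w", encoding=encoding) as out_handle:
        out_handle.write(text)


with tempfile.TemporaryDirectory() as tmpdir:
    out_file = os.path.join(tmpdir, "example.vcf")

    # An ASCII handle cannot represent é, so the write fails.
    try:
        write_line(out_file, line, "ascii")
        ascii_error = None
    except UnicodeEncodeError as exc:
        ascii_error = exc

    # The same write through a UTF-8 handle succeeds and round-trips.
    write_line(out_file, line, "utf-8")
    with open(out_file, encoding="utf-8") as in_handle:
        round_trip = in_handle.read()

print(type(ascii_error).__name__, ascii_error)
```

When `open()` is called without an `encoding` argument, Python falls back to the locale's preferred encoding, so the same code can behave differently depending on the environment of the process that runs it.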

YAML (this is a snippet from a single sample):

details:

chapmanb commented 5 years ago

Warren; Sorry about the issue. Are you running the latest development version or a previous release? There is a fix which I think will resolve this in development:

https://github.com/bcbio/bcbio-nextgen/commit/b014bb05b81e9b200d32842ea0755380a7edd60c#diff-bf56beda3b7dc7a52f6dab4678064b24

I'm not sure why you're having inconsistent behavior here, but hopefully updating resolves it and gets your analyses running cleanly.
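The contents of the linked commit are not reproduced in this thread, so the following is only a sketch of the general approach to this kind of error, not the actual bcbio change. It assumes the usual remedy of passing an explicit UTF-8 encoding on both the read and write handles; `copy_with_utf8` and the sample lines are hypothetical. It also prints the locale's preferred encoding, since one possible explanation for the inconsistent behavior (not confirmed here) is that some cluster jobs run under a C/POSIX locale, where Python 3.6 defaults to ASCII for text files.

```python
import locale
import os
import tempfile

# Encoding that open() uses when no encoding argument is given.
# Comparing this value across nodes or job environments may reveal differences.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)


def copy_with_utf8(in_file, out_file, transform=lambda line: line):
    """Hypothetical line-by-line rewrite that does not depend on the locale."""
    with open(in_file, encoding="utf-8") as in_handle:
        with open(out_file, "w", encoding="utf-8") as out_handle:
            for line in in_handle:
                out_handle.write(transform(line))
    return out_file


# Usage with made-up records containing é and ö.
records = ["chr1\t100\tNOTE=caf\xe9\n", "chr2\t200\tNOTE=K\xf6ln\n"]
with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "in.vcf")
    dst = os.path.join(tmpdir, "out.vcf")
    with open(src, "w", encoding="utf-8") as handle:
        handle.writelines(records)
    copy_with_utf8(src, dst)
    with open(dst, encoding="utf-8") as handle:
        copied = handle.readlines()
```

Passing the encoding explicitly makes the result independent of `LANG` and `LC_ALL`, which is why it is a common choice for pipelines that run the same code under different job environments.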

waemm commented 5 years ago

Thanks Brad, this cleared it up. Your help is much appreciated!