TrinityCTAT / ctat-mutations

Mutation detection using GATK4 best practices and latest RNA editing filters resources. Works with both Hg38 and Hg19
https://github.com/TrinityCTAT/ctat-mutations

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 #80

Closed esebesty closed 3 years ago

esebesty commented 3 years ago

I think I'm getting the same error as #60 with ctat_mutations 2.5.0. However, the linked script is no longer in the devel branch. Where should I look? Thanks!

2021-02-18 10:56:48,944: INFO Pipeliner.Pipeliner.run_cmd Running: /disk/work/shared/tools/ctat-mutations-CTAT-mutations-v2.5.0/src/groom_vcf.py /scratch/es1/ctat/varia
Traceback (most recent call last):
  File "/disk/work/shared/tools/ctat-mutations-CTAT-mutations-v2.5.0/src/groom_vcf.py", line 47, in <module>
    for line in hndl_vcf:
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1574: invalid continuation byte
2021-02-18 10:56:49,121: ERROR Pipeliner.Pipeliner.run_cmd Error: Command '/disk/work/shared/tools/ctat-mutations-CTAT-mutations-v2.5.0/src/groom_vcf.py /scratch/es1/ct
2021-02-18 10:56:49,122: ERROR Pipeliner.Pipeliner.run Error, command: [ /disk/work/shared/tools/ctat-mutations-CTAT-mutations-v2.5.0/src/groom_vcf.py /scratch/es1/ctat
st: file:/disk/work/shared/tools/ctat-mutations/ctat_mutations, lineno:1364
st: file:/disk/work/shared/tools/ctat-mutations/ctat_mutations, lineno:2816
 ]
Traceback (most recent call last):
  File "/disk/work/shared/tools/ctat-mutations/ctat_mutations", line 2824, in <module>
    pipeliner.run()
  File "/disk/work/shared/tools/ctat-mutations-CTAT-mutations-v2.5.0/PyLib/Pipeliner.py", line 71, in run
    cmd.run(checkpoint_dir)
  File "/disk/work/shared/tools/ctat-mutations-CTAT-mutations-v2.5.0/PyLib/Pipeliner.py", line 132, in run
    raise RuntimeError(errmsg)
RuntimeError: Error, command: [ /disk/work/shared/tools/ctat-mutations-CTAT-mutations-v2.5.0/src/groom_vcf.py /scratch/es1/ctat/variants.HC_hard_cutoffs_applied.vcf /sc
st: file:/disk/work/shared/tools/ctat-mutations/ctat_mutations, lineno:1364
st: file:/disk/work/shared/tools/ctat-mutations/ctat_mutations, lineno:2816
 ]
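
For anyone hitting the same traceback: byte 0xe9 is "é" in Latin-1, which suggests that some text in the VCF (often a header or annotation field) was written in a non-UTF-8 encoding. A minimal sketch for locating the offending lines locally, without sharing the file, could look like the following. The function name `find_undecodable_lines` is hypothetical and not part of ctat-mutations.

```python
def find_undecodable_lines(path, encoding="utf-8"):
    """Yield (line_number, byte_offset_in_line, raw_line) for each line
    that cannot be decoded with the given encoding."""
    # Read in binary mode so that iterating over the file never raises.
    with open(path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as err:
                # err.start is the offset of the first bad byte within this line.
                yield line_number, err.start, raw


# Example use (the path is a placeholder):
# for number, offset, raw in find_undecodable_lines("variants.vcf"):
#     print(number, offset, raw[max(0, offset - 20):offset + 20])
```

Printing a short window of raw bytes around the offset is usually enough to see which field carries the non-UTF-8 character.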

brianjohnhaas commented 3 years ago

hi,

Can you privately send me the input file that's being fed to the script at that step? I'll take a look.

bhaas@broadinstitute.org

best,

~brian

joshua-gould commented 3 years ago

You can download the fix from https://github.com/NCIP/ctat-mutations/blob/master/PyLib/ctat_util.py.
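
As an illustration of the general technique only (this is not taken from `ctat_util.py`, whose contents are not shown in this thread), a text file can be read without raising `UnicodeDecodeError` by passing `errors="replace"` to `open`. The helper name `open_text_tolerant` below is hypothetical.

```python
import gzip


def open_text_tolerant(path, encoding="utf-8"):
    """Open a plain or gzip-compressed text file for reading.

    Bytes that are not valid in the given encoding are replaced with
    U+FFFD instead of raising UnicodeDecodeError.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding=encoding, errors="replace")
    return open(path, "rt", encoding=encoding, errors="replace")


# Example use (the path is a placeholder):
# with open_text_tolerant("variants.vcf") as handle:
#     for line in handle:
#         ...
```

Replacing bad bytes is lossy, so this suits free-text header or annotation fields; if the original characters matter, decoding with the file's real encoding (for example `latin-1`) is the safer choice.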

esebesty commented 3 years ago

@brianjohnhaas Sorry, I can't send the VCF; it contains confidential patient genetic data. We could test public datasets instead, and I'll send you a file if we see the error again.

brianjohnhaas commented 3 years ago

No worries. Joshua should have pointed to the fix, so hopefully that works for you.

best,

~brian

esebesty commented 3 years ago

Replaced the script, thanks!