etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte #828

Open kamichiotti opened 1 year ago

kamichiotti commented 1 year ago

I built a conda environment following the README instructions, but I keep running into a UnicodeDecodeError that I can't get past. From researching the error and reviewing the full traceback (below), it is most likely caused by a gzip-compressed reference FASTA being read as plain text; however, I have double-checked all of my input files, and none of them are gzipped, including the reference FASTA. I also built a second conda environment based on Python 3.8 to see whether this is a versioning problem, but I get the same error. Deliberately passing a newly gzipped reference FASTA also produces this error, regardless of the environment used.
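One way to test the gzip hypothesis without relying on file extensions is to inspect the first two bytes of each input: gzip streams (including bgzip output) start with the bytes 0x1f 0x8b, which would be consistent with the byte 0x8b at position 1 reported in the error. The following is only a minimal sketch; the helper name `looks_gzipped` and the example path are hypothetical and not part of CNVkit.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as handle:  # binary mode, so no text decoding happens
        return handle.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python check_gzip.py reference.fa targets.bed
    for name in sys.argv[1:]:
        status = "gzip-compressed" if looks_gzipped(name) else "not gzip"
        print(f"{name}: {status}")
```

Running this over every file passed to the command, not only the FASTA, would show whether any of them is compressed despite its name.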

Any help with remedying this issue would be greatly appreciated!


Traceback (most recent call last):
  File "/home/groups/Spellmandata/chiotti/tools/conda_envs/cnvkit/bin/cnvkit.py", line 10, in <module>
    sys.exit(main())
  File "/home/groups/Spellmandata/chiotti/tools/conda_envs/cnvkit/lib/python3.10/site-packages/cnvlib/cnvkit.py", line 10, in main
    args.func(args)
  File "/home/groups/Spellmandata/chiotti/tools/conda_envs/cnvkit/lib/python3.10/site-packages/cnvlib/commands.py", line 150, in _cmd_batch
    args.reference, args.targets, args.antitargets = batch.batch_make_reference(
  File "/home/groups/Spellmandata/chiotti/tools/conda_envs/cnvkit/lib/python3.10/site-packages/cnvlib/batch.py", line 94, in batch_make_reference
    access_arr = access.do_access(fasta)
  File "/home/groups/Spellmandata/chiotti/tools/conda_envs/cnvkit/lib/python3.10/site-packages/cnvlib/access.py", line 19, in do_access
    access_regions = GA.from_rows(fa_regions)
  File "/home/groups/Spellmandata/chiotti/tools/conda_envs/cnvkit/lib/python3.10/site-packages/skgenome/gary.py", line 90, in from_rows
    table = pd.DataFrame.from_records(rows, columns=columns)
  File "/home/groups/Spellmandata/chiotti/tools/conda_envs/cnvkit/lib/python3.10/site-packages/pandas/core/frame.py", line 2225, in from_records
    first_row = next(data)
  File "/home/groups/Spellmandata/chiotti/tools/conda_envs/cnvkit/lib/python3.10/site-packages/cnvlib/access.py", line 36, in <genexpr>
    return (tup for tup in region_tups if is_canonical_contig_name(tup[0]))
  File "/home/groups/Spellmandata/chiotti/tools/conda_envs/cnvkit/lib/python3.10/site-packages/cnvlib/access.py", line 43, in get_regions
    for line in infile:
  File "/home/groups/Spellmandata/chiotti/tools/conda_envs/cnvkit/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

etal commented 10 months ago

There are a couple of ways this could have happened: