MicrobeLab / DeepMicrobes

DeepMicrobes: taxonomic classification for metagenomics with deep learning
https://doi.org/10.1093/nargab/lqaa009
Apache License 2.0

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte #11

Closed animesh closed 3 years ago

animesh commented 3 years ago

I was facing this error on WSL:

(directml) animeshs@DMED7596:~/ayu$ uname -a
Linux DMED7596 5.4.91-microsoft-standard-WSL2 #1 SMP Mon Jan 25 18:39:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
(directml) animeshs@DMED7596:~/ayu$ bash DeepMicrobes/pipelines/tfrec_predict_kmer.sh  -f fastq/s13._1.fastq -r fastq/s13._2.fastq  -o s13dm -v /home/animeshs/ayu/DeepMicrobes-data/vocabulary/tokens_merged_12mers.txt.gz
...
INFO:tensorflow:Processing test set
INFO:tensorflow:Parsing vocabulary
Traceback (most recent call last):
  File "/home/animeshs/ayu/DeepMicrobes/scripts/seq2tfrec_kmer.py", line 243, in <module>
    main()
  File "/home/animeshs/ayu/DeepMicrobes/scripts/seq2tfrec_kmer.py", line 233, in main
    test_set_convert2tfrecord(input_seq, output_tfrec, kmer, vocab, seq_type)
  File "/home/animeshs/ayu/DeepMicrobes/scripts/seq2tfrec_kmer.py", line 144, in test_set_convert2tfrecord
    word_to_dic = vocab_dict(vocab)
  File "/home/animeshs/ayu/DeepMicrobes/scripts/seq2tfrec_kmer.py", line 37, in vocab_dict
    for line in handle:
  File "/home/animeshs/miniconda3/envs/directml/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
cat: 'subset*.tfrec': No such file or directory
rm: cannot remove 'subset*.tfrec': No such file or directory
Finished.

The workaround is to gunzip the .gz file supplied as the vocabulary (-v) argument, here DeepMicrobes-data/vocabulary/tokens_merged_12mers.txt.gz. After decompressing, the following command worked 👍🏼

(directml) animeshs@DMED7596:~/ayu$ bash DeepMicrobes/pipelines/tfrec_predict_kmer.sh -f fastq/s13._1.fastq -r fastq/s13._2.fastq -o s13dm -v /home/animeshs/ayu/DeepMicrobes-data/vocabulary/tokens_merged_12mers.txt
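For context: byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), which is why the UTF-8 decoder fails when vocab_dict in scripts/seq2tfrec_kmer.py reads the compressed file with a plain open(). A minimal sketch of a helper that could make the script accept either form (open_maybe_gzip is a hypothetical name, not part of DeepMicrobes):

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, transparently decompressing if it is gzipped.

    Gzip streams begin with the magic bytes 0x1f 0x8b; the 0x8b is the
    byte the UTF-8 decoder chokes on in the traceback above.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")  # decompressed text mode
    return open(path, "r")            # plain text file
```

The `for line in handle:` loop in vocab_dict would then work unchanged whether the user passes tokens_merged_12mers.txt or tokens_merged_12mers.txt.gz.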