colomemaria / epiScanpy

Episcanpy: Epigenomics Single Cell Analysis in Python
BSD 3-Clause "New" or "Revised" License

UnicodeDecodeError when using a GTF file to produce a gene activity matrix #75

Closed ddb-qiwang closed 3 years ago

ddb-qiwang commented 3 years ago

Hello! When I use the function 'episcanpy.tl.geneactivity' to collapse the peak matrix into a "gene activity matrix", I get "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte" at 'with open(gtf_file) as f: for line in f:'. The GTF file I am using is the gzip-compressed 'hg38' file downloaded from 'genome.ucsc.edu'.

xuebaliang commented 3 years ago

I have the same problem. Could the authors share a GTF file that works with 'episcanpy.tl.geneactivity'?

DaneseAnna commented 3 years ago

Hi,

Very sorry for the delayed answer! So far we have been using GTF files from GENCODE, so I don't really know where this error is coming from. But I am looking into it.

In the meantime, here is the mouse genome annotation I have been using: https://www.dropbox.com/s/2h1naiiigocacs2/gencode.vM23.primary_assembly.annotation.gtf?dl=0

Here is the human genome annotation I just tested: https://www.dropbox.com/s/jfj5bhqm28t7imb/gencode.v36.annotation.gtf?dl=0

kridsadakorn commented 3 years ago

Hi,

Is this because the data file is compressed? I found the same error reported on Stack Overflow, where it was caused by reading a gzip file as plain text.

https://stackoverflow.com/questions/44659851/unicodedecodeerror-utf-8-codec-cant-decode-byte-0x8b-in-position-1-invalid/44660123
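
For reference, every gzip stream starts with the two bytes 0x1f 0x8b, so a failure to decode byte 0x8b at position 1 is consistent with a gzip-compressed file being opened as UTF-8 text. Below is a minimal sketch of one possible workaround, assuming the function reads the annotation with a plain 'open(gtf_file)' as in the traceback above: decompress the file first and pass the uncompressed path. The helper names 'is_gzipped' and 'ensure_plain_gtf' and the file names in the usage note are hypothetical and not part of episcanpy.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def ensure_plain_gtf(path, plain_path):
    """Return a path to an uncompressed copy of the annotation.

    Hypothetical helper: if `path` is gzip-compressed, it is decompressed
    to `plain_path`; otherwise `path` is returned unchanged.
    """
    if not is_gzipped(path):
        return path
    # Stream the decompressed bytes to disk without loading the whole file.
    with gzip.open(path, "rb") as src, open(plain_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return plain_path
```

For example, 'ensure_plain_gtf("hg38.gtf.gz", "hg38.gtf")' (hypothetical file names) would write and return "hg38.gtf", which can then be given to 'episcanpy.tl.geneactivity' as the GTF file. Running 'gunzip' on the file from the command line should have the same effect.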