broadinstitute / CellBender

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
https://cellbender.rtfd.io
BSD 3-Clause "New" or "Revised" License

Failed to generate HTML report: `UnicodeDecodeError: 'ascii' codec can't decode byte` #273

Open asherkhb-ktx opened 1 year ago

asherkhb-ktx commented 1 year ago

When running CellBender 0.3.0, it reports "Unable to create report" with the following traceback:

cellbender:remove-background: Unable to create report.
cellbender:remove-background: Traceback (most recent call last):
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/cellbender/remove_background/run.py", line 349, in compute_output_denoised_counts_reports_metrics
    run_notebook_make_html(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/cellbender/remove_background/report.py", line 80, in run_notebook_make_html
    _postprocess_html(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/cellbender/remove_background/report.py", line 60, in _postprocess_html
    html = f.read()
  File "/opt/conda/envs/pytorch/lib/python3.9/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xee in position 232667: ordinal not in range(128)

Running "pip install" did yield two errors, which may be related but didn't seem to interfere elsewhere:

ERROR: nbconvert 6.5.4 has requirement jinja2>=3.0, but you'll have jinja2 2.10.1 which is incompatible.
ERROR: jupyter-events 0.7.0 has requirement jsonschema[format-nongpl]>=4.18.0, but you'll have jsonschema 3.2.0 which is incompatible.

yfarjoun commented 1 year ago

I'm having the same problem. I installed yesterday, using conda with python=3.7 and then pip install cellbender.

sjfleming commented 1 year ago

Hey Yossi! Huh, I will try to replicate this myself. I probably need to pin a version of something that’s had a recent update. Very likely nbconvert.
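
Since a version pin is the suspected cause at this point, it may help to report exactly which versions are installed in the failing environment. The following is a minimal sketch, not part of CellBender: installed_versions is a hypothetical helper name, and the package list is just the set of names mentioned in the pip errors above. It relies on importlib.metadata, which is in the standard library from Python 3.8; on Python 3.7 the importlib_metadata backport offers the same calls.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical selection: the packages named in the pip errors, plus cellbender.
    names = ["cellbender", "nbconvert", "jinja2", "jsonschema", "jupyter-events"]
    for name, version in installed_versions(names).items():
        print(f"{name:16s} {version or 'not installed'}")
```

Pasting the printed lines into the issue gives a short alternative to a full pip list.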

sjfleming commented 1 year ago

@asherkhb-ktx thanks for reporting this. While I figure out the fix, if you wanted to, you could try to generate the report manually. All you need to do is open the Jupyter notebook in this repository

/cellbender/remove_background/report.ipynb

and manually modify the names of the input and output files and run the notebook. This is how the report gets generated (and it then gets converted to html).

sjfleming commented 1 year ago

Hm, so far I am not able to reproduce this. I am on a Mac, and if I do

(base) $ conda create -n test python=3.7
(base) $ conda activate test
(test) $ pip install cellbender

and then run this on the tiny_raw_feature_bc_matrix.h5ad created by $ python generate_tiny_10x_dataset.py (where generate_tiny_10x_dataset.py is from a clone of cellbender and can be found here), like this

(test) $ cellbender remove-background --input tiny_raw_feature_bc_matrix.h5ad --output test.h5

then I do get an HTML report:

cellbender:remove-background: Succeeded in writing CellRanger format output to file tiny_test.h5
cellbender:remove-background: Succeeded in writing CellRanger format output to file tiny_test_filtered.h5
cellbender:remove-background: Saved output metrics as tiny_test_metrics.csv
[NbConvertApp] Converting notebook tmp.report.ipynb to notebook
[NbConvertApp] Writing 416208 bytes to tmp.report.nbconvert.ipynb
[NbConvertApp] Converting notebook tmp.report.nbconvert.ipynb to html
[NbConvertApp] Writing 991091 bytes to tmp.report.nbconvert.html
cellbender:remove-background: Succeeded in writing report to tiny_test_report.html
cellbender:remove-background: Completed remove-background.
cellbender:remove-background: 2023-09-20 11:04:10

If I run

(test) $ pip list

I see

Package                           Version
--------------------------------- ------------
anndata                           0.8.0
anyio                             3.7.1
appnope                           0.1.3
argon2-cffi                       23.1.0
argon2-cffi-bindings              21.2.0
attrs                             23.1.0
backcall                          0.2.0
beautifulsoup4                    4.12.2
bleach                            6.0.0
cellbender                        0.3.0
certifi                           2022.12.7
cffi                              1.15.1
click                             8.1.7
comm                              0.1.4
cycler                            0.11.0
debugpy                           1.7.0
decorator                         5.1.1
defusedxml                        0.7.1
entrypoints                       0.4
exceptiongroup                    1.1.3
fastjsonschema                    2.18.0
fonttools                         4.38.0
h5py                              3.8.0
idna                              3.4
importlib-metadata                6.7.0
importlib-resources               5.12.0
ipykernel                         6.16.2
ipython                           7.34.0
ipython-genutils                  0.2.0
ipywidgets                        8.1.1
jedi                              0.19.0
Jinja2                            3.1.2
jsonschema                        4.17.3
jupyter                           1.0.0
jupyter_client                    7.4.9
jupyter-console                   6.6.3
jupyter-contrib-core              0.4.2
jupyter-contrib-nbextensions      0.7.0
jupyter_core                      4.12.0
jupyter-highlight-selected-word   0.2.0
jupyter-nbextensions-configurator 0.6.3
jupyter-server                    1.24.0
jupyterlab-pygments               0.2.2
jupyterlab-widgets                3.0.9
kiwisolver                        1.4.5
llvmlite                          0.39.1
loompy                            3.0.7
lxml                              4.9.3
MarkupSafe                        2.1.3
matplotlib                        3.5.3
matplotlib-inline                 0.1.6
mistune                           0.8.4
natsort                           8.4.0
nbclassic                         1.0.0
nbclient                          0.7.4
nbconvert                         6.5.4
nbformat                          5.8.0
nest-asyncio                      1.5.8
notebook                          6.5.6
notebook_shim                     0.2.3
numba                             0.56.4
numexpr                           2.8.6
numpy                             1.21.6
numpy-groupies                    0.9.22
opt-einsum                        3.3.0
packaging                         23.1
pandas                            1.3.5
pandocfilters                     1.5.0
parso                             0.8.3
pexpect                           4.8.0
pickleshare                       0.7.5
Pillow                            9.5.0
pip                               22.3.1
pkgutil_resolve_name              1.3.10
prometheus-client                 0.17.1
prompt-toolkit                    3.0.39
psutil                            5.9.5
ptyprocess                        0.7.0
pycparser                         2.21
Pygments                          2.16.1
pyparsing                         3.1.1
pyro-api                          0.1.2
pyro-ppl                          1.8.6
pyrsistent                        0.19.3
python-dateutil                   2.8.2
pytz                              2023.3.post1
PyYAML                            6.0.1
pyzmq                             24.0.1
qtconsole                         5.4.4
QtPy                              2.4.0
scipy                             1.7.3
Send2Trash                        1.8.2
setuptools                        65.6.3
six                               1.16.0
sniffio                           1.3.0
soupsieve                         2.4.1
tables                            3.7.0
terminado                         0.17.1
tinycss2                          1.2.1
torch                             1.13.1
tornado                           6.2
tqdm                              4.66.1
traitlets                         5.9.0
typing_extensions                 4.7.1
wcwidth                           0.2.6
webencodings                      0.5.1
websocket-client                  1.6.1
wheel                             0.38.4
widgetsnbextension                4.0.9
zipp                              3.15.0

What do you see?
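
Besides package versions, one thing that could differ between a machine that reproduces the error and one that does not is the default text encoding. This is an assumption that the thread does not confirm. When open() is called without an encoding argument, Python decodes using the locale's preferred encoding, so the sketch below collects the relevant settings. encoding_report is a hypothetical helper name, not a CellBender function.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that decide how open() decodes text by default."""
    return {
        # Encoding used by open() when no encoding argument is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        # True when Python's UTF-8 mode is active (Python 3.7+).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Environment variables that commonly influence the values above.
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONUTF8": os.environ.get("PYTHONUTF8"),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```

A preferred encoding such as ANSI_X3.4-1968 or US-ASCII would be consistent with the traceback, which fails inside encodings/ascii.py.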

sjfleming commented 1 year ago

(Note to self: there might be some unexpected non-ASCII characters in the HTML report, possibly from the input filename or from gene names. Maybe consider this https://stackoverflow.com/questions/27243129/how-to-open-html-file-that-contains-unicode-characters

instead of this https://github.com/broadinstitute/CellBender/blob/4990df713f296256577c92cab3314daeeca0f3d7/cellbender/remove_background/report.py#L59-L60)
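
The idea in that note can be sketched as follows, assuming the failing read opens the HTML file without an explicit encoding, which is what the traceback suggests but not something this example verifies against the CellBender source. read_html and demo are hypothetical names. The demo writes a throwaway file containing U+E000, whose UTF-8 encoding begins with the byte 0xee, the same byte value reported in the error.

```python
import os
import tempfile


def read_html(path, encoding="utf-8"):
    """Read an HTML file with an explicit encoding, not the locale default."""
    with open(path, mode="r", encoding=encoding) as f:
        return f.read()


def demo():
    """Write a tiny HTML file with one non-ASCII character and read it twice."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "demo_report.html")
        with open(path, mode="w", encoding="utf-8") as f:
            # U+E000 encodes to the bytes EE 80 80 in UTF-8.
            f.write("<html><body>icon: \ue000</body></html>")

        try:
            read_html(path, encoding="ascii")
            ascii_failed = False
        except UnicodeDecodeError:
            ascii_failed = True  # same error class as in the traceback

        # Reading with UTF-8 succeeds on the same file.
        return ascii_failed, read_html(path)
```

Passing encoding="utf-8" makes the read independent of the machine's locale, which is why it is a plausible fix regardless of where the non-ASCII characters come from.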

ZhangMH2000 commented 1 year ago

> @asherkhb-ktx thanks for reporting this. While I figure out the fix, if you wanted to, you could try to generate the report manually. All you need to do is open the Jupyter notebook in this repository
>
> /cellbender/remove_background/report.ipynb
>
> and manually modify the names of the input and output files and run the notebook. This is how the report gets generated (and it then gets converted to html).

Hi sjfleming. I also encountered this issue, which showed a warning or an error:

cellbender:remove-background: Unable to create report.
cellbender:remove-background: Traceback (most recent call last):
  File "/home/zhangminghe/project/miniconda3/envs/cellbender/lib/python3.7/site-packages/cellbender/remove_background/run.py", line 351, in compute_output_denoised_counts_reports_metrics
    output=html_report_file,
  File "/home/zhangminghe/project/miniconda3/envs/cellbender/lib/python3.7/site-packages/cellbender/remove_background/report.py", line 82, in run_notebook_make_html
    title=('CellBender: ' + os.path.basename(output).replace('_report.html', '')),
  File "/home/zhangminghe/project/miniconda3/envs/cellbender/lib/python3.7/site-packages/cellbender/remove_background/report.py", line 60, in _postprocess_html
    html = f.read()
  File "/home/zhangminghe/project/miniconda3/envs/cellbender/lib/python3.7/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xee in position 232667: ordinal not in range(128)

However, I have a file named ‘tHCC15_cellbender_output_report.html’ in the output. This HTML file also contains information about the process. So, does this warning merely indicate that some characters cannot be encoded correctly, without affecting the overall report output? Thank you!

sjfleming commented 1 year ago

Hi @ZhangMH2000, yes, if you have tHCC15_cellbender_output_report.html as an output file, then the contents of that file are fine. The step that is failing (for you, and maybe for some other people) is a superficial step that renames the title of the HTML report so that it looks like this when you open it: [image]

When the error you're seeing occurs, the only difference will be that you will probably see a different title when you look at the label on the tab.

I will fix this eventually though! @ZhangMH2000 do you know if you have non-ascii characters in the name of your file?
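
Until a fix lands, the title of an already written report could be set by hand. The sketch below is a hypothetical stand-in, not CellBender's own _postprocess_html. The default title follows the expression visible in the traceback ('CellBender: ' plus the file name without '_report.html'). The assumptions that the report is UTF-8 encoded and contains a single <title> element are mine.

```python
import html
import os
import re

TITLE_PATTERN = re.compile(r"<title>.*?</title>", flags=re.IGNORECASE | re.DOTALL)


def set_report_title(report_path, title=None):
    """Rewrite the first <title> element in place; return True if one was found."""
    if title is None:
        name = os.path.basename(report_path).replace("_report.html", "")
        title = "CellBender: " + name

    # Explicit UTF-8 so the read does not depend on the locale default.
    with open(report_path, mode="r", encoding="utf-8") as f:
        text = f.read()

    replacement = "<title>" + html.escape(title) + "</title>"
    # A function replacement avoids backslash handling in the title text.
    new_text, count = TITLE_PATTERN.subn(lambda _match: replacement, text, count=1)

    if count:
        with open(report_path, mode="w", encoding="utf-8") as f:
            f.write(new_text)
    return bool(count)
```

For example, set_report_title("tHCC15_cellbender_output_report.html") would change only the tab label and leave the rest of the file untouched. Working on a copy of the report is the safer choice, since the file is rewritten in place.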

sjfleming commented 1 year ago

Hm, or maybe the non-ascii characters could be part of feature names, I suppose
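
To find out where the non-ASCII characters actually come from, the report file can be scanned in binary mode. This is a rough sketch: find_non_ascii is a hypothetical helper, and it assumes the error position (232667) is a byte offset into the same HTML file. That seems likely because the failure happens in a single f.read() call, but it is not confirmed here.

```python
import re

NON_ASCII_RUN = re.compile(rb"[\x80-\xff]+")


def find_non_ascii(path, context=20, limit=5):
    """Return up to `limit` (byte offset, surrounding text) pairs for non-ASCII runs."""
    with open(path, mode="rb") as f:
        data = f.read()

    hits = []
    for match in NON_ASCII_RUN.finditer(data):
        start = max(0, match.start() - context)
        end = match.end() + context
        # errors="replace" keeps this safe if the slice cuts a character in half.
        snippet = data[start:end].decode("utf-8", errors="replace")
        hits.append((match.start(), snippet))
        if len(hits) >= limit:
            break
    return hits
```

The text surrounding each hit should show whether the bytes sit in a file name, a gene name, or something else. Note that 0xee is the first byte of the UTF-8 encoding of code points U+E000 to U+EFFF.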

ZhangMH2000 commented 1 year ago

Hi @sjfleming, the title of tHCC15_cellbender_output_report.html is tmp.report.nbconvert. I don't see any non-ASCII characters in the name of my input file. Other reports have correct names, such as Cellbender:tHCC16_cellbender_output. Thank you!

sjfleming commented 1 year ago

Thanks @ZhangMH2000, good to know.