biolab / orange3-single-cell

🍊🔬 Orange add-on for gene expression of single cell data
https://singlecell.biolab.si/

[FIX] OWdotMatrix: handle ordering and several other bugs #354

Closed PrimozGodec closed 5 years ago

PrimozGodec commented 5 years ago
Issue

The widget ordered columns and rows incorrectly and had several other bugs.

Description of changes

Some changes made to the widget:

TODO:

I will add the documentation once we agree on the widget.

Includes
PrimozGodec commented 5 years ago

@BlazZupan can you check if the widget works as expected?

pavlin-policar commented 5 years ago

I found a bug:

ScDatasets (select "Pancreatic cells from two experiments") → Select rows (Source ID is Baron2016 Sample) → Dot Matrix

I'm guessing you compute a matrix for each value of the discrete "Cluster Variable" but don't check whether any of them are empty. Please add a check that each "Cluster Variable" value actually has at least one instance.
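A minimal sketch of such a check, assuming the cluster variable is available as an array of integer codes with NaN for missing values. The function name `nonempty_cluster_values` and its parameters are hypothetical and are not taken from the widget code:

```python
import numpy as np


def nonempty_cluster_values(cluster_codes, n_values):
    """Return indices of cluster values that have at least one instance.

    cluster_codes: integer codes of the discrete variable (NaN = missing).
    n_values: number of values the variable declares (hypothetical parameter).
    """
    codes = np.asarray(cluster_codes, dtype=float)
    # Drop missing values before counting.
    codes = codes[~np.isnan(codes)].astype(int)
    counts = np.bincount(codes, minlength=n_values)
    return np.flatnonzero(counts > 0)


# Three declared values, but value 1 has no rows left after filtering.
present = nonempty_cluster_values([0, 0, 2, np.nan, 2], n_values=3)
print(present)  # [0 2]
```

Only the values in `present` would then be aggregated and clustered, so an empty group never reaches the distance computation.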

Exception: ValueError: The number of observations cannot be determined on an empty distance matrix.
Module: scipy.spatial.distance:2403
Widget Name: Dot Matrix
Widget Module: orangecontrib.single_cell.widgets.owdotmatrix:193
Widget Scheme: /tmp/ows-wbhu6cld.ows.xml
Version: 4.0.0
Environment: Python 3.7.4 on Linux 5.2.9-arch1-1-ARCH #1 SMP PREEMPT Fri Aug 16 11:29:43 UTC 2019 x86_64
Installed Packages: AnyQt==0.0.10, Bottleneck==1.2.1, CacheControl==0.12.5, Genesis-PyAPI==1.2.1, Jinja2==2.10.1, MarkupSafe==1.1.1, MiniSom==2.1.6, Orange3-Bioinformatics==3.5.1.dev85+ge10ce81, Orange3-ImageAnalytics==0.3.1, Orange3-Network==1.4.2, Orange3-Prototypes==0.12.0, Orange3-SingleCell==1.1.0, Orange3-Text==0.8.0.dev0+ffeb7e4, Orange3==3.23.0.dev0+74bb11c, Pillow==6.1.0, PyQt5-sip==4.19.17, PyQt5==5.12.3, PySocks==1.7.0, PyYAML==5.1.2, Pygments==2.4.2, SecretStorage==3.1.1, Send2Trash==1.5.0, Shapely==1.7a2, XlsxWriter==1.1.8, anndata==0.6.22.post1, asn1crypto==0.24.0, atomicwrites==1.3.0, attrs==19.1.0, backcall==0.1.0, beautifulsoup4==4.7.1, biopython==1.73, bleach==3.1.0, boto3==1.9.183, boto==2.49.0, botocore==1.12.183, brotlipy==0.7.0, cellannotation==0.1.0, certifi==2019.6.16, cffi==1.12.3, chardet==3.0.4, commonmark==0.9.0, cryptography==2.7, cycler==0.10.0, decorator==4.4.0, defusedxml==0.6.0, docutils==0.14, docx2txt==0.8, entrypoints==0.3, fastdtw==0.3.2, future==0.17.1, gensim==3.7.3, h2==2.6.2, h5py==2.9.0, hpack==3.0.0, hyperframe==3.2.0, hypertemp==0.8.0, idna==2.8, importlib-metadata==0.18, ipykernel==5.1.1, ipython-genutils==0.2.0, ipython==7.6.1, jedi==0.14.1, jeepney==0.4, jmespath==0.9.4, joblib==0.13.2, json5==0.8.5, jsonschema==3.0.1, jupyter-client==5.3.1, jupyter-core==4.5.0, jupyterlab-server==1.0.0, jupyterlab==1.0.2, keyring==19.0.2, keyrings.alt==3.1.1, kiwisolver==1.1.0, llvmlite==0.29.0, lockfile==0.12.2, loompy==2.0.17, lxml==4.3.4, matplotlib==3.1.1, mistune==0.8.4, more-itertools==7.1.0, msgpack==0.6.1, natsort==6.0.0, nbconvert==5.5.0, nbformat==4.4.0, ndf==0.1.4, networkx==2.3, nltk==3.4.3, notebook==6.0.0, numba==0.44.1, numpy==1.16.4, oauthlib==3.0.1, odfpy==1.4.0, openTSNE==0.3.10, orange-canvas-core==0.1.5.dev0, orange-widget-base==4.0.0, packaging==19.0, pandas==0.24.2, pandocfilters==1.4.2, parso==0.5.1, pdfminer3k==1.3.1, pexpect==4.7.0, pickleshare==0.7.5, pip==19.2.3, plotly==4.0.0, pluggy==0.12.0, ply==3.11, 
prometheus-client==0.7.1, prompt-toolkit==2.0.9, ptyprocess==0.6.0, py==1.8.0, pyclipper==1.1.0.post1, pycparser==2.19, pynndescent==0.3.0, pyparsing==2.4.0, pyqtgraph==0.10.0, pyrsistent==0.15.3, pytest==5.0.0, python-dateutil==2.8.0, python-louvain==0.13, pytz==2019.1, pyzmq==18.0.2, requests-cache==0.5.2, requests-oauthlib==1.2.0, requests==2.22.0, retrying==1.3.3, rfc3986==1.3.2, s3transfer==0.2.1, scikit-learn==0.21.2, scipy==1.3.0, serverfiles==0.3.0, setuptools-git==1.2, setuptools==41.0.1, simhash==1.9.0, six==1.12.0, slumber==0.7.1, smart-open==1.8.4, soupsieve==1.9.2, terminado==0.8.2, testpath==0.4.2, tornado==6.0.3, traitlets==4.3.2, tweepy==3.7.0, ufal.udpipe==1.2.0.2, urllib3==1.25.3, validate-email==1.3, wcwidth==0.1.7, webencodings==0.5.1, wheel==0.33.4, wikipedia==1.4.0, xlrd==1.2.0, zipp==0.5.1
Machine ID: 172461168621546
Stack Trace: Traceback (most recent call last):
  File "/home/pavlin/dev/orange3env/lib/python3.7/site-packages/orange_widget_base-4.0.0-py3.7.egg/orangewidget/gui.py", line 1661, in call
    self.func(**kwds)
  File "/home/pavlin/dev/orange3-single-cell/orangecontrib/single_cell/widgets/owdotmatrix.py", line 193, in _aggregate_data
    self._calculate_table_values()
  File "/home/pavlin/dev/orange3-single-cell/orangecontrib/single_cell/widgets/owdotmatrix.py", line 216, in _calculate_table_values
    cluster_order, gene_order = self.cluster_data(matrix)
  File "/home/pavlin/dev/orange3-single-cell/orangecontrib/single_cell/widgets/owdotmatrix.py", line 232, in cluster_data
    cluster = hierarchical.dist_matrix_clustering(rows_distances)
  File "/home/pavlin/dev/orange3/Orange/clustering/hierarchical.py", line 130, in dist_matrix_clustering
    Z = dist_matrix_linkage(matrix, linkage=linkage)
  File "/home/pavlin/dev/orange3/Orange/clustering/hierarchical.py", line 120, in dist_matrix_linkage
    return scipy.cluster.hierarchy.linkage(distances, method=linkage)
  File "/home/pavlin/dev/orange3env/lib/python3.7/site-packages/scipy-1.3.0-py3.7-linux-x86_64.egg/scipy/cluster/hierarchy.py", line 1064, in linkage
    n = int(distance.num_obs_y(y))
  File "/home/pavlin/dev/orange3env/lib/python3.7/site-packages/scipy-1.3.0-py3.7-linux-x86_64.egg/scipy/spatial/distance.py", line 2403, in num_obs_y
    raise ValueError("The number of observations cannot be determined on "
ValueError: The number of observations cannot be determined on an empty distance matrix.
Local Variables: OrderedDict([('Y', array([], dtype=float64)), ('k', 0)])
pavlin-policar commented 5 years ago

Also, this may not belong in this PR, but it would be an improvement: coloring the dots according to their mean expression might make sense, similar to what Scanpy does. If nothing else, it would look prettier.

Scanpy's dot plot is slightly different: the size of the dot indicates the fraction of cells in which the gene is expressed, and the color indicates the mean gene expression.
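The two quantities described above can be sketched with plain NumPy. This is an illustration only, not Scanpy's implementation or the widget's code; the function name `dot_plot_stats`, the `threshold` parameter, and the choice to average over all cells in a cluster (rather than only expressing cells) are assumptions:

```python
import numpy as np


def dot_plot_stats(expression, clusters, threshold=0.0):
    """Per-cluster dot size (fraction expressed) and dot color (mean expression).

    expression: cells x genes array.
    clusters: one cluster label per cell.
    threshold: a gene counts as expressed in a cell if its value exceeds this
        (hypothetical parameter, defaulting to 0).
    """
    expression = np.asarray(expression, dtype=float)
    clusters = np.asarray(clusters)
    labels = np.unique(clusters)
    fraction = np.empty((len(labels), expression.shape[1]))
    mean = np.empty_like(fraction)
    for i, label in enumerate(labels):
        rows = expression[clusters == label]
        fraction[i] = (rows > threshold).mean(axis=0)  # dot size
        mean[i] = rows.mean(axis=0)                    # dot color
    return labels, fraction, mean


# Three cells, two genes, two clusters.
labels, fraction, mean = dot_plot_stats(
    [[0, 2], [4, 0], [1, 1]], ["a", "a", "b"]
)
```

Because `labels` comes from `np.unique`, every cluster in the result has at least one cell, so the means are always defined.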

PrimozGodec commented 5 years ago

@pavlin-policar The error is now fixed. The actual issue was that hierarchical clustering could not handle a distance matrix built from only one row.
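One way to guard against this case, shown as a sketch that calls SciPy directly instead of Orange's `hierarchical` helpers. The function name `row_order`, the Euclidean distance, and the average linkage are assumptions, not the widget's actual settings:

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist


def row_order(matrix):
    """Return a display order for rows, clustering only when it is possible."""
    matrix = np.asarray(matrix, dtype=float)
    n_rows = matrix.shape[0]
    # With fewer than two rows the condensed distance matrix is empty and
    # linkage() raises the ValueError from the report, so keep the
    # trivial order instead.
    if n_rows < 2:
        return np.arange(n_rows)
    condensed = pdist(matrix)  # pairwise Euclidean distances
    return leaves_list(linkage(condensed, method="average"))


single = row_order([[1.0, 2.0]])                       # one row: no clustering
several = row_order([[0.0, 0.0], [5.0, 5.0], [0.0, 1.0]])
```

The same guard would apply to the gene (column) ordering if only one gene is selected.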

PrimozGodec commented 5 years ago

I agree with using color. As in Scanpy, we could use color to encode additional information; we still need to decide which.