aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

[BUG] pyscenic grn ValueError: tuple is not allowed for map key #175

Open lucygarner opened 4 years ago

lucygarner commented 4 years ago

Describe the bug: Error when running pyscenic grn. I am using an older version of dask (1.0.0), as previously suggested.

Steps to reproduce the behavior

  1. Command run when the error occurred:

    pyscenic grn -o results/filtered_adjacencies.csv -m grnboost2 --seed 100 --num_workers 24 --cell_id_attribute CellID --gene_attribute Gene data/merged_all_analysed_filtered.loom resources/tfs_list/lambert2018.txt 
  2. Error encountered:

    Traceback (most recent call last):
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/bin/pyscenic", line 8, in <module>
    sys.exit(main())
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/pyscenic/cli/pyscenic.py", line 421, in main
    args.func(args)
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/pyscenic/cli/pyscenic.py", line 74, in find_adjacencies_command
    network = method(expression_data=ex_mtx, tf_names=tf_names, verbose=True, client_or_address=client, seed=args.seed)
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/arboreto/algo.py", line 39, in grnboost2
    return diy(expression_data=expression_data, regressor_type='GBM', regressor_kwargs=SGBM_KWARGS,
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/arboreto/algo.py", line 120, in diy
    graph = create_graph(expression_matrix,
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/arboreto/core.py", line 403, in create_graph
    future_tf_matrix = client.scatter(tf_matrix, broadcast=True)
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/distributed/client.py", line 2062, in scatter
    return self.sync(
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/distributed/client.py", line 753, in sync
    return sync(self.loop, func, *args, **kwargs)
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/distributed/utils.py", line 331, in sync
    six.reraise(*error[0])
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/six.py", line 703, in reraise
    raise value
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/distributed/utils.py", line 316, in f
    result[0] = yield future
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/distributed/client.py", line 1911, in _scatter
    yield self.scheduler.scatter(
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/distributed/core.py", line 739, in send_recv_from_rpc
    result = yield send_recv(comm=comm, op=key, **kwargs)
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/distributed/core.py", line 533, in send_recv
    response = yield comm.read(deserializers=deserializers)
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/distributed/comm/tcp.py", line 217, in read
    msg = yield from_frames(
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/distributed/comm/utils.py", line 85, in from_frames
    res = _from_frames()
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/distributed/comm/utils.py", line 70, in _from_frames
    return protocol.loads(
    File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/distributed/protocol/core.py", line 108, in loads
    header = msgpack.loads(header, use_list=False, **msgpack_opts)
    File "msgpack/_unpacker.pyx", line 195, in msgpack._cmsgpack.unpackb
    ValueError: tuple is not allowed for map key
    /data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/dask/config.py:161: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
    data = yaml.load(f.read()) or {}

Expected behavior: pyscenic grn should produce the output file filtered_adjacencies.csv.


cflerin commented 4 years ago

Have you tried some of the suggestions in #163 ?

lucygarner commented 4 years ago

Thank you - yes, I have tried using dask 1.0.0, distributed >=1.21.6,<2.0.0, and pandas 0.25.3, but this did not work.

I also tried to install dask 2.11.0 instead, but I got the following errors:

ERROR: pyscenic 0.10.1 has requirement dask==1.0.0, but you'll have dask 2.11.0 which is incompatible.
ERROR: pyscenic 0.10.1 has requirement distributed<2.0.0,>=1.21.6, but you'll have distributed 2.11.0 which is incompatible.
ERROR: pyscenic 0.10.1 has requirement pandas<1.0.0,>=0.20.1, but you'll have pandas 1.0.4 which is incompatible.
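
For reference, those requirements translate into a pin set like the following (just a sketch of one way to apply them with pip; the bounds are taken straight from the error output above):

    # pin the versions declared by pyscenic 0.10.1 (from the ERROR messages above)
    pip install "dask==1.0.0" "distributed>=1.21.6,<2.0.0" "pandas>=0.20.1,<1.0.0"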

Which versions would you recommend trying?

lucygarner commented 4 years ago

I am using Conda environments, so if you have a .yml file for a Conda environment where pyscenic is working, I could give that a try.

cflerin commented 4 years ago

If none of the dask version tweaks have worked for you, I would then try using the arboreto_with_multiprocessing.py script described in that post.

chansigit commented 4 years ago

I suggest removing dask from pyscenic. It causes far more difficulty than it gains in efficiency.

lucygarner commented 4 years ago

Thank you @cflerin, I have tried using the arboreto_with_multiprocessing.py script. I am now getting the following error with both approaches (pyscenic grn and arboreto_with_multiprocessing.py):

2020-06-04 20:37:40,351 - pyscenic.cli.pyscenic - INFO - Writing results to file.
preparing dask client
parsing input
creating dask graph
24 partitions
computing dask graph
not shutting down client, client was created externally
finished
Traceback (most recent call last):
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/bin/pyscenic", line 8, in <module>
    sys.exit(main())
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 421, in main
    args.func(args)
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 80, in find_adjacencies_command
    extension = PurePath(fname).suffixes
NameError: name 'fname' is not defined
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-14, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-19, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-9, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-15, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-20, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-3, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-10, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-21, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-22, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-16, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-6, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-13, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-11, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-12, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-17, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-23, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-4, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-5, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-18, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-7, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-24, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-8, started daemon)>
distributed.nanny - WARNING - Worker process 125161 was killed by unknown signal
/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/dask/config.py:161: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  data = yaml.load(f.read()) or {}

I am using a Conda environment containing the following packages:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                      1_llvm    conda-forge
arboreto                  0.1.5                    pypi_0    pypi
attrs                     19.3.0                   pypi_0    pypi
bokeh                     2.0.1            py37hc8dfbb8_0    conda-forge
boltons                   20.1.0                   pypi_0    pypi
ca-certificates           2020.4.5.1           hecc5488_0    conda-forge
certifi                   2020.4.5.1       py37hc8dfbb8_0    conda-forge
click                     7.1.2              pyh9f0ad1d_0    conda-forge
cloudpickle               1.4.1                      py_0    conda-forge
cytoolz                   0.10.1           py37h516909a_0    conda-forge
dask                      1.0.0                      py_1    conda-forge
dask-core                 1.0.0                      py_0    conda-forge
decorator                 4.4.2                    pypi_0    pypi
dill                      0.3.1.1                  pypi_0    pypi
distributed               1.28.1                   py37_0    conda-forge
freetype                  2.10.2               he06d7ca_0    conda-forge
frozendict                1.2                      pypi_0    pypi
h5py                      2.10.0                   pypi_0    pypi
heapdict                  1.0.1                      py_0    conda-forge
interlap                  0.2.6                    pypi_0    pypi
jinja2                    2.11.2             pyh9f0ad1d_0    conda-forge
joblib                    0.15.1                   pypi_0    pypi
jpeg                      9d                   h516909a_0    conda-forge
ld_impl_linux-64          2.34                 h53a641e_4    conda-forge
libblas                   3.8.0               16_openblas    conda-forge
libcblas                  3.8.0               16_openblas    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc-ng                 9.2.0                h24d8f2e_2    conda-forge
libgfortran-ng            7.5.0                hdf63c60_6    conda-forge
liblapack                 3.8.0               16_openblas    conda-forge
libopenblas               0.3.9                h5ec1e0e_0    conda-forge
libpng                    1.6.37               hed695b0_1    conda-forge
libstdcxx-ng              9.2.0                hdf63c60_2    conda-forge
libtiff                   4.1.0                hc7e4089_6    conda-forge
libwebp-base              1.1.0                h516909a_3    conda-forge
llvm-openmp               10.0.0               hc9558a2_0    conda-forge
llvmlite                  0.32.1                   pypi_0    pypi
locket                    0.2.0                      py_2    conda-forge
loompy                    3.0.6                    pypi_0    pypi
lz4-c                     1.9.2                he1b5a44_1    conda-forge
markupsafe                1.1.1            py37h8f50634_1    conda-forge
msgpack-python            0.6.2            py37hc9558a2_0    conda-forge
multiprocessing-on-dill   3.5.0a4                  pypi_0    pypi
ncurses                   6.1               hf484d3e_1002    conda-forge
networkx                  2.4                      pypi_0    pypi
numba                     0.49.1                   pypi_0    pypi
numpy                     1.18.4           py37h8960a57_0    conda-forge
numpy-groupies            0+unknown                pypi_0    pypi
olefile                   0.46                       py_0    conda-forge
openssl                   1.1.1g               h516909a_0    conda-forge
packaging                 20.4               pyh9f0ad1d_0    conda-forge
pandas                    0.25.3           py37hb3f55d8_0    conda-forge
partd                     1.1.0                      py_0    conda-forge
pillow                    7.1.2            py37h718be6c_0    conda-forge
pip                       20.1.1                     py_1    conda-forge
psutil                    5.7.0            py37h8f50634_1    conda-forge
pyarrow                   0.16.0                   pypi_0    pypi
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyscenic                  0.10.1                   pypi_0    pypi
python                    3.7.6           cpython_h8356626_6    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.7                     1_cp37m    conda-forge
pytz                      2020.1             pyh9f0ad1d_0    conda-forge
pyyaml                    5.3.1            py37h8f50634_0    conda-forge
readline                  8.0                  hf8c457e_0    conda-forge
scikit-learn              0.23.1                   pypi_0    pypi
scipy                     1.4.1                    pypi_0    pypi
setuptools                47.1.1           py37hc8dfbb8_0    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sortedcontainers          2.1.0                      py_0    conda-forge
sqlite                    3.30.1               hcee41ef_0    conda-forge
tbb                       2020.0.133               pypi_0    pypi
tblib                     1.6.0                      py_0    conda-forge
threadpoolctl             2.1.0                    pypi_0    pypi
tk                        8.6.10               hed695b0_0    conda-forge
toolz                     0.10.0                     py_0    conda-forge
tornado                   6.0.4            py37h8f50634_1    conda-forge
tqdm                      4.46.1                   pypi_0    pypi
typing_extensions         3.7.4.2                    py_0    conda-forge
umap-learn                0.4.3                    pypi_0    pypi
wheel                     0.34.2                     py_1    conda-forge
xz                        5.2.5                h516909a_0    conda-forge
yaml                      0.2.5                h516909a_0    conda-forge
zict                      2.0.0                      py_0    conda-forge
zlib                      1.2.11            h516909a_1006    conda-forge
zstd                      1.4.4                h6597ccf_3    conda-forge

What is this issue related to?

Best, Lucy

cflerin commented 4 years ago

Hi @lc822 ,

First, this:

File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 80, in find_adjacencies_command
extension = PurePath(fname).suffixes
NameError: name 'fname' is not defined

is a bug in pyscenic grn, which has now been fixed in release 0.10.2.
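
If you installed from PyPI, upgrading should pick up the fix (a sketch; adjust accordingly if you manage the environment with conda instead):

    pip install --upgrade pyscenic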

Second, the output you pasted was not from the arboreto_with_multiprocessing.py script. I think you'll have the most luck with this method, so if you can share the command you're running and the error you're getting, it would help a lot.

lucygarner commented 4 years ago

Thank you - I will update to 0.10.2.

This is my command for arboreto_with_multiprocessing.py - can you see anything wrong with this?

python arboreto_with_multiprocessing.py data/merged_all_analysed.loom resources/tfs_list/lambert2018.txt --output results/adjacencies.csv --num_workers 20

cflerin commented 4 years ago

python arboreto_with_multiprocessing.py data/merged_all_analysed.loom resources/tfs_list/lambert2018.txt --output results/adjacencies.csv --num_workers 20

Looks good!

Annika18 commented 4 years ago

Would you mind describing how to get the arboreto_with_multiprocessing script?

How do I download it and where should it be stored on my laptop? I have version 0.10.2, but the script doesn't look like it's in my pyscenic package. Also, once I have it, can I import it in a Jupyter notebook?

lucygarner commented 4 years ago

Hi @Annika18,

I couldn't find the script within the package either, so I just copied it from the GitHub page. It should then be possible to run the script from your Jupyter Notebook.
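
For example, a notebook cell can call it through a shell escape (a sketch that reuses the command from my earlier comment and assumes the script sits in the notebook's working directory):

    !python arboreto_with_multiprocessing.py data/merged_all_analysed.loom resources/tfs_list/lambert2018.txt --output results/adjacencies.csv --num_workers 20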

Best, Lucy

cflerin commented 4 years ago

Hi @Annika18 ,

You can download the script with wget:

wget https://raw.githubusercontent.com/aertslab/pySCENIC/master/scripts/arboreto_with_multiprocessing.py

Edit - the script has since moved; updated location:

wget https://raw.githubusercontent.com/aertslab/pySCENIC/master/src/pyscenic/cli/arboreto_with_multiprocessing.py

You can store it anywhere, but you'll need to have pySCENIC installed to use it. I would recommend running it directly from the command line, then importing the output into a notebook.

Annika18 commented 4 years ago

Thank you. Do you have any advice on picking num_workers? My computer has 4 cores, should I use 4? Why is 20 used in the example in the FAQ?

lucygarner commented 4 years ago

Hi @Annika18,

This is likely because most people will be running pySCENIC on a high-performance computing cluster. If you only have 4 cores available, then you should go with that.
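
If you want the worker count to track whatever machine you run on, a rough sketch is to pass the core count directly (nproc reports the number of available cores on Linux):

    python arboreto_with_multiprocessing.py data/merged_all_analysed.loom resources/tfs_list/lambert2018.txt --output results/adjacencies.csv --num_workers $(nproc)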

saeedfc commented 4 years ago

Hi,

wget https://raw.githubusercontent.com/aertslab/pySCENIC/master/scripts/arboreto_with_multiprocessing.py

I see that this script is no longer available in the scripts folder.

davisidarta commented 4 years ago

Hello,

Why isn't arboreto_with_multiprocessing.py available anymore? I had exactly the same issue on 3 different machines (seriously guys, Dask sucks).

Annika18 commented 4 years ago

Hi @davisidarta @saeedfc, I have reuploaded arboreto_with_multiprocessing.py here, in case you would like to download it from there: https://github.com/Annika18/arboreto-multi-reupload/blob/master/arboreto_with_multiprocessing.py. All credit goes to aertslab for developing the code; I just want to help out because I also couldn't use the dask implementation.

cflerin commented 4 years ago

Hey @saeedfc , @davisidarta ,

Sorry about that, I moved the script to a different folder because it's now built into the pySCENIC CLI. So if you have pySCENIC 0.10.3 or higher, you don't need to download it anymore; it's available on your path to run directly. The new location is here: https://raw.githubusercontent.com/aertslab/pySCENIC/master/src/pyscenic/cli/arboreto_with_multiprocessing.py, and I'll edit the post above.
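
With 0.10.3+, the earlier invocation can therefore drop the explicit python call and local path (a sketch, assuming the installed entry point keeps the same name and arguments as the standalone script):

    arboreto_with_multiprocessing.py data/merged_all_analysed.loom resources/tfs_list/lambert2018.txt --output results/adjacencies.csv --num_workers 20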

davisidarta commented 4 years ago

Hi @Annika18 @cflerin !

Thank you. The arboreto_with_multiprocessing.py script worked perfectly, although I had to perform the downstream analysis with the CLI and handle the loom file in scanpy (for some reason the notebooks didn't work on my local machine). Still, I was able to export the results to SCope. You guys rock!