aertslab / SCENIC

SCENIC is an R package to infer Gene Regulatory Networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

Expected partition of type `DataFrame` but got `NoneType` #110

Closed anlin00007 closed 2 years ago

anlin00007 commented 4 years ago

Hello,

I have been trying SCENIC on our data and got the error message in the title. The dataset is a 22057 × 17057 matrix processed with the scanpy package. If I use the whole matrix, I get the error; however, if I use only part of the matrix (10000 × 17057), the run finishes. At first I thought it was a memory issue, but I am using a cluster with 128 GB of memory and SCENIC only uses 32 GB before the error appears. I therefore have two questions:

  1. Please advise me on how to solve this error.
  2. If I split the matrix into pieces, run GRNBoost on each piece to get regulons, take the union, and then run AUCell, is that equivalent to running GRNBoost on the whole matrix and applying the resulting regulons to AUCell?

The detailed code and error message are shown below.

Thanks

data_expr_all = pd.DataFrame(adata.X.toarray(), index=adata.obs.index, columns=adata.var.index)
adjacencies = grnboost2(data_expr_all, tf_names=tf_names, verbose=True)
 preparing dask client
parsing input
creating dask graph
/usr/local/lib/python3.6/dist-packages/arboreto/algo.py:214: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  expression_matrix = expression_data.as_matrix()
4 partitions
computing dask graph
shutting down client and local cluster
finished
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-33af1126a435> in <module>
      1 data_expr_all = pd.DataFrame(adata.X.toarray(), index=adata.obs.index, columns=adata.var.index)
----> 2 adjacencies = grnboost2(data_expr_all, tf_names=tf_names, verbose=True)
      3 modules = list(modules_from_adjacencies(adjacencies, data_expr_all))
      4 # Calculate a list of enriched motifs and the corresponding target genes for all modules.
      5 with ProgressBar():

/usr/local/lib/python3.6/dist-packages/arboreto/algo.py in grnboost2(expression_data, gene_names, tf_names, client_or_address, early_stop_window_length, limit, seed, verbose)
     39     return diy(expression_data=expression_data, regressor_type='GBM', regressor_kwargs=SGBM_KWARGS,
     40                gene_names=gene_names, tf_names=tf_names, client_or_address=client_or_address,
---> 41                early_stop_window_length=early_stop_window_length, limit=limit, seed=seed, verbose=verbose)
     42 
     43 

/usr/local/lib/python3.6/dist-packages/arboreto/algo.py in diy(expression_data, regressor_type, regressor_kwargs, gene_names, tf_names, client_or_address, early_stop_window_length, limit, seed, verbose)
    133 
    134         return client \
--> 135             .compute(graph, sync=True) \
    136             .sort_values(by='importance', ascending=False)
    137 

/usr/local/lib/python3.6/dist-packages/distributed/client.py in compute(self, collections, sync, optimize_graph, workers, allow_other_workers, resources, retries, priority, fifo_timeout, actors, **kwargs)
   2756 
   2757         if sync:
-> 2758             result = self.gather(futures)
   2759         else:
   2760             result = futures

/usr/local/lib/python3.6/dist-packages/distributed/client.py in gather(self, futures, errors, maxsize, direct, asynchronous)
   1820                 direct=direct,
   1821                 local_worker=local_worker,
-> 1822                 asynchronous=asynchronous,
   1823             )
   1824 

/usr/local/lib/python3.6/dist-packages/distributed/client.py in sync(self, func, *args, **kwargs)
    751             return future
    752         else:
--> 753             return sync(self.loop, func, *args, **kwargs)
    754 
    755     def __repr__(self):

/usr/local/lib/python3.6/dist-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
    329             e.wait(10)
    330     if error[0]:
--> 331         six.reraise(*error[0])
    332     else:
    333         return result[0]

~/.local/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

/usr/local/lib/python3.6/dist-packages/distributed/utils.py in f()
    314             if timeout is not None:
    315                 future = gen.with_timeout(timedelta(seconds=timeout), future)
--> 316             result[0] = yield future
    317         except Exception as exc:
    318             error[0] = sys.exc_info()

~/.local/lib/python3.6/site-packages/tornado/gen.py in run(self)
    733 
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

~/.local/lib/python3.6/site-packages/tornado/gen.py in run(self)
    740                     if exc_info is not None:
    741                         try:
--> 742                             yielded = self.gen.throw(*exc_info)  # type: ignore
    743                         finally:
    744                             # Break up a reference to itself

/usr/local/lib/python3.6/dist-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1651                             six.reraise(CancelledError, CancelledError(key), None)
   1652                         else:
-> 1653                             six.reraise(type(exception), exception, traceback)
   1654                     if errors == "skip":
   1655                         bad_keys.add(key)

~/.local/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    690                 value = tp()
    691             if value.__traceback__ is not tb:
--> 692                 raise value.with_traceback(tb)
    693             raise value
    694         finally:

/usr/local/lib/python3.6/dist-packages/dask/dataframe/utils.py in check_meta()
    519     raise ValueError("Metadata mismatch found%s.\n\n"
    520                      "%s" % ((" in `%s`" % funcname if funcname else ""),
--> 521                              errmsg))
    522 
    523 

ValueError: Metadata mismatch found in `from_delayed`.

Expected partition of type `DataFrame` but got `NoneType`

distributed.nanny - WARNING - Worker process still alive after 3 seconds, killing
distributed.nanny - WARNING - Worker process still alive after 3 seconds, killing
distributed.nanny - WARNING - Worker process 3142 was killed by unknown signal
distributed.nanny - WARNING - Worker process 3145 was killed by unknown signal
distributed.nanny - WARNING - Worker process still alive after 3 seconds, killing
distributed.nanny - WARNING - Worker process 3147 was killed by unknown signal

eleozzr commented 4 years ago

Hi @anlin00007, I have the same issue. Did you manage to fix it?

eleozzr commented 4 years ago

Hope this can help you.
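
One thing that may be worth trying, given the `distributed.nanny` warnings about worker processes being killed: the traceback shows that `grnboost2` accepts a `client_or_address` argument, so it should be possible to pass in a Dask client whose worker count and per-worker memory you control yourself. The sketch below is only an illustration under that assumption. The helper name `run_grnboost2_limited` and the default values (4 workers, 16 GB each) are made up for this example, and it is a guess, not a confirmed diagnosis, that the killed workers are what produces the `NoneType` partition.

```python
def run_grnboost2_limited(expression_df, tf_names, n_workers=4, memory_limit="16GB"):
    """Run grnboost2 on an explicitly sized local Dask cluster.

    Hypothetical helper for illustration; the defaults are assumptions,
    not recommended settings.
    """
    # Imported inside the function so the helper can be defined even on a
    # machine where arboreto/distributed are not installed.
    from arboreto.algo import grnboost2
    from distributed import Client, LocalCluster

    # One thread per worker process; memory_limit applies to each worker.
    cluster = LocalCluster(
        n_workers=n_workers,
        threads_per_worker=1,
        memory_limit=memory_limit,
    )
    client = Client(cluster)
    try:
        # client_or_address is the parameter visible in the traceback above.
        return grnboost2(
            expression_df,
            tf_names=tf_names,
            client_or_address=client,
            verbose=True,
        )
    finally:
        # Always release the workers, even if the computation fails.
        client.close()
        cluster.close()
```

With the variables from the original post, the call would look like `adjacencies = run_grnboost2_limited(data_expr_all, tf_names)`. Using fewer workers means fewer simultaneous copies of the expression matrix in memory, at the cost of a longer run time.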