aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0
420 stars 179 forks source link

Binarization fails with regulon with few genes #87

Closed dweemx closed 5 years ago

dweemx commented 5 years ago

Hi @bramvds,

I found an extreme case where binarization fails. In my case I had 1 regulon with 1 gene which raised the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-10-af385324e489> in <module>
      5             , title="REW Raw Filtered pySCENIC 100 runs (unweighted)",
      6             nomenclature = "FBgn",
----> 7             compress=True)

~/aertslab/GitHub/pySCENIC/src/pyscenic/export.py in export2loom(ex_mtx, regulons, out_fname, cell_annotations, tree_structure, title, nomenclature, num_workers, embeddings, auc_mtx, auc_thresholds, compress)
     72     # Binarize matrix for AUC thresholds.
     73     if auc_thresholds is None:
---> 74         _, auc_thresholds = binarize(auc_mtx)
     75 
     76     # Create an embedding based on tSNE.

~/aertslab/GitHub/pySCENIC/src/pyscenic/binarization.py in binarize(auc_mtx, threshold_overides)
     73     def derive_thresholds(auc_mtx):
     74         return pd.Series(index=auc_mtx.columns, data=[derive_threshold(auc_mtx, name) for name in tqdm(auc_mtx.columns)])
---> 75     thresholds = derive_thresholds(auc_mtx)
     76     if threshold_overides is not None:
     77         thresholds[list(threshold_overides.keys())] = list(threshold_overides.values())

~/aertslab/GitHub/pySCENIC/src/pyscenic/binarization.py in derive_thresholds(auc_mtx)
     72     print("Binarize the supplied AUC matrix.")
     73     def derive_thresholds(auc_mtx):
---> 74         return pd.Series(index=auc_mtx.columns, data=[derive_threshold(auc_mtx, name) for name in tqdm(auc_mtx.columns)])
     75     thresholds = derive_thresholds(auc_mtx)
     76     if threshold_overides is not None:

~/aertslab/GitHub/pySCENIC/src/pyscenic/binarization.py in <listcomp>(.0)
     72     print("Binarize the supplied AUC matrix.")
     73     def derive_thresholds(auc_mtx):
---> 74         return pd.Series(index=auc_mtx.columns, data=[derive_threshold(auc_mtx, name) for name in tqdm(auc_mtx.columns)])
     75     thresholds = derive_thresholds(auc_mtx)
     76     if threshold_overides is not None:

~/aertslab/GitHub/pySCENIC/src/pyscenic/binarization.py in derive_threshold(auc_mtx, regulon_name, method)
     47             return gmm2.bic(X) <= gmm1.bic(X)
     48 
---> 49     if not isbimodal(data, method):
     50         # For a unimodal distribution the threshold is set as mean plus two standard deviations.
     51         return data.mean() + 2.0*data.std()

~/aertslab/GitHub/pySCENIC/src/pyscenic/binarization.py in isbimodal(data, method)
     39             # Use Hartigan's dip statistic to decide if distribution deviates from unimodality.
     40             _, pval, _ = diptst(np.msort(data))
---> 41             return pval <= 0.05
     42         else:
     43             # Compare Bayesian Information Content of two Gaussian Mixture Models.

TypeError: '<=' not supported between instances of 'NoneType' and 'float'

I haven't tested out what's the minimum genes required for the Hartigan's dip statistic to work but maybe good to let the user know that it needs to filter out regulons with low number of genes?

Cheers, Max

cflerin commented 5 years ago

I'll just point out that in a standard pySCENIC run, the minimum number of genes in a regulon is set to 20 by default. That is, all regulons targeting fewer than 20 genes are dropped (in the modules_from_adjacencies function, min_genes parameter). I think a typical user would most likely be using these defaults, but it's still probably a good idea to handle this error somehow.

dweemx commented 5 years ago

This filter happens at the module level, not at the regulon level right? So still possible to have regulons w/ less than 20 genes no?

cflerin commented 5 years ago

Ah yes, you're right, this filter is on modules, so we do absolutely have regulons with fewer than 20 genes (but I don't believe I've seen one with only 1 gene myself)

dweemx commented 5 years ago

Me neither but I found regulons w/ 4 genes and you still do get this error

bramvds commented 5 years ago

Hi Max,

I'll fix this and make a new release.

Kr, Bram