aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

Running SCENIC with ENCODE ChIP-seq tracks [results] #244

Closed deevdevil88 closed 3 years ago

deevdevil88 commented 3 years ago

Hello, I would like to run my SCENIC analysis with additional ChIP-seq-based track information from ENCODE, for example for the fruit fly. Can this be done on the CLI? If so, where do I get the relevant files? Do you have any example ChIP-seq tracks that one can use?

Best, Devika

cflerin commented 3 years ago

Hi @deevdevil88 ,

Yes, this can be done on the CLI in the same way as with the motif-based databases. In the ctx step, you would just supply the track database and its annotation file. You can find track databases at https://resources.aertslab.org/cistarget/ but so far they are only available for human.

You can also find a tutorial for this here.

Zifeng-L commented 2 years ago

Hi @cflerin , I use the motif-based databases as well as the ChIP-seq-based track databases in the ctx step. In vsn-pipelines, I found that the results based on these two databases (mtf_auc_mtx and trk_auc_mtx) are combined into one matrix.

    ################################################################################
    # Load the data from the loom and merge if needed
    ################################################################################

    with lp.connect(args.loom_input, mode='r', validate=False) as lf:

        if "RegulonsAUC" in lf.ca.keys():
            auc_mtx = pd.DataFrame(lf.ca.RegulonsAUC, index=lf.ca.CellID)
        else:
            print("Loom with motif & track regulons detected, merging the regulons AUC matrices...")
            mtf_auc_mtx = pd.DataFrame(lf.ca.MotifRegulonsAUC, index=lf.ca.CellID)
            trk_auc_mtx = pd.DataFrame(lf.ca.TrackRegulonsAUC, index=lf.ca.CellID)
            # merge the AUC matrices:
            auc_mtx = pd.concat([mtf_auc_mtx, trk_auc_mtx], sort=False, axis=1, join='outer')
            # fill NAs (if any) with 0s:
            auc_mtx.fillna(0, inplace=True)

But it seems that these two matrices cannot be analyzed and plotted together. Should we just use the mtf_auc_mtx and trk_auc_mtx separately for the following analysis or use the combined matrix? Thanks for your help!
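To make the merge in the quoted snippet concrete, here is a minimal sketch on toy data. The cell IDs, regulon names, and AUC values are all made up for illustration, and one cell is deliberately left out of the track matrix so that the NA fill has a visible effect. It only reproduces the pd.concat / fillna step, and it is not meant as a recommendation for either approach.

```python
import pandas as pd

# Toy AUC matrices (hypothetical values): rows are cells, columns are regulons.
cells = ["cell_1", "cell_2", "cell_3"]
mtf_auc_mtx = pd.DataFrame(
    {"TF_A_motif": [0.10, 0.20, 0.30], "TF_B_motif": [0.05, 0.00, 0.15]},
    index=cells,
)
trk_auc_mtx = pd.DataFrame(
    {"TF_A_track": [0.12, 0.18], "TF_C_track": [0.40, 0.35]},
    index=cells[:2],  # cell_3 is missing on purpose to show the NA fill
)

# Same merge as in the quoted snippet: column-wise outer join on the cell index.
auc_mtx = pd.concat([mtf_auc_mtx, trk_auc_mtx], sort=False, axis=1, join="outer")
# Cells absent from one matrix get NaN for its regulons; replace those with 0.
auc_mtx = auc_mtx.fillna(0)

# As long as the column names differ, each source can be recovered by column.
motif_view = auc_mtx[mtf_auc_mtx.columns]
track_view = auc_mtx[trk_auc_mtx.columns]

print(auc_mtx)
```

In this sketch the combined matrix has one column per regulon from either source, so the motif and track parts can still be selected separately. That only works if the regulon names in the two matrices do not collide, which is an assumption here and should be checked against the actual loom file.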