flatironinstitute / inferelator

Task-based gene regulatory network inference using single-cell or bulk gene expression data conditioned on a prior network.
BSD 2-Clause "Simplified" License

Inferelator workflow for subsetted Seurat .h5ad file #64

Open jsacco1 opened 1 year ago

jsacco1 commented 1 year ago

I have a workflow question. Here is some background:

I have a Seurat RDS file of integrated ATAC-seq and RNA-seq data from human samples. This data measures TF expression. After running PCA and UMAP, I end up with numbered clusters, one of which corresponds to a strongly expressed TF (strongly enough that the cluster can be labeled with that TF). I want to run Inferelator 3.0 on the cells from that TF's cluster. I subsetted the Seurat object first by cluster number, and then by that TF's expression level (> 0.5).
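For reference, the same two-step subset (cluster first, then expression > 0.5) reduces to a boolean mask over cells if you ever want to redo it in Python after conversion. This is a minimal sketch, assuming the per-cell cluster labels and the TF's per-cell expression values have already been pulled out as 1-D arrays; `subset_cells` is a hypothetical helper and the cluster id `"3"` is a placeholder.

```python
import numpy as np

def subset_cells(cluster_labels, tf_expression, cluster_id, min_expression=0.5):
    """Return indices of cells in `cluster_id` whose TF expression is > `min_expression`.

    Hypothetical helper for illustration; not part of Seurat or Inferelator.
    """
    cluster_labels = np.asarray(cluster_labels)
    tf_expression = np.asarray(tf_expression, dtype=float)
    # Step 1: keep cells assigned to the chosen cluster
    in_cluster = cluster_labels == cluster_id
    # Step 2: keep cells whose TF expression is strictly above the threshold
    expressed = tf_expression > min_expression
    # Cells must pass both filters; return their integer positions
    return np.flatnonzero(in_cluster & expressed)

# Toy data: five cells, three of them in cluster "3"
labels = np.array(["3", "3", "1", "3", "2"])
expr = np.array([0.9, 0.2, 1.5, 0.6, 0.7])
print(subset_cells(labels, expr, "3"))  # [0 3]
```

With an AnnData object, the returned indices could then be used as `adata[keep].copy()` to get the ~200-cell subset.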

Question: I converted this new, smaller Seurat object of ~200 cells into a loom file, then into an .h5ad file. How do I run Inferelator 3.0 on this .h5ad file?

Also: I ran Inferelator Prior on publicly available bulk ATAC-seq data to make the priors file. Would it be better to use experimental ATAC-seq data instead, even though it would have lower depth?

asistradition commented 1 year ago

You can read in h5ad files with the following (change paths to match your configuration):

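```python
from inferelator import inferelator_workflow

# Single-cell workflow with StARS-LASSO regression
worker = inferelator_workflow(regression='stars', workflow='single-cell')
worker.set_file_paths(
    input_dir='.',
    output_dir='.',
    priors_file='priors.tsv',
    gold_standard_file='gold_standard.tsv'
)
# Load the expression matrix from an AnnData .h5ad file
worker.set_expression_file(
    h5ad='data.h5ad',
    h5_layer=None  # None -> adata.X; a layer name -> adata.layers[name]
)
results = worker.run()
```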

`h5_layer=None` uses the `adata.X` array; if you pass a layer name, it'll use `adata.layers[h5_layer]` instead.
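To make that rule concrete, here is a small illustrative helper (hypothetical, not Inferelator's internal code) that applies the same selection to an AnnData object or any stand-in with `.X` and `.layers` attributes. The layer name `"counts"` is an assumption; check `adata.layers.keys()` on your converted file to see which layers the loom → h5ad conversion actually kept.

```python
from types import SimpleNamespace

import numpy as np

def select_expression_matrix(adata, h5_layer=None):
    """Pick the matrix the h5_layer rule describes.

    Hypothetical helper: None -> adata.X, otherwise adata.layers[h5_layer].
    """
    if h5_layer is None:
        return adata.X
    if h5_layer not in adata.layers:
        # Fail loudly with the available layer names instead of a bare KeyError
        raise KeyError(f"layer {h5_layer!r} not found; available: {list(adata.layers)}")
    return adata.layers[h5_layer]

# Toy stand-in for an AnnData object: normalized values in X, raw counts in a layer
toy = SimpleNamespace(
    X=np.array([[0.1, 1.2], [0.0, 0.7]]),
    layers={"counts": np.array([[1, 12], [0, 7]])},
)
print(select_expression_matrix(toy))            # the X array
print(select_expression_matrix(toy, "counts"))  # the "counts" layer
```

If your file stores raw counts in a layer and normalized values in `X`, this is the choice `h5_layer` controls.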

As for the best ATAC data to use for any project, it's so specific to the data that I don't think I can give you any useful general advice.