aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

cistarget - Feather V1 and V2 format - rankings and scores #80

Closed Sayyam-Shah closed 1 year ago

Sayyam-Shah commented 1 year ago

Hello!

Thank you for the amazing tool! I'm excited to see the results of SCENIC+ on our multiome data. I am trying to run cisTarget on the ATAC data after running cisTopic. I am using the cluster_SCREEN.regions_vs_motifs.rankings.v2.feather and cluster_SCREEN.regions_vs_motifs.scores.v2.feather files, which I downloaded from the Aerts lab resources database.

However, running cisTarget produces the error below. I renamed the files from the Aerts lab database to the names used in the SCENIC+ tutorial, since I recall hitting a similar issue with pySCENIC where the fix was simply a name change. I get the same error with either set of file names.

ValueError: "/cluster/projects/resources/cluster_SCREEN.regions_vs_motifs.scores.v2.feather" is not a cisTarget Feather database in Feather v1 or v2 format.
2022-12-23 17:37:35,253 cisTarget    INFO     Reading cisTarget database
2022-12-23 17:50:24,013 DEM          INFO     Reading DEM database
Traceback (most recent call last):
  File "cistarget.py", line 53, in <module>
    menr['CTX_'+key+'_All'] = run_cistarget(ctx_db = ctx_db,
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/pycistarget/motif_enrichment_dem.py", line 70, in __init__
    fraction_overlap)
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/pycistarget/motif_enrichment_dem.py", line 111, in load_db
    db = FeatherRankingDatabase(fname, name=name)
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/ctxcore/rnkdb.py", line 110, in __init__
    ct_db_filename=self._fname, engine="pyarrow"
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/ctxcore/ctdb.py", line 171, in init_ct_db
    f'"{ct_db_filename}" is not a cisTarget Feather database in Feather v1 or v2 format.'
ValueError: "/cluster/projects/resources/hg38_screen_v10_clust.regions_vs_motifs.scores.feather" is not a cisTarget Feather database in Feather v1 or v2 format.

How may I debug this so I can proceed with the workflow?
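One quick first check, independent of pycistarget and ctxcore, is to look at the bytes that frame the file. Feather v1 files begin and end with the magic bytes `FEA1`, and Feather v2 files (the Arrow IPC file format) begin and end with `ARROW1`. The sketch below uses only the standard library; `sniff_feather` is a hypothetical helper name, and it is an assumption that a file passing this check also passes whatever validation ctxcore performs. It can still separate a truncated or mis-downloaded file (for example an HTML error page saved under a `.feather` name) from a library-side problem.

```python
import os

FEATHER_V1_MAGIC = b"FEA1"    # Feather v1 files begin and end with these bytes
ARROW_IPC_MAGIC = b"ARROW1"   # Feather v2 is the Arrow IPC file format


def sniff_feather(path):
    """Report which Feather magic bytes, if any, frame the file at `path`.

    Hypothetical diagnostic helper; it does not parse the file contents.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        head = handle.read(8)               # leading magic (plus padding)
        handle.seek(max(size - 8, 0))
        tail = handle.read(8)               # trailing magic, if the file is whole

    if head.startswith(ARROW_IPC_MAGIC):
        version, magic = "feather_v2", ARROW_IPC_MAGIC
    elif head.startswith(FEATHER_V1_MAGIC):
        version, magic = "feather_v1", FEATHER_V1_MAGIC
    else:
        version, magic = "unknown", None

    # A missing trailing magic usually points to an interrupted download.
    complete = magic is not None and tail.endswith(magic)
    return {"size": size, "version": version, "complete": complete, "head": head}
```

Calling `sniff_feather` on the path from the error message and printing the result shows whether the file looks like Feather at all. If `version` is `"unknown"` or `complete` is `False`, re-downloading the database is a reasonable next step. If both look fine, the installed ctxcore and pyarrow versions are the next thing to compare against the ones the tutorial expects.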

Also, I am thinking of generating the eRegulons using metacells. Is it possible to export the eRegulons and score them in another dataset (e.g. the single-cell expression matrix) for downstream analysis?
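To make the second question concrete: once the target genes of each eRegulon are exported as plain gene lists, they can in principle be scored against any expression matrix that uses the same gene names. The sketch below is a deliberately simplified stand-in, assuming the gene sets are already available as Python sets. It is not the AUCell algorithm and not a SCENIC+ function, and `score_gene_sets` and `top_fraction` are hypothetical names. For each cell it reports the fraction of a gene set that falls among that cell's most highly expressed genes.

```python
from typing import Dict, Set


def score_gene_sets(
    cells: Dict[str, Dict[str, float]],
    gene_sets: Dict[str, Set[str]],
    top_fraction: float = 0.25,
) -> Dict[str, Dict[str, float]]:
    """Score each gene set per cell as the share of its genes in the top-ranked genes.

    cells:      cell id -> {gene name -> expression value}
    gene_sets:  gene set name -> set of gene names (e.g. exported eRegulon targets)
    """
    scores: Dict[str, Dict[str, float]] = {}
    for cell_id, expression in cells.items():
        # Rank genes from highest to lowest expression; ties broken by name
        # so the result is deterministic.
        ranked = sorted(expression, key=lambda gene: (-expression[gene], gene))
        n_top = max(1, int(len(ranked) * top_fraction))
        top_genes = set(ranked[:n_top])

        scores[cell_id] = {}
        for name, genes in gene_sets.items():
            # Only genes measured in this dataset can contribute to the score.
            measured = genes & set(expression)
            if measured:
                scores[cell_id][name] = len(measured & top_genes) / len(measured)
            else:
                scores[cell_id][name] = 0.0
    return scores
```

The only requirement this places on the second dataset is matching gene identifiers. A region-based eRegulon would need the accessibility data instead, which this sketch does not cover.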

I am using the below code.

I would update the code to this:

from pycistarget.motif_enrichment_cistarget import cisTargetDatabase, run_cistarget
from pycistarget.motif_enrichment_dem import DEMDatabase, DEM

# cisTarget takes the *rankings* database; DEM takes the *scores* database.
ctx_db_path = '/1.database/ScenicPlus/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather'
dem_db_path = '/1.database/ScenicPlus/hg38_screen_v10_clust.regions_vs_motifs.scores.feather'

# region_sets, motif_annotation and tmp_dir must be defined earlier in the script.
menr = {}
for key in region_sets.keys():
    regions = region_sets[key]
    # Build both database objects for this group of region sets.
    ctx_db = cisTargetDatabase(ctx_db_path, regions)
    dem_db = DEMDatabase(dem_db_path, regions)
    # Motif enrichment with cisTarget (rankings database).
    menr['CTX_' + key + '_All'] = run_cistarget(
        ctx_db = ctx_db,
        region_sets = regions,
        specie = 'homo_sapiens',
        auc_threshold = 0.005,
        nes_threshold = 3.0,
        rank_threshold = 0.05,
        annotation = ['Direct_annot', 'Orthology_annot'],
        motif_similarity_fdr = 0.000001,
        path_to_motif_annotations = motif_annotation,
        n_cpu = 1,
        _temp_dir = tmp_dir,
        annotation_version = 'v10nr_clust')
    # Differentially enriched motifs with DEM (scores database).
    menr['DEM_' + key + '_All'] = DEM(
        dem_db = dem_db,
        region_sets = regions,
        log2fc_thr = 0.5,
        motif_hit_thr = 3.0,
        max_bg_regions = 500,
        specie = 'homo_sapiens',
        promoter_space = 500,
        motif_annotation = ['Direct_annot', 'Orthology_annot'],
        motif_similarity_fdr = 0.000001,
        path_to_motif_annotations = motif_annotation,
        n_cpu = 1,
        annotation_version = 'v10nr_clust',
        tmp_dir = "/ScenicPlus_tutorial/pbmc_tutorial/tmp",
        _temp_dir = tmp_dir)
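
Because the snippet above pairs the rankings file with cisTargetDatabase and the scores file with DEMDatabase, a small pre-flight check can catch a swapped or missing file before the long database load starts. This is a sketch under assumptions: `check_db_paths` is a hypothetical helper, and the `.rankings.` / `.scores.` tokens are taken only from the file names that appear in this thread, so renamed files would need the check adjusted.

```python
import os


def check_db_paths(ctx_db_path, dem_db_path):
    """Return a list of human-readable problems; an empty list means no issue found.

    Hypothetical helper: checks file names and sizes only, not file contents.
    """
    problems = []
    expected = (
        ("cisTarget", ctx_db_path, ".rankings."),
        ("DEM", dem_db_path, ".scores."),
    )
    for label, path, token in expected:
        name = os.path.basename(path)
        # Naming convention assumed from the database file names in this thread.
        if token not in name:
            problems.append(f"{label} database {name!r} has no {token!r} in its file name")
        if not os.path.isfile(path):
            problems.append(f"{label} database not found: {path}")
        elif os.path.getsize(path) == 0:
            problems.append(f"{label} database is empty: {path}")
    return problems
```

Running `check_db_paths(ctx_db_path, dem_db_path)` before the loop and stopping if the returned list is non-empty keeps a path mix-up from surfacing only after a long database read.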