aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Help with AssertionError: Database "{fname}" doesn't exist. #252

Open Chrisdoan9 opened 8 months ago

Chrisdoan9 commented 8 months ago

Hi all,

I downloaded the three required files but still get the "{fname}" doesn't exist error. Could you please take a look? Thank you so much!

**Describe the bug**

AssertionError: Database "{fname}" doesn't exist.

**To Reproduce**

```python
import os

# work_dir, tmp_dir and region_sets are defined in earlier cells of the tutorial notebook (not shown)
db_fpath = "/labs/data/"
motif_annot_fpath = "/labs/data/"

rankings_db = os.path.join(db_fpath, 'cluster_SCREEN.regions_vs_motifs.rankings.v2.feather')
scores_db = os.path.join(db_fpath, 'cluster_SCREEN.regions_vs_motifs.scores.v2.feather')
motif_annotation = os.path.join(motif_annot_fpath, 'motifs-v10-nr.hgnc-m0.00001-o0.0.tbl')

# Create the output folder for the motif enrichment results if it does not exist yet
os.makedirs(os.path.join(work_dir, 'motifs'), exist_ok=True)

from scenicplus.wrappers.run_pycistarget import run_pycistarget
run_pycistarget(
    region_sets = region_sets,
    species = 'homo_sapiens',
    save_path = os.path.join(work_dir, 'motifs'),
    ctx_db_path = rankings_db,
    dem_db_path = scores_db,
    path_to_motif_annotations = motif_annotation,
    run_without_promoters = True,
    n_cpu = 1,  # n_cpu, not n_tpu: the signature shown in the traceback has no n_tpu parameter
    _temp_dir = os.path.join(tmp_dir, 'ray_spill'),
    annotation_version = 'v10nr_clust',
    )
```

**Error output**
```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Input In [59], in <cell line: 2>()
      1 from scenicplus.wrappers.run_pycistarget import run_pycistarget
----> 2 run_pycistarget(
      3     region_sets = region_sets,
      4     species = 'homo_sapiens',
      5     save_path = os.path.join(work_dir, 'motifs'),
      6     ctx_db_path = rankings_db,
      7     dem_db_path = scores_db,
      8     path_to_motif_annotations = motif_annotation,
      9     run_without_promoters = True,
     10     n_tpu = 1,
     11     _temp_dir = os.path.join(tmp_dir, 'ray_spill'),
     12     annotation_version = 'v10nr_clust',
     13     )

File /apps/software/jupyter/python_3.9/lib/python3.9/site-packages/scenicplus/wrappers/run_pycistarget.py:182, in run_pycistarget(region_sets, species, save_path, custom_annot, save_partial, ctx_db_path, dem_db_path, run_without_promoters, biomart_host, promoter_space, ctx_auc_threshold, ctx_nes_threshold, ctx_rank_threshold, dem_log2fc_thr, dem_motif_hit_thr, dem_max_bg_regions, annotation, motif_similarity_fdr, path_to_motif_annotations, annotation_version, n_cpu, _temp_dir, exclude_motifs, exclude_collection, **kwargs)
    180 ## CISTARGET
    181 regions = region_sets[key]
--> 182 ctx_db = cisTargetDatabase(ctx_db_path, regions)
    183 if exclude_motifs is not None:
    184     out = pd.read_csv(exclude_motifs, header=None).iloc[:,0].tolist()

File /apps/software/jupyter/python_3.9/lib/python3.9/site-packages/pycistarget/motif_enrichment_cistarget.py:67, in cisTargetDatabase.__init__(self, fname, region_sets, name, fraction_overlap)
     48 def __init__(self,
     49             fname: str,
     50             region_sets: Union[Dict[str, pr.PyRanges], pr.PyRanges] = None,
     51             name: str = None,
     52             fraction_overlap: float = 0.4):
     53     """
     54     Initialize cisTargetDatabase
     55     
   (...)
     65         Minimal overlap between query and regions in the database for the mapping.     
     66     """
---> 67     self.regions_to_db, self.db_rankings, self.total_regions = self.load_db(fname,
     68                                                       region_sets,
     69                                                       name,
     70                                                       fraction_overlap)

File /apps/software/jupyter/python_3.9/lib/python3.9/site-packages/pycistarget/motif_enrichment_cistarget.py:110, in cisTargetDatabase.load_db(self, fname, region_sets, name, fraction_overlap)
    108 if name is None:
    109     name = os.path.basename(fname)
--> 110 db = FeatherRankingDatabase(fname, name=name)
    111 total_regions = db.total_genes
    112 db_regions = db.genes

File /apps/software/jupyter/python_3.9/lib/python3.9/site-packages/ctxcore/rnkdb.py:106, in FeatherRankingDatabase.__init__(self, fname, name)
     98 """
     99 Create a new feather database.
    100 
    101 :param fname: The filename of the database.
    102 :param name: The name of the database.
    103 """
    104 super().__init__(name=name)
--> 106 assert os.path.isfile(fname), """Database "{fname}" doesn't exist."""
    108 self._fname = fname
    109 self.ct_db = CisTargetDatabase.init_ct_db(
    110     ct_db_filename=self._fname, engine="pyarrow"
    111 )

AssertionError: Database "{fname}" doesn't exist.
```

**Expected behavior**

```
2022-08-05 08:53:16,277 pycisTarget_wrapper INFO pbmc_tutorial/motifs folder already exists.
2022-08-05 08:53:17,650 pycisTarget_wrapper INFO Loading cisTarget database for topics_otsu
2022-08-05 08:53:17,653 cisTarget INFO Reading cisTarget database
2022-08-05 09:13:51,198 pycisTarget_wrapper INFO Running cisTarget for topics_otsu
```

Version (please complete the following information):

ghuls commented 7 months ago

Are you sure the file paths you constructed actually exist? What is the output of:

```python
import os

# File exists?
print(rankings_db, os.path.isfile(rankings_db))
print(scores_db, os.path.isfile(scores_db))
print(motif_annotation, os.path.isfile(motif_annotation))
```
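A slightly more general version of this check is sketched below. It is an illustrative helper, not part of SCENIC+ or ctxcore, and the names `find_missing` and `list_candidates` are hypothetical. Besides reporting which paths are missing, it lists the `.feather` and `.tbl` files that actually exist in the download directory, which makes a mismatch between the downloaded filenames and the names used in the tutorial easy to spot.

```python
import os
from typing import Dict, List, Sequence


def find_missing(paths: Dict[str, str]) -> List[str]:
    """Return the labels whose path is not an existing regular file."""
    return [label for label, path in paths.items() if not os.path.isfile(path)]


def list_candidates(directory: str, suffixes: Sequence[str] = (".feather", ".tbl")) -> List[str]:
    """Return the sorted names of regular files in `directory` ending in one of `suffixes`."""
    if not os.path.isdir(directory):
        return []
    return sorted(
        name for name in os.listdir(directory)
        if name.endswith(tuple(suffixes))
        and os.path.isfile(os.path.join(directory, name))
    )


if __name__ == "__main__":
    # Example paths modelled on the issue; adjust them to your own download location.
    db_dir = "/labs/data/"
    paths = {
        "rankings_db": os.path.join(db_dir, "cluster_SCREEN.regions_vs_motifs.rankings.v2.feather"),
        "scores_db": os.path.join(db_dir, "cluster_SCREEN.regions_vs_motifs.scores.v2.feather"),
        "motif_annotation": os.path.join(db_dir, "motifs-v10-nr.hgnc-m0.00001-o0.0.tbl"),
    }
    for label in find_missing(paths):
        print(f"{label}: {paths[label]} does not exist")
    print("Candidate files in", db_dir, list_candidates(db_dir))
```

If a file is reported missing, comparing its name with the candidate list usually shows the differing part of the filename directly.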

You can also change this line in `/apps/software/jupyter/python_3.9/lib/python3.9/site-packages/ctxcore/rnkdb.py`:

```python
assert os.path.isfile(fname), """Database "{fname}" doesn't exist."""
```

to use an f-string:

```python
assert os.path.isfile(fname), f"""Database "{fname}" doesn't exist."""
```

so that the error message shows which filename it is complaining about.

Or install the latest git version of ctxcore:

```shell
pip install 'ctxcore @ git+https://github.com/aertslab/ctxcore'
```

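The literal `{fname}` in the error comes from the assertion message in the installed `rnkdb.py` being a plain triple-quoted string: without the `f` prefix, Python never substitutes the variable. The snippet below illustrates the difference with a made-up path (`/labs/data/missing.feather` is only an example):

```python
fname = "/labs/data/missing.feather"  # made-up example path

plain = """Database "{fname}" doesn't exist."""       # no f prefix: the braces are kept literally
formatted = f"""Database "{fname}" doesn't exist."""  # f-string: {fname} is replaced by its value

print(plain)      # Database "{fname}" doesn't exist.
print(formatted)  # Database "/labs/data/missing.feather" doesn't exist.
```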
Chrisdoan9 commented 7 months ago

Hi @ghuls, the files I downloaded had slightly different names from the ones in the tutorial, which caused the error. Thank you so much! However, I ran into another issue: after running the cell above for a long time, the kernel died, with this in the log file:

```
Saving file at /project/multiome.ipynb
[I 15:44:24.645 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
```

Do you have any suggestions? I had all the outputs from the tutorial up to this point. Is there any way to save everything so that, if I get an error like this, I don't have to run again from the beginning? And how much memory do I need to run this tutorial?
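For not having to start from the beginning after a kernel restart, one generic approach (a sketch, not a SCENIC+ feature; the helper `cached` and the file names are hypothetical) is to pickle each expensive intermediate result to disk and reload it on the next run instead of recomputing it:

```python
import os
import pickle
from typing import Any, Callable


def cached(path: str, compute: Callable[[], Any]) -> Any:
    """Load a pickled result from `path` if it exists; otherwise compute it and save it there."""
    if os.path.isfile(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    result = compute()
    # Write to a temporary file first so that a crash mid-write cannot leave a corrupt checkpoint.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as fh:
        pickle.dump(result, fh)
    os.replace(tmp_path, path)
    return result
```

In a notebook this could be used as `region_sets = cached(os.path.join(work_dir, "region_sets.pkl"), build_region_sets)`, where `build_region_sets` stands for whatever cells produced the object. This only preserves steps that completed; a step that is killed partway through, like the one in the log above, still has to be rerun.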