from scenicplus.wrappers.run_pycistarget import run_pycistarget doesn't work

SteveTur commented 9 months ago

Hi, it looks like

from scenicplus.wrappers.run_pycistarget import run_pycistarget

doesn't work anymore. Here is the error I get:

OperationalError                          Traceback (most recent call last)
Untitled-2.ipynb Cell 16 line 1
----> 1 from scenicplus.wrappers.run_pycistarget import run_pycistarget
      2 run_pycistarget(
      3     region_sets = region_sets,
      4     species = 'homo_sapiens',
   (...)
     12     annotation_version = 'v10nr_clust',
     13     )

File ~/scenicplus/scenicplus/src/scenicplus/wrappers/run_pycistarget.py:16
     14 from pycistarget.motif_enrichment_dem import *
     15 from pycistarget.utils import *
---> 16 import pybiomart as pbm
     17 import time
     19 def run_pycistarget(region_sets: Dict[str, pr.PyRanges],
     20                  species: str,
     21                  save_path: str,
   (...)
     42                  exclude_collection: List[str] = None,
     43                  **kwargs):

File /opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/pybiomart/__init__.py:3
      1 # -*- coding: utf-8 -*-
...
--> 235     self._connection = sqlite3.connect(self.db_path, **self.connection_kwargs)
    236     # Note: DBAPI doesn't support integer placeholders
    237     if self.busy_timeout is not None:

OperationalError: unable to open database file

What should I do?

Best,

Steven

SeppeDeWinter commented 9 months ago

Hi @SteveTur

It seems like your error occurs when downloading the genome annotation from biomart. Does the computer you are running this analysis on have access to the internet?
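One quick way to answer that from the same Python environment is to try reaching the BioMart server directly. The sketch below uses only the standard library. The URL is an assumption (the Ensembl BioMart endpoint, matching a biomart_host of http://www.ensembl.org), and can_reach is a hypothetical helper name; point it at whatever host you actually pass to SCENIC+.

```python
import urllib.error
import urllib.request

# Assumption: Ensembl's BioMart service; swap in the host you actually use.
BIOMART_URL = "http://www.ensembl.org/biomart/martservice"


def can_reach(url: str = BIOMART_URL, timeout: float = 10.0) -> bool:
    """Return True if the server at `url` answers an HTTP request at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so the network path works.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, proxy problems, ...
        return False
```

Calling can_reach() returns False when the machine cannot get a response from the host, for example on a compute node without internet access.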

Alternatively, you can provide a pre-downloaded annotation using the custom_annot parameter.

The file should look like this:


       Chromosome      Start  Strand     Gene Transcript_type
8053         chrY   22490397       1      PRY  protein_coding
8153         chrY   12662368       1    USP9Y  protein_coding
8155         chrY   12701231       1    USP9Y  protein_coding
8158         chrY   12847045       1    USP9Y  protein_coding
8328         chrY   22096007      -1     PRY2  protein_coding
...           ...        ...     ...      ...             ...
246958       chr1  181483738       1  CACNA1E  protein_coding
246960       chr1  181732466       1  CACNA1E  protein_coding
246962       chr1  181776101       1  CACNA1E  protein_coding
246963       chr1  181793668       1  CACNA1E  protein_coding
246965       chr1  203305519       1     BTG2  protein_coding

[78812 rows x 5 columns]
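If BioMart is not reachable, a table with these five columns can in principle be built offline from a GTF file. The sketch below is an assumption-laden illustration, not the SCENIC+ code. It assumes:

- Start holds each transcript's TSS, which is consistent with the repeated USP9Y rows above.
- Strand is encoded as 1/-1.
- The GTF uses GENCODE- or Ensembl-style transcript_type / transcript_biotype attributes.

Function names are hypothetical. Annotations without a transcript-type attribute (for example some UCSC GTFs) need transcript_type=None, which keeps every transcript.

```python
import gzip
from typing import Dict, Iterable, List, Optional


def parse_gtf_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF attribute column such as 'gene_id "X"; gene_name "Y";'."""
    attrs: Dict[str, str] = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def tss_records_from_gtf(
    lines: Iterable[str], transcript_type: Optional[str] = "protein_coding"
) -> List[Dict[str, object]]:
    """Build one TSS record per 'transcript' line, using the columns shown above."""
    records: List[Dict[str, object]] = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = parse_gtf_attributes(cols[8])
        # GENCODE uses transcript_type, Ensembl uses transcript_biotype.
        ttype = attrs.get("transcript_type", attrs.get("transcript_biotype"))
        if transcript_type is not None and ttype != transcript_type:
            continue
        records.append({
            # Ensembl-style names ('1', 'X') get a 'chr' prefix; other renames
            # (e.g. MT vs chrM) are not handled here.
            "Chromosome": chrom if chrom.startswith("chr") else "chr" + chrom,
            # The TSS is the 5' end: the start on '+', the end on '-'.
            "Start": start if strand == "+" else end,
            "Strand": 1 if strand == "+" else -1,
            "Gene": attrs.get("gene_name", attrs.get("gene_id")),
            "Transcript_type": ttype,
        })
    return records


def read_tss_records(path: str, **kwargs) -> List[Dict[str, object]]:
    """Read a plain or gzip-compressed GTF file and return TSS records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return tss_records_from_gtf(handle, **kwargs)
```

Wrapping the result as pandas.DataFrame(read_tss_records("genes.gtf.gz")) gives a table with the same five columns. Check against your SCENIC+ version that custom_annot expects exactly this layout before relying on it.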

I hope this helps.

All the best,

Seppe

SteveTur commented 9 months ago

Hi Seppe,

Yes, I think it was a network problem. However, I later ran into another error, one that has already been discussed on GitHub:

AttributeError                            Traceback (most recent call last)
/Users/stur/Teaseq/Scenicplus/PB2/Scenicplus_PB2_Part_2.ipynb Cell 31 line 2
     20 except Exception as e:
     21     #in case of failure, still save the object
     22     dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
---> 23     raise(e)

/Users/stur/Teaseq/Scenicplus/PB2/Scenicplus_PB2_Part_2.ipynb Cell 31 line 3
      1 from scenicplus.wrappers.run_scenicplus import run_scenicplus
      2 try:
----> 3     run_scenicplus(
      4         scplus_obj = scplus_obj,
      5         variable = ['GEX_celltype'],
      6         species = 'hsapiens',
      7         assembly = 'hg38',
      8         tf_file = '/Users/stur/Teaseq/Teaseq_Data_TAL1/PB2/utoronto_human_tfs_v_1.01.txt',
      9         save_path = os.path.join(work_dir, 'scenicplus'),
     10         biomart_host = biomart_host,
     11         upstream = [1000, 150000],
     12         downstream = [1000, 150000],
     13         calculate_TF_eGRN_correlation = True,
     14         calculate_DEGs_DARs = True,
     15         export_to_loom_file = True,
     16         export_to_UCSC_file = True,
...
--> 174     ), columns=cv.get_feature_names(), index=regulons.keys())
    175     regulon_mat = regulon_mat.reindex(columns=feature_names, fill_value=0).T
    176     if keep_direct_and_extended_if_not_direct is True:

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'

In that discussion you mentioned an update to the source code. Is that still the case? I am also unsure how to convert the object so that I can view the SCENIC+ results in a genome browser.

Thank you for your help.

Best,

Steven

SeppeDeWinter commented 9 months ago

Hi Steven

What version of scikit-learn are you using?

see: https://stackoverflow.com/questions/70640923/countvectorizer-object-has-no-attribute-get-feature-names-out
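For context, scikit-learn deprecated CountVectorizer.get_feature_names() in 1.0 in favour of get_feature_names_out(), and removed the old method in 1.2. That is why the call above fails on 1.2.x. Code that has to run on both sides of that change can use a small compatibility helper like the one below. This is an illustrative sketch, not the change made in SCENIC+ itself, and the name vectorizer_feature_names is hypothetical.

```python
from typing import List


def vectorizer_feature_names(vectorizer) -> List[str]:
    """Return feature names from a fitted scikit-learn vectorizer.

    scikit-learn 1.0 added get_feature_names_out() and 1.2 removed the old
    get_feature_names(), so prefer the new method whenever it exists.
    """
    if hasattr(vectorizer, "get_feature_names_out"):
        # scikit-learn >= 1.0 returns a NumPy array of strings here.
        return [str(name) for name in vectorizer.get_feature_names_out()]
    # Fallback for scikit-learn < 1.0.
    return list(vectorizer.get_feature_names())
```

For example, with scikit-learn 1.2.2, vectorizer_feature_names(CountVectorizer().fit(["tf1 tf2", "tf2 tf3"])) gives ['tf1', 'tf2', 'tf3'].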

All the best,

Seppe

SteveTur commented 9 months ago

Hi Seppe,

Here are my versions:

scikit-image 0.20.0
scikit-learn 1.2.2
scikit-misc 0.1.4
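Version numbers like these can also be read from inside the Python environment that actually runs SCENIC+, which avoids mixing up conda or pip environments. The sketch below uses the standard library's importlib.metadata (Python 3.8+). The helper name package_versions is hypothetical, and the names it takes are distribution names as listed above.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Running print(package_versions(["scikit-image", "scikit-learn", "scikit-misc"])) in the notebook kernel shows the versions that kernel really imports.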

Best,

Steven

SeppeDeWinter commented 9 months ago

Hi @SteveTur

This should be fixed by this commit: https://github.com/aertslab/scenicplus/commit/e5ba6fcf42459b6e6b70e27359ddd11289d70cc5

All the best,

Seppe

SteveTur commented 9 months ago

Hi Seppe,

I managed to solve it. However, I am confused about which tutorial to follow for my own data. I have completed the 10X_PBMC one and got good results, but I would like to go deeper into the analysis and, for example, get the integrated multiome plot from the cerebellum tutorial. I do not quite understand what exactly the genes.gtf file is: do you provide it somewhere, or do I have to create it myself?

Best,

Steven

SeppeDeWinter commented 9 months ago

Hi @SteveTur

These GTF files are publicly available for almost any species; see, for example, https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/ for human.
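If you prefer to script the download, the sketch below streams one file from that directory to disk with the standard library, then opens it as text whether or not it is gzip-compressed. The filename hg38.refGene.gtf.gz is an assumption chosen for illustration; the directory lists several annotations, so check the listing and pick the one you need. The helper names download and open_text are hypothetical.

```python
import gzip
import shutil
import urllib.request
from typing import IO

# Assumption: one of the GTFs in the UCSC hg38 "genes" directory.
GTF_URL = (
    "https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/"
    "hg38.refGene.gtf.gz"
)


def download(url: str = GTF_URL, dest: str = "hg38.refGene.gtf.gz") -> str:
    """Stream `url` to the local file `dest` without loading it into memory."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def open_text(path: str) -> IO[str]:
    """Open a GTF for reading as text, whether or not it is gzip-compressed."""
    with open(path, "rb") as probe:
        gzipped = probe.read(2) == b"\x1f\x8b"  # gzip magic bytes
    return gzip.open(path, "rt") if gzipped else open(path, "rt")
```

The file does not need to be decompressed on disk: open_text(download()) yields plain GTF lines either way.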

I'm closing this issue.

If you have further questions about downstream analysis feel free to open a new discussion and I will be happy to help.

Good luck with your analysis.

All the best,

Seppe

aertslab / scenicplus

from scenicplus.wrappers.run_pycistarget import run_pycistarget doesn't work #269