aertslab / create_cisTarget_databases

Create cisTarget databases

How to check which genes are in the database? #24

Open linxy29 opened 2 years ago

linxy29 commented 2 years ago

Hi,

I'm using pySCENIC to analyze human iPSC data. We are interested in some genes and have the following questions:

  1. We cannot find TBXT in pySCENIC. Instead, we found T, which is another name for TBXT. Can we regard T as TBXT?
  2. We are interested in HOPX and SLIT2, but we cannot find any information on these two genes either. I found another issue, https://github.com/aertslab/create_cisTarget_databases/issues/21. Is there any way we can get gene regulatory information about these two genes, or would we not get meaningful information even if we added them to the database?

The databases we used are 'hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather', 'hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather', and 'motifs-v9-nr.hgnc-m0.001-o0.0.tbl'.

Thank you for your help.

Best

ghuls commented 2 years ago

You can use:

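```python
# Run this from inside the cloned repository (cd create_cisTarget_databases),
# so that feather_v1_or_v2.py can be imported.
# Import the function itself; a bare "import feather_v1_or_v2" would require
# calling feather_v1_or_v2.get_all_column_names_from_feather(...) instead.
from feather_v1_or_v2 import get_all_column_names_from_feather

all_columns_in_ctx_db = get_all_column_names_from_feather(
    feather_file="hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather"
)

all_columns_in_ctx_db
```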

Gene names for hg38 are HGNC symbols as linked to RefSeq r80.
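Since the column names are HGNC symbols, checking for specific genes comes down to an exact membership test on the list returned above. A minimal sketch, assuming `columns` is that list (or `df.columns` from pandas); `report_gene_presence` is a hypothetical helper, not part of create_cisTarget_databases, and the example column list is made up for illustration:

```python
from typing import Dict, Iterable


def report_gene_presence(columns: Iterable[str], genes: Iterable[str]) -> Dict[str, bool]:
    """Return {gene: True/False} for whether each gene symbol is a database column.

    Hypothetical helper: matching is exact and case-sensitive.
    """
    column_set = set(columns)
    return {gene: gene in column_set for gene in genes}


if __name__ == "__main__":
    # Toy column list for illustration only; use the real database columns instead.
    example_columns = ["T", "SOX2", "POU5F1"]
    print(report_gene_presence(example_columns, ["TBXT", "T", "HOPX", "SLIT2"]))
    # {'TBXT': False, 'T': True, 'HOPX': False, 'SLIT2': False}
```

With the real database, this would be called as `report_gene_presence(all_columns_in_ctx_db, ["TBXT", "T", "HOPX", "SLIT2"])`.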

linxy29 commented 2 years ago

Hi @ghuls ,

Thank you very much for your help. I'm still having some trouble getting what I want.

1) I tried to enter the 'create_cisTarget_databases' folder and ran the code you posted. I got the error: NameError: name 'get_all_column_names_from_feather' is not defined.

2) Then, I tried to install create_cisTarget_databases by following the installation guide and got the error: ld returned 1 exit status. I tried several things to debug it, but I still could not install the create_cisTarget_databases module.


3) I googled HGNC and RefSeq r80, but I still have no idea whether TBXT, HOPX, and SLIT2 are in the database.

I checked the website 'https://resources.aertslab.org/cistarget/' and found a tf_lists/allTFs_hg38.txt file.

I'm wondering: 1) does this 'allTFs_hg38.txt' file contain all the genes in the database? Or 2) what should I do to make the 'get_all_column_names_from_feather' function work?

Thank you for your help.

ghuls commented 1 year ago

You don't need to compile Cluster-Buster to check the feather databases. You just need to create a conda environment with the Python dependencies; then, from inside this cloned repo, import feather_v1_or_v2.
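If you would rather not `cd` into the clone, one option is to put the repository directory on `sys.path` before importing, which is plain Python import machinery rather than anything specific to this repo. A minimal sketch; `import_from_repo` is a hypothetical helper and the repository path you pass must point at your own clone:

```python
import importlib
import sys
from pathlib import Path
from types import ModuleType


def import_from_repo(repo_dir: str, module_name: str) -> ModuleType:
    """Import a top-level module (e.g. feather_v1_or_v2) from a cloned repo directory.

    Hypothetical helper: prepends the repo directory to sys.path once, then imports.
    """
    repo_path = str(Path(repo_dir).expanduser().resolve())
    if repo_path not in sys.path:
        sys.path.insert(0, repo_path)
    # Make sure the import system notices files in the newly added directory.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

Usage would then look like `feather_v1_or_v2 = import_from_repo("~/create_cisTarget_databases", "feather_v1_or_v2")` followed by `feather_v1_or_v2.get_all_column_names_from_feather(feather_file=...)`, where the path is a placeholder.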

In the worst case, you can even load the whole feather database with pandas:

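```python
import pandas as pd

# Loads the entire database into memory (pandas needs pyarrow for this).
df = pd.read_feather("hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather")

# The gene names appear among the column names.
df.columns
```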
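Because a gene may be stored under another symbol (as with T and TBXT earlier in this thread), it can help to search the column names loosely instead of only testing exact matches. A minimal sketch over `df.columns` or the list from `get_all_column_names_from_feather`; `search_columns` is a hypothetical helper, and any hit is only a candidate that still needs to be checked against HGNC's symbol history:

```python
import re
from typing import Iterable, List


def search_columns(columns: Iterable[str], pattern: str) -> List[str]:
    """Return the column names matching a regular expression, case-insensitively.

    Hypothetical helper for browsing candidate symbols; it does not resolve
    gene aliases by itself.
    """
    regex = re.compile(pattern, flags=re.IGNORECASE)
    return sorted(name for name in columns if regex.search(name))
```

For example, `search_columns(df.columns, r"^(T|TBXT|HOPX|SLIT2)$")` lists which of these exact symbols are present, and `search_columns(df.columns, r"^tbx")` shows every column starting with "TBX".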
ChenJH-scau commented 1 year ago

Hello, I would like to ask how to obtain the gene IDs in the .feather file from a Linux terminal. I would greatly appreciate any suggestions.