Open fengweimin-maker opened 3 years ago
Use test.genes_vs_motifs.rankings.feather.
Did you add this option when creating the gene database (it will strip off #some_number from the region names in your FASTA file, so that they match the gene IDs you provide to pySCENIC)?
-g "#[0-9]+$"
import pyarrow as pa
import pyarrow.feather as pf
import pandas as pd
# List the gene names stored in the 'genes' column of the motifs_vs_genes database.
motifs_vs_genes_ctx_db = 'test.motifs_vs_genes.rankings.feather'
gene_names_df = pf.read_feather(motifs_vs_genes_ctx_db, columns=['genes'])
print(gene_names_df)
# Load and show the whole genes_vs_motifs database.
genes_vs_motifs_ctx_db = 'test.genes_vs_motifs.rankings.feather'
genes_vs_motifs_ctx_df = pf.read_feather(genes_vs_motifs_ctx_db)
print(genes_vs_motifs_ctx_df)
It was very nice of you to reply so soon, but I have no test.genes_vs_regions.rankings.feather. I got these files:
test.cross_species.motifs_vs_regions.rankings.feather
test.cross_species.regions_vs_motifs.rankings.feather
test.motifs_vs_regions.rankings.feather
test.motifs_vs_regions.scores.feather
test.regions_vs_motifs.rankings.feather
test.regions_vs_motifs.scores.feather
Maybe I did something wrong when building the rankings database. I ran your script:
import pyarrow as pa
import pyarrow.feather as pf
import pandas as pd
genes_vs_regions_ctx_db = 'test.motifs_vs_regions.rankings.feather'
gene_names_df = pf.read_feather(genes_vs_regions_ctx_db , columns=['genes'])
print(gene_names_df)
but it gave an error:
There is no 'genes' column in my test.motifs_vs_regions.rankings.feather data.
Then I ran:
import pyarrow as pa
import pyarrow.feather as pf
import pandas as pd
genes_vs_regions_ctx_db = 'test.motifs_vs_regions.rankings.feather'
gene_names_df = pf.read_feather(genes_vs_regions_ctx_db)
print(gene_names_df)
Finally, it printed:
The column in my file is 'regions'. I don't understand which option you mean. Could you give me some advice on building an Axolotl cisTarget database? Thank you very much!
Start in a new dir (or move/delete the current feather files) and create a gene rankings database:
fasta_filename=
motifs_dir=
motifs_list_filename=
db_prefix=
nbr_threads=22
# Create gene rankings database.
"${create_cistarget_databases_dir}/create_cistarget_motif_databases.py" \
-f "${fasta_filename}" \
-M "${motifs_dir}" \
-m "${motifs_list_filename}" \
-g "#[0-9]+$" \
-o "${db_prefix}" \
-t "${nbr_threads}"
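To see what the -g pattern does to region names, here is a minimal Python sketch. It assumes the option simply removes the matched suffix, as described earlier in this thread; the region IDs below are illustrative examples in the gene_id#number style, not taken from a real FASTA file.

```python
import re

# Pattern copied from the -g option; it matches a trailing "#<digits>".
GENE_ID_SUFFIX = re.compile(r"#[0-9]+$")


def region_to_gene_id(region_id: str) -> str:
    """Strip a trailing '#<number>' from a region ID, leaving the gene ID."""
    return GENE_ID_SUFFIX.sub("", region_id)


# Illustrative region IDs (assumed naming style: gene_id#number).
for region_id in ["AMEX60DD011386#1", "AMEX60DD011386#12", "AMEX60DD011386"]:
    print(region_id, "->", region_to_gene_id(region_id))
```

Only a suffix at the very end of the name is removed, so an ID without "#number" passes through unchanged.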
It is so nice of you to reply so quickly; your advice helped me a lot.
My test database build is running now and may need some time to produce the gene rankings database. I have another question: when I run pyscenic ctx, do the row names of the count data need to be in the gene_id#1 format, since the gene names in the database have the format gene_id#1? Just like the following:
Thank you very much!
As long as the "gene" names in the rankings database and the expression matrix match, it should work.
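A quick way to test that condition is to compare the two sets of names directly. The sketch below is only an illustration: the function name and the toy gene names are hypothetical, and in practice the two lists would come from the rankings database and from the expression matrix.

```python
def gene_name_overlap(db_genes, expr_genes):
    """Return (fraction of expression genes found in the database, missing names)."""
    db_set = set(db_genes)
    expr = list(dict.fromkeys(expr_genes))  # de-duplicate, keep order
    missing = [gene for gene in expr if gene not in db_set]
    fraction = (len(expr) - len(missing)) / len(expr) if expr else 0.0
    return fraction, missing


# Hypothetical names: here the database still carries '#1' suffixes,
# so nothing in the expression matrix matches.
db_genes = ["geneA#1", "geneB#1"]
expr_genes = ["geneA", "geneB", "geneC"]
fraction, missing = gene_name_overlap(db_genes, expr_genes)
print(f"{fraction:.0%} of expression genes found; missing: {missing}")
```

A low fraction here would be consistent with the "Less than 80% of the genes ... could be mapped" warning reported in this thread, although this sketch does not reproduce how pySCENIC itself computes that figure.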
OK, thank you for your advice. Following your guidance, after adding the option -g "#[0-9]+$", I got these feather files:
test.genes_vs_motifs.rankings.feather format:
and test.motifs_vs_genes.rankings.feather format:
Can I use test.genes_vs_motifs.rankings.feather and test.motifs_vs_genes.rankings.feather? If so, which one should I use as the input database?
Also, I don't know why I can't get test.genes_vs_regions.rankings.feather. Is there something wrong in my script? My script:
Thank you
You can only use test.genes_vs_motifs.rankings.feather. All other feather files can be deleted; they are only needed to create test.genes_vs_motifs.rankings.feather.
test.motifs_vs_genes.scores.feather: Cluster-Buster creates scores for regions/genes in your FASTA file per motif.
test.motifs_vs_genes.scores.feather ==> test.motifs_vs_genes.rankings.feather: Scores for regions/genes per motif are converted to a ranking.
test.motifs_vs_genes.scores.feather ==> test.genes_vs_motifs.rankings.feather: for pySCENIC we need rankings for each motif per region/gene (transposed test.motifs_vs_genes.rankings.feather).
I have tested using only test.genes_vs_motifs.rankings.feather
, but it also gave the error No columns to parse from file when I ran pyscenic ctx.
Maybe the column names in my files are not correct? But I think the feather file, the motif2TF file and the scRNA matrix all use the same gene name format (gene ID), so why didn't they match? Feather file format:
motif2TF file format (human gene symbols replaced by the homologous Axolotl gene IDs; where no replacement is possible, the human gene symbol is retained):
scRNA matrix format:
Another question: you told me I need to use test.genes_vs_regions.rankings.feather and that I need to create a gene rankings database. But following your instructions I got test.genes_vs_motifs.rankings.feather, and now I should only use test.genes_vs_motifs.rankings.feather. I am confused: are test.genes_vs_regions.rankings.feather and test.genes_vs_motifs.rankings.feather the same? Does "motifs" also mean "regions"?
It is so kind of you to reply so much. Thank you!
Instead of test.genes_vs_regions.rankings.feather
, it should have been test.genes_vs_motifs.rankings.feather
, my bad.
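To illustrate how the scores file, the rankings file and the transposed rankings file described earlier in this thread relate to each other, here is a toy sketch in plain Python. The scores are made up, and the zero-based "0 = best" ranking and the tie handling are assumptions for illustration; it does not reproduce what create_cistarget_motif_databases.py does internally.

```python
# Toy scores keyed by motif, then gene (hypothetical values).
scores = {
    "motif1": {"geneA": 0.9, "geneB": 0.1, "geneC": 0.5},
    "motif2": {"geneA": 0.2, "geneB": 0.8, "geneC": 0.3},
}


def scores_to_rankings(scores_per_motif):
    """For each motif, rank genes by descending score (0 = best)."""
    rankings = {}
    for motif, gene_scores in scores_per_motif.items():
        ordered = sorted(gene_scores, key=gene_scores.get, reverse=True)
        rankings[motif] = {gene: rank for rank, gene in enumerate(ordered)}
    return rankings


def transpose(nested):
    """Swap the outer and inner keys of a dict of dicts."""
    out = {}
    for outer, inner in nested.items():
        for key, value in inner.items():
            out.setdefault(key, {})[outer] = value
    return out


# Scores per motif are converted to rankings, then transposed so that
# the data is keyed by gene first and motif second.
motifs_vs_genes_rankings = scores_to_rankings(scores)
genes_vs_motifs_rankings = transpose(motifs_vs_genes_rankings)
print(genes_vs_motifs_rankings)
```

Both ranking tables hold the same numbers; only the orientation differs, which is why a single rankings file is enough as input.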
OK, no problem. When I ran pyscenic ctx with the input data formatted as described previously, it again gave an error:
Why does it say Signatures dataframe is empty? Your advice would help me a lot. Thank you for your reply!
What does your signatures file look like?
I'm sorry, I don't know which file is the signatures file. I will try to run it again and learn from the tutorial. If it still gives an error, I will get in touch with you. Thank you for replying so quickly.
Hi, did you solve this problem? I am facing the same problem. I would really appreciate it if you have a solution. Thanks. Lee
Sorry, I didn't solve the problem. If you find a solution, please tell me. Thanks
Hi, @authors: I built the Axolotl cisTarget database following the instructions for the Gallus gallus cisTarget database. First, I got a FASTA file of the Axolotl gene regions 10 kb upstream and 10 kb downstream of the TSS:
Second, I got a set of motifs with wget http://jaspar.genereg.net/download/CORE/JASPAR2020_CORE_vertebrates_non-redundant_pfms_jaspar.txt, replaced the gene names with the homologous Axolotl gene IDs, and finally converted it to motifs_cb_format.
Luckily, I got the feather files as follows:
Third, I built the motif2TF database by loading the human motif2TF file and replacing the human gene symbols with the homologous genes from my species. But I don't know which feather file I should use to run the TF analysis, so I passed all the feather files as the database reference, and then I ran:
Finally, it produced the motifs_vs_regions.adjacencies.tsv file and gave an error:
Also, I got many WARNING messages like this: 2021-11-09 15:51:53,032 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for AMEX60DD011386 could be mapped to test.cross_species.regions_vs_motifs.rankings. Skipping this module.
I have no idea what causes this. I would appreciate it if you could reply soon. Thank you all! Winnie