Build the Axolotl cisTarget database issue

fengweimin-maker commented 3 years ago

Hi @authors, I built the Axolotl cisTarget database by following the instructions for the Gallus gallus cisTarget database. First, I created a FASTA file with the Axolotl gene sequences from 10 kb upstream to 10 kb downstream of the TSS:

Second, I downloaded a set of motifs with wget http://jaspar.genereg.net/download/CORE/JASPAR2020_CORE_vertebrates_non-redundant_pfms_jaspar.txt, replaced the gene names with the homologous Axolotl gene IDs, and finally converted the motifs to Cluster-Buster format (motifs_cb_format).
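A minimal sketch of the JASPAR to Cluster-Buster conversion step, assuming the JASPAR download uses a '>ID NAME' header followed by four bracketed count rows (A, C, G, T), and that Cluster-Buster expects one line per motif position with A, C, G, T columns. The function name is hypothetical, the counts in the example are toy values, and the renaming to Axolotl homologs is not shown.

```python
import re


def jaspar_to_cluster_buster(jaspar_text: str) -> dict:
    """Convert JASPAR 'pfms_jaspar' text into Cluster-Buster motif strings.

    Returns a dict mapping each motif ID (e.g. 'MA0004.1') to its motif text.
    """
    motifs = {}
    motif_id = None
    rows = {}
    for raw_line in jaspar_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header such as ">MA0004.1\tArnt": keep only the motif ID.
            motif_id = line[1:].split()[0]
            rows = {}
            continue
        # Count row such as "A  [ 4 19 0 ]" -> base "A", counts ["4", "19", "0"].
        base, counts = line.split(None, 1)
        rows[base] = re.sub(r"[\[\]]", " ", counts).split()
        if motif_id is not None and all(b in rows for b in "ACGT"):
            # Cluster-Buster uses one line per position with A C G T columns,
            # so the four JASPAR rows are transposed.
            positions = zip(rows["A"], rows["C"], rows["G"], rows["T"])
            body = "\n".join("\t".join(p) for p in positions)
            motifs[motif_id] = f">{motif_id}\n{body}\n"
    return motifs


# Toy counts, not the real MA0004.1 matrix.
example = """>MA0004.1\tArnt
A  [     4     19      0 ]
C  [    16      0     20 ]
G  [     0      1      0 ]
T  [     0      0      0 ]
"""
print(jaspar_to_cluster_buster(example)["MA0004.1"])
```

Each returned string could then be written to its own file in the motifs_cb_format directory (for example MA0004.1.cb; that file naming is an assumption).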

Luckily, I got the following feather files:

Third, I built the motif2TF annotation by loading the human motif2TF file and replacing the human gene symbols with the homologous genes from my species. I didn't know which feather file to use for the TF analysis, so I passed all the feather files as database references and ran:

Finally, it produced the motifs_vs_regions.adjacencies.tsv file and then failed with this error:

I also got many WARNING messages like this one:

2021-11-09 15:51:53,032 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for AMEX60DD011386 could be mapped to test.cross_species.regions_vs_motifs.rankings. Skipping this module.

I have no idea what is causing this. I would appreciate it if you could reply soon. Thank you all! Winnie
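The warning says that fewer than 80% of a module's genes could be found in the ranking database, so the module was skipped. Below is a simplified illustration of that rule (not pySCENIC's own code): it computes the fraction of module genes present among the database's gene names and compares it with 0.8. The gene IDs are hypothetical; the example shows that if the database still holds region names with a "#number" suffix, none of the module genes match.

```python
def fraction_mapped(module_genes, db_gene_names):
    """Fraction of a module's genes that are present in the ranking database."""
    module = set(module_genes)
    if not module:
        return 0.0
    return len(module & set(db_gene_names)) / len(module)


def is_skipped(module_genes, db_gene_names, threshold=0.8):
    # Simplified version of the rule in the warning: skip the module when
    # less than `threshold` of its genes can be found in the database.
    return fraction_mapped(module_genes, db_gene_names) < threshold


module = ["AMEX60DD000001", "AMEX60DD000002"]        # hypothetical gene IDs
db_names = ["AMEX60DD000001#1", "AMEX60DD000002#1"]  # unstripped region names
print(fraction_mapped(module, db_names))  # 0.0
```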

ghuls commented 3 years ago

Use test.genes_vs_motifs.rankings.feather.

Did you add the option -g "#[0-9]+$" when creating the gene database? It strips the #some_number suffix from the region names in your FASTA file, so that the remaining names match the gene IDs you provide to pySCENIC.

import pyarrow.feather as pf

# Read only the 'genes' column of the motifs-vs-genes rankings database.
motifs_vs_genes_ctx_db = 'test.motifs_vs_genes.rankings.feather'
gene_names_df = pf.read_feather(motifs_vs_genes_ctx_db, columns=['genes'])
print(gene_names_df)

# Read the full genes-vs-motifs rankings database (the one to use with pySCENIC).
genes_vs_motifs_ctx_db = 'test.genes_vs_motifs.rankings.feather'
genes_vs_motifs_ctx_df = pf.read_feather(genes_vs_motifs_ctx_db)
print(genes_vs_motifs_ctx_df)
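To illustrate what the -g "#[0-9]+$" pattern does to the names, here is a small sketch that applies the same regular expression with Python's re module. The helper name region_to_gene is hypothetical, and how the script combines several regions belonging to one gene is not modelled here.

```python
import re

# Same pattern as the -g option: '#' followed by one or more digits at the end.
REGION_SUFFIX = re.compile(r"#[0-9]+$")


def region_to_gene(region_name: str) -> str:
    """Strip a trailing '#<number>' from a FASTA region name."""
    return REGION_SUFFIX.sub("", region_name)


for name in ["AMEX60DD011386#1", "AMEX60DD011386#2", "AMEX60DD011386"]:
    print(name, "->", region_to_gene(name))
```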

fengweimin-maker commented 3 years ago

It is very nice of you to reply so soon, but I don't have test.genes_vs_regions.rankings.feather. I got these files:
test.cross_species.motifs_vs_regions.rankings.feather
test.cross_species.regions_vs_motifs.rankings.feather
test.motifs_vs_regions.rankings.feather
test.motifs_vs_regions.scores.feather
test.regions_vs_motifs.rankings.feather
test.regions_vs_motifs.scores.feather

Maybe I did something wrong when building the rankings database. I ran your script:

import pyarrow as pa
import pyarrow.feather as pf
import pandas as pd

genes_vs_regions_ctx_db = 'test.motifs_vs_regions.rankings.feather'
gene_names_df = pf.read_feather(genes_vs_regions_ctx_db, columns=['genes'])
print(gene_names_df)

but it failed with an error because there is no 'genes' column in my test.motifs_vs_regions.rankings.feather file.

Then I ran:

import pyarrow as pa
import pyarrow.feather as pf
import pandas as pd

genes_vs_regions_ctx_db = 'test.motifs_vs_regions.rankings.feather'
gene_names_df = pf.read_feather(genes_vs_regions_ctx_db)
print(gene_names_df)

Finally, it printed:

My file has a 'regions' column instead. I don't know which option you mean. Could you give me some advice on building the Axolotl cisTarget database? Thank you very much!

ghuls commented 3 years ago

Start in a new dir (or move/delete the current feather files) and create a gene rankings database:

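# Directory containing create_cistarget_motif_databases.py.
create_cistarget_databases_dir=
# FASTA file with one sequence per gene region (names like gene_id#1).
fasta_filename=
# Directory with the motifs in Cluster-Buster format.
motifs_dir=
# File listing the motifs to score.
motifs_list_filename=
# Prefix for the output feather files.
db_prefix=

# Number of threads to use.
nbr_threads=22

# Create gene rankings database.
# -g "#[0-9]+$" strips the "#number" suffix from the FASTA region names,
# so that the remaining names are gene IDs.
"${create_cistarget_databases_dir}/create_cistarget_motif_databases.py" \
    -f "${fasta_filename}" \
    -M "${motifs_dir}" \
    -m "${motifs_list_filename}" \
    -g "#[0-9]+$" \
    -o "${db_prefix}" \
    -t "${nbr_threads}"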
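The motifs_list_filename above needs to be filled in. A minimal sketch for generating it from the motif directory, assuming the list holds one motif ID per line, i.e. the Cluster-Buster file name without its .cb extension. That format is an assumption, so check the script's help (-h) if your version expects something else. The function name is hypothetical.

```python
from pathlib import Path


def write_motif_list(motifs_dir: str, list_filename: str, extension: str = ".cb") -> int:
    """Write one motif ID per line for every '<motif_id><extension>' file in motifs_dir.

    Returns the number of motif IDs written.
    """
    motif_ids = sorted(
        path.name[: -len(extension)]
        for path in Path(motifs_dir).iterdir()
        if path.is_file() and path.name.endswith(extension)
    )
    Path(list_filename).write_text("".join(f"{motif_id}\n" for motif_id in motif_ids))
    return len(motif_ids)
```

Usage: write_motif_list("motifs_cb_format", "motifs.lst") writes motifs.lst and returns how many motifs were listed (both paths hypothetical).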

fengweimin-maker commented 3 years ago

It is so nice of you to reply so quickly, and your advice helps me a lot. My test_build_database run is in progress and may need some time to produce the gene rankings database. But I have another question: when I run pyscenic ctx, do the row names of the count matrix need to use the gene_id#1 format, since the gene names in the database have the format gene_id#1? Just like the following: Thank you very much!

ghuls commented 3 years ago

As long as the "gene" names match in the rankings database and the expression matrix, it should work.
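Since what matters is that the names match, a quick overlap check between the expression matrix row names and the database gene names can show whether they do. A minimal sketch with hypothetical gene names; how the two name lists are loaded (from the count matrix and from the rankings database) is left out.

```python
def compare_gene_names(expression_genes, database_genes, n_examples=5):
    """Summarise how many expression-matrix genes are found in the database."""
    expr = set(expression_genes)
    db = set(database_genes)
    shared = expr & db
    return {
        "n_expression": len(expr),
        "n_database": len(db),
        "n_shared": len(shared),
        # A few expression-matrix genes that the database does not contain.
        "missing_examples": sorted(expr - db)[:n_examples],
    }


summary = compare_gene_names(
    ["AMEX60DD011386", "AMEX60DD000001"],    # hypothetical matrix row names
    ["AMEX60DD011386#1", "AMEX60DD000001"],  # hypothetical database gene names
)
print(summary)
```

Here AMEX60DD011386 is reported as missing because the database copy still carries the #1 suffix.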

fengweimin-maker commented 3 years ago

OK, thank you for your advice. Following your guidance, after adding the -g "#[0-9]+$" option I got these feather files:

The test.genes_vs_motifs.rankings.feather format:

and the test.motifs_vs_genes.rankings.feather format:

Can I use test.genes_vs_motifs.rankings.feather or test.motifs_vs_genes.rankings.feather? If so, which one should I use as the input database?

Also, I don't know why I don't get test.genes_vs_regions.rankings.feather. Is there something wrong in my script? My script:

Thank you

ghuls commented 3 years ago

You can only use test.genes_vs_motifs.rankings.feather. All other feather files can be deleted; they were only needed to create test.genes_vs_motifs.rankings.feather.

test.motifs_vs_genes.scores.feather: Cluster-Buster creates scores for regions/genes in your FASTA file per motif
test.motifs_vs_genes.scores.feather ==> test.motifs_vs_genes.rankings.feather: Scores for regions/genes per motif are converted to a ranking
test.motifs_vs_genes.scores.feather ==> test.genes_vs_motifs.rankings.feather: for pySCENIC we need rankings for each motif per region/gene (transposed test.motifs_vs_genes.rankings.feather).
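The scores ==> rankings ==> transposed rankings steps above can be illustrated on a toy array. This is a simplified sketch, not the create_cisTarget_databases implementation: rank 0 is assumed to be the highest score, ties are broken by column order, and the actual on-disk orientation of the feather files is not reproduced.

```python
import numpy as np


def scores_to_rankings(scores: np.ndarray) -> np.ndarray:
    """Rank the columns within each row: 0 = highest score.

    Ties are broken by column order (stable sort); this is a simplification.
    """
    order = np.argsort(-scores, axis=1, kind="stable")
    rankings = np.empty_like(order)
    rows = np.arange(scores.shape[0])[:, np.newaxis]
    # rankings[i, order[i, j]] = j: the j-th best column of row i gets rank j.
    rankings[rows, order] = np.arange(scores.shape[1])
    return rankings


# Toy scores: 2 motifs (rows) x 3 genes (columns).
scores = np.array([[0.5, 2.0, 1.0],
                   [3.0, 0.0, 1.0]])
motif_rows = scores_to_rankings(scores)  # one row of rankings per motif
gene_rows = motif_rows.T                 # transposed: one row per gene
print(motif_rows)
print(gene_rows)
```

The transpose carries no new information; it only changes which axis is laid out as rows, which is the relationship described between the two rankings files.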

fengweimin-maker commented 3 years ago

I tested using only test.genes_vs_motifs.rankings.feather, but pyscenic ctx still failed with the error "No columns to parse from file".
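"No columns to parse from file" is the message of the EmptyDataError that pandas raises when read_csv is given a file with no content, so one first check is whether any of the text inputs passed to pyscenic ctx is empty. A minimal sketch; the helper names and the motif annotation file name in the usage comment are hypothetical.

```python
def is_effectively_empty(path: str, chunk_size: int = 1 << 16) -> bool:
    """True if the file has no bytes or only whitespace bytes."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return True
            if chunk.strip():
                return False


def empty_inputs(paths):
    """Return the subset of paths that look empty."""
    return [p for p in paths if is_effectively_empty(p)]


# Hypothetical usage with the text inputs given to pyscenic ctx:
# print(empty_inputs(["motifs_vs_regions.adjacencies.tsv", "motif2tf.tbl"]))
```

Reading in chunks stops at the first non-whitespace byte, so large inputs are not loaded completely.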

Maybe my column names are not correct? But I think the gene names in the feather file, the motif2TF file and the scRNA matrix all use the same format (gene ID), so why don't they match? Feather file format:

motif2TF file format (human gene symbols replaced by the homologous Axolotl gene IDs; where no homolog was found, the human gene symbol was retained):

scRNA matrix format:

Another question: you told me to use test.genes_vs_regions.rankings.feather and to create a gene rankings database. But following your instructions I got test.genes_vs_motifs.rankings.feather, and you said I should use only that file. I am confused: are test.genes_vs_regions.rankings.feather and test.genes_vs_motifs.rankings.feather the same thing? Do "motifs" also mean "regions"?

It is very kind of you to reply so often. Thank you!
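The motif2TF renaming described in this comment (homologous Axolotl IDs where available, human symbols otherwise) can be sketched as below. Assumptions: the motif2TF file is tab-separated with a header and a gene_name column, and the ortholog mapping is a plain dict; the function name, the mapping and the toy rows are hypothetical. TFs that keep their human symbol presumably cannot match Axolotl gene IDs in the expression matrix, so the sketch also collects them for review.

```python
import csv
import io


def replace_gene_names(tbl_text, human_to_axolotl, column="gene_name"):
    """Replace human TF symbols by homologous gene IDs in a tab-separated table.

    Symbols without a homolog are kept unchanged and returned in `unmapped`.
    """
    reader = csv.DictReader(io.StringIO(tbl_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    unmapped = set()
    for row in reader:
        symbol = row[column]
        if symbol in human_to_axolotl:
            row[column] = human_to_axolotl[symbol]
        else:
            unmapped.add(symbol)
        writer.writerow(row)
    return out.getvalue(), unmapped


tbl = "#motif_id\tgene_name\nMA0004.1\tARNT\nMA0006.1\tAHR\n"  # toy rows
new_tbl, unmapped = replace_gene_names(tbl, {"ARNT": "AMEX60DD000001"})
print(new_tbl)
print(unmapped)
```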

ghuls commented 3 years ago

Instead of test.genes_vs_regions.rankings.feather, it should have been test.genes_vs_motifs.rankings.feather, my bad.

fengweimin-maker commented 3 years ago

OK, no problem. When I ran pyscenic ctx with the input data in the formats described above, I got another error:

Why is the Signatures dataframe empty? Your advice would help me a lot. Thank you for your reply!

ghuls commented 3 years ago

What does your signatures file look like?

fengweimin-maker commented 3 years ago

I'm sorry, I don't know which file is the signatures file. I will try to run it again and learn from the tutorial. If I get the error again, I will get in touch with you. Thank you for replying so quickly.

frucelee commented 2 years ago

Quoting fengweimin-maker: "OK, no problem. When I ran pyscenic ctx with the input data in the formats described above, I got another error. Why is the Signatures dataframe empty? Your advice would help me a lot. Thank you for your reply!"

Hi, did you solve this problem? I am facing the same problem. I would be very grateful if you could share a solution. Thanks. Lee

fengweimin-maker commented 2 years ago

Sorry, I didn't solve the problem. If you find a solution, please tell me. Thanks

Repository: aertslab/create_cisTarget_databases, issue #15