aertslab / create_cisTarget_databases

Create cisTarget databases

Problems encountered during the construction of the Red Eared Turtle database #38

Open zhongguodiyidao opened 1 year ago

zhongguodiyidao commented 1 year ago

Dear author,

I built the database following the method in the database-building example you provided. Your example code generates 8 feather files, but I only generated three [screenshot]. I also don't know how to use the mouse motif annotation file in which I replaced the mouse gene names with their turtle homologs [screenshot].

Looking forward to your reply, thank you!

ghuls commented 1 year ago

You can use the test.regions_vs_motifs.rankings.feather file and motifs-v10nr_clust-nr2.tgi-m0.01-o0.0.tbl with pySCENIC: https://pyscenic.readthedocs.io/en/latest/installation.html#docker-podman-and-singularity-apptainer-images

However, for pySCENIC it would be better to use gene-based databases (the -g option).
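As I understand the -g option of create_cistarget_motif_databases.py, it takes a regular expression that strips the non-gene part of each region ID and then keeps, for every motif, the best score over all regions that belong to the same gene. The plain-Python sketch below only illustrates that idea; the function names are hypothetical, and the default pattern `#[0-9]+$` assumes region IDs of the form `gene#number`.

```python
import re
from collections import defaultdict


def regions_to_genes(region_scores, gene_regex=r"#[0-9]+$"):
    """Collapse per-region scores for one motif into per-gene scores.

    region_scores: dict mapping region ID -> motif score (higher is better).
    gene_regex: pattern removed from each region ID to recover the gene ID
                (hypothetical naming, e.g. "sox2#1" -> "sox2").
    Returns a dict mapping gene ID -> maximum score over that gene's regions.
    """
    pattern = re.compile(gene_regex)
    gene_scores = defaultdict(lambda: float("-inf"))
    for region_id, score in region_scores.items():
        gene_id = pattern.sub("", region_id)
        gene_scores[gene_id] = max(gene_scores[gene_id], score)
    return dict(gene_scores)


def rank(scores):
    """Rank IDs by descending score (rank 0 = best); ties broken by ID here."""
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return {k: i for i, k in enumerate(ordered)}


# Usage with made-up scores for a single motif:
gene_scores = regions_to_genes({"sox2#1": 3.1, "sox2#2": 7.4, "pax6#1": 5.0})
print(gene_scores)        # {'sox2': 7.4, 'pax6': 5.0}
print(rank(gene_scores))  # {'sox2': 0, 'pax6': 1}
```

The deterministic tie-breaking is only for readability; the real tool may handle ties differently.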

zhongguodiyidao commented 1 year ago

First of all, thank you for your reply! I'm sorry, I've been quite confused lately. The species we are studying is the Brazilian red-eared turtle. To run SCENIC in Python, I prepared several files. First, we used bedtools getfasta to extract sequences and created feather files for the regions 5 kb upstream and downstream of the TSS and 500 bp upstream of the TSS, respectively [screenshot].
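Because "upstream" depends on the gene's strand, the BED windows passed to bedtools getfasta have to be computed differently for plus- and minus-strand genes. A minimal sketch, assuming you already have a 0-based TSS position and strand per gene (the `tss_window` helper is hypothetical). bedtools getfasta has `-s` to reverse-complement minus-strand windows and `-name` to use the BED name column in the FASTA header; depending on the bedtools version, that header may also include the coordinates, so check `bedtools getfasta -h`.

```python
def tss_window(chrom, tss, strand, name, upstream, downstream):
    """Return a BED6 line for a window around a TSS.

    tss is the 0-based position of the TSS base. The window covers `upstream`
    bp upstream of the TSS plus `downstream` bp starting at the TSS itself,
    measured along the gene's strand (for '-' strand genes, upstream lies at
    higher coordinates). BED intervals are 0-based and half-open.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    start = max(0, start)  # clip windows that run past the chromosome start
    # Note: windows running past the chromosome end are not clipped here.
    return f"{chrom}\t{start}\t{end}\t{name}\t0\t{strand}"


# 500 bp upstream and 5 kb up/downstream windows for a made-up gene:
print(tss_window("chr1", 10000, "+", "sox2", 500, 0))
print(tss_window("chr1", 10000, "-", "sox2", 5000, 5000))
```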

Furthermore, as you suggested previously, we replaced the mouse gene names in the motif-to-TF annotation file with the corresponding turtle gene names [screenshot].
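The renaming step can be sketched as a rewrite of one column of the tab-separated annotation table. This assumes the TF column is called `gene_name` (check the header of your .tbl) and that you have a hypothetical mouse-to-turtle homolog table as a dict of lists, since one mouse TF can have several homologs. Rows without a homolog are dropped here. That is a design choice: such rows could not match any turtle gene anyway.

```python
def remap_motif_annotations(tbl_lines, homologs, gene_column="gene_name"):
    """Rewrite the TF gene names in a tab-separated motif annotation table.

    tbl_lines: lines of the .tbl file, header first (e.g. from fh.readlines()).
    homologs: dict mapping a mouse gene name -> list of target-species names.
    Returns the rewritten lines without trailing newlines.
    """
    header = tbl_lines[0].rstrip("\r\n").split("\t")
    idx = header.index(gene_column)  # raises ValueError if the column is missing
    out = ["\t".join(header)]
    for line in tbl_lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\r\n").split("\t")
        for target in homologs.get(fields[idx], []):
            # One output row per homolog; all other columns are copied unchanged.
            out.append("\t".join(fields[:idx] + [target] + fields[idx + 1:]))
    return out
```

For a real file: `with open("motifs.tbl") as fh: new_lines = remap_motif_annotations(fh.readlines(), homologs)`, then write `"\n".join(new_lines) + "\n"` to a new file.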

Finally, we replaced the transcription factor gene names in https://resources.aertslab.org/cistarget/tf_lists/allTFs_mm.txt with turtle gene names. May I ask whether these three files are sufficient? Looking forward to your reply, thank you again!
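Whether these files are "sufficient" also depends on the gene names agreeing across them, since names are compared as exact strings (so `Sox2` and `sox2` are different genes). A small hypothetical check, assuming you have loaded the TF list, the TF names from the annotation table and the gene names of the expression matrix as lists of strings:

```python
def check_tf_names(tf_list, annotated_tfs, expressed_genes):
    """Compare TF names across the TF list, motif annotations and expression data."""
    tfs = set(tf_list)
    annotated = set(annotated_tfs)
    expressed = set(expressed_genes)
    expressed_lower = {g.lower() for g in expressed}
    missing_expr = tfs - expressed
    return {
        # TFs that no motif is annotated to
        "no_motif_annotation": sorted(tfs - annotated),
        # TFs absent from the expression matrix (exact match)
        "not_in_expression_matrix": sorted(missing_expr),
        # ... of which these would match if case were ignored
        "case_mismatch": sorted(t for t in missing_expr if t.lower() in expressed_lower),
        # TFs present in all three inputs
        "usable": sorted(tfs & annotated & expressed),
    }
```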

ghuls commented 1 year ago

Yes, those should be sufficient.

clarkzor commented 8 months ago

Hi ghuls, I don't think these files are actually sufficient, based on my experience trying to create a database for Xenopus tropicalis.

I have generated a regions_vs_motifs.rankings.feather file [screenshot].

To do this, I used the following command: [screenshot]

where my .fasta file was created with: [screenshot]

I put gene names instead of coordinates in the FASTA headers because I am linking the gene names to GeneList in [screenshot].
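If gene names replace coordinates in the FASTA headers, each header should still identify exactly one sequence; duplicate IDs (one gene with several regions) are likely to cause problems. A minimal sketch, assuming a hypothetical `<gene>#<n>` naming scheme whose suffix can later be stripped with a regex such as `#[0-9]+$` to recover the gene name:

```python
from collections import Counter


def make_region_ids(gene_names):
    """Give every sequence a unique ID of the form '<gene>#<n>'."""
    seen = Counter()
    ids = []
    for gene in gene_names:
        seen[gene] += 1  # count how many regions this gene has had so far
        ids.append(f"{gene}#{seen[gene]}")
    return ids


def to_fasta(ids, sequences, width=60):
    """Format (id, sequence) pairs as FASTA text, wrapping lines at `width`."""
    lines = []
    for region_id, seq in zip(ids, sequences):
        lines.append(f">{region_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


print(make_region_ids(["sox2", "pax6", "sox2"]))  # ['sox2#1', 'pax6#1', 'sox2#2']
```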

So, when I load my regions_vs_motifs.rankings.feather, it looks like this: [screenshots]

However, this is different from your sample data, which looks like this: [screenshots]

As you can see, your demo file has a "features" column with encode motif IDs; my feather file does NOT have this information. Nowhere in the documentation do you show the format of the FASTA file required to generate such output for another species, unless I have missed something.

When I then use my .feather file in the next step, I get this: [screenshots]

So I get this weird warning, and when I look at the output, it looks like this: [screenshot]

But if I run it with your demo data, I get: [screenshot]

So my question is: what do I need to add to my create_cistarget_motif_databases.py command to be able to link my .tbl database (with all-lowercase Xenopus tropicalis gene names) to my rankings.feather? I don't understand how to go from .regions_vs_motifs.rankings.feather to .genes_vs_motifs.rankings.feather.
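As far as I understand, pySCENIC links the .tbl to the ranking database through the motif IDs: the IDs stored in the database (in the demo file, the "features" column) must also appear in the `#motif_id` column of the .tbl, otherwise no TF gets assigned to any motif. A hypothetical first diagnostic, assuming you have already read the database's motif IDs into a list (for example with `pandas.read_feather`) and the .tbl as lines:

```python
def motif_id_overlap(db_motif_ids, tbl_lines, id_column="#motif_id"):
    """Count motif IDs shared between a ranking database and a .tbl file."""
    header = tbl_lines[0].rstrip("\r\n").split("\t")
    idx = header.index(id_column)  # assumed column name; check your .tbl header
    tbl_ids = {
        line.rstrip("\r\n").split("\t")[idx]
        for line in tbl_lines[1:]
        if line.strip()
    }
    db_ids = set(db_motif_ids)
    return {
        "shared": len(db_ids & tbl_ids),
        "only_in_db": sorted(db_ids - tbl_ids),
        "only_in_tbl": sorted(tbl_ids - db_ids),
    }
```

If "shared" is zero, the two files cannot be joined regardless of how the gene names are spelled.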

Please help, I really want to be able to use your awesome software on Xenopus single-cell data.