aertslab / create_cisTarget_databases

Create cisTarget databases

Will zebrafish cisTarget databases be available in the future? #26

Open yjchen1201 opened 1 year ago

yjchen1201 commented 1 year ago

Thank you for developing this amazing tool! I am wondering if zebrafish cisTarget databases will be available in the future, like for human/mouse/drosophila?

ghuls commented 1 year ago

Creating databases for species we don't use in the lab probably needs to be a community effort, as validating that the input regions used to create the database work properly is quite some work.

There was an effort to create a zebrafish database a while back: https://github.com/aertslab/create_cisTarget_databases/issues/8

Recently I was contacted by a group that was trying to make a zebrafish database. They were willing to share it once they felt confident enough that it works properly.

Mesi395 commented 1 year ago

For creating a cisTarget database for zebrafish I used:

yjchen1201 commented 1 year ago

Thank you @ghuls @Mesi395 ! I will give it a try.

JoGraesslin commented 1 year ago

Once you have managed to construct the .feather file, you will have to convert the gene names from JASPAR into the gene symbols used in your genome. To achieve this, we used several different databases:

  1. Ensembl BioMart: first convert the JASPAR gene names to Ensembl gene names, then use the orthology databases to convert them to the corresponding zebrafish genes
  2. Alliance database: https://www.alliancegenome.org/downloads#orthology
  3. OMA database: https://omabrowser.org/oma/home/
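One way to combine the results from several orthology sources is to take the union of their mappings per gene. The snippet below is only a minimal sketch of that idea, not the scripts used in this thread: the function name `merge_ortholog_maps` is hypothetical, and the three small dictionaries are toy stand-ins for tables you would parse from the Ensembl, Alliance and OMA exports yourself.

```python
from collections import defaultdict


def merge_ortholog_maps(*sources):
    """Union several {source_gene: [zebrafish genes]} mappings into one."""
    merged = defaultdict(set)
    for source in sources:
        for gene, orthologs in source.items():
            merged[gene].update(orthologs)
    # Sort so the result is reproducible between runs.
    return {gene: sorted(names) for gene, names in merged.items()}


# Toy inputs (assumption): in practice these come from the database exports.
ensembl = {"PAX6": ["pax6a", "pax6b"], "SOX2": ["sox2"]}
alliance = {"PAX6": ["pax6a"], "GATA1": ["gata1a"]}
oma = {"SOX2": ["sox2"]}

combined = merge_ortholog_maps(ensembl, alliance, oma)
```

Taking the union maximizes coverage; requiring agreement between at least two sources would be a stricter alternative if false orthologs are a concern.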

The output file should look like the mouse/human files on the aertslab resources page: https://resources.aertslab.org/cistarget/motif2tf/. You could also take the mouse file from the website and directly convert the mouse gene names to zebrafish gene names. This gave us the highest yield of motifs / TF; however, we were not entirely sure how the aertslab .tbl file was constructed by @ghuls.
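Converting the gene names in an existing motif2tf table could look roughly like the sketch below. It assumes a tab-separated file with a `gene_name` column and a header starting with `#motif_id`. Please check this against the real .tbl file, which has more columns. The function name `remap_tf_names`, the motif IDs and the ortholog dictionary are all made up for illustration.

```python
import csv
import io


def remap_tf_names(tbl_text, orthologs, gene_column="gene_name"):
    """Rewrite the TF gene column using a {source_gene: [zebrafish genes]} map.

    Rows without an ortholog are dropped; rows with several orthologs
    are duplicated, one output row per zebrafish gene.
    """
    reader = csv.DictReader(io.StringIO(tbl_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        for zebrafish_gene in orthologs.get(row[gene_column], []):
            writer.writerow({**row, gene_column: zebrafish_gene})
    return out.getvalue()


# Toy table (assumption): only two of the real file's columns are shown.
toy_tbl = (
    "#motif_id\tgene_name\n"
    "motif_A\tPax6\n"
    "motif_B\tSox2\n"
    "motif_C\tNoOrtholog\n"
)
toy_orthologs = {"Pax6": ["pax6a", "pax6b"], "Sox2": ["sox2"]}

zebrafish_tbl = remap_tf_names(toy_tbl, toy_orthologs)
```

Duplicating rows for one-to-many orthologs keeps every zebrafish paralog linked to the motif. Whether that is desirable depends on how much you trust the orthology calls.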

yanpinlu commented 1 year ago

Hello, I am eager to know the current progress of the zebrafish database construction. Can it be used yet, and could you provide detailed build steps? @ghuls

ghuls commented 1 year ago

@yanpinlu unfortunately I haven't heard anything back from them so far.

In case it helps, our SCENIC+ motif collection is now public: https://resources.aertslab.org/cistarget/motif_collections/

So at least you don't have to hunt for your own PWM files anymore.

mtrebelo commented 1 year ago

@yjchen1201 Hi! Were you able to create the zebrafish dataset?

willey2020 commented 1 year ago

@yjchen1201 @ghuls I have a similar question to @mtrebelo's. Since we already have the SCENIC+ motif collection and motif2TF table, which is very comprehensive, would it be a reasonable approach for now to just change the TF gene names in the .tbl file to zebrafish gene names? Thank you!

ghuls commented 1 year ago

@JoGraesslin provides his scripts at https://github.com/JoGraesslin/Zebrafish_SCENIC.

A motif2tf table file with zebrafish names provided by him can be found at: https://drive.google.com/file/d/1__P8l-XTLA6Bup_ucKs4M1yGqE-wGbYz/view?usp=sharing. It contains the human .tbl file with orthologous gene names from the Ensembl, Alliance and OMA databases.