aertslab / create_cisTarget_databases

Create cisTarget databases

Only three feather files are created, but four are expected #29

Closed shangguandong1996 closed 1 year ago

shangguandong1996 commented 1 year ago

Hi,

I am running create_cisTarget_databases and the program does not report any error.

sgd@localhost ~/newReference/annoation/Athaliana/motif/cisTarget_Ath
$ python /data5/sgd_data/biosoft/create_cisTarget_databases/create_cistarget_motif_databases.py -f Ath_promoter5k.fa -M motif_cb_format -m Ath_TF.txt -o Athaliana -t 100
Scoring 661 motifs with Cluster-Buster took: 362.439386 seconds

Writing cisTarget regions vs motifs scores db: "Athaliana.motifs_vs_regions.scores.feather"
Writing cisTarget regions vs motifs scores db took: 0.443401 seconds

Writing cisTarget motifs vs regions scores db: "Athaliana.regions_vs_motifs.scores.feather"
Writing cisTarget motifs vs regions scores db took: 5.988715 seconds

Create rankings from "Athaliana.motifs_vs_regions.scores.feather" with random seed set to 10154546356483985765.
Creating cisTarget rankings db from cisTarget scores db took: 3.838675 seconds

Writing cisTarget motifs vs regions rankings db: "Athaliana.regions_vs_motifs.rankings.feather"
Writing cisTarget motifs vs regions rankings db took: 3.032129 seconds

(create_cistarget_databases) 

After the run, it produced only three feather files:

sgd@localhost ~/newReference/annoation/Athaliana/motif/cisTarget_Ath
$ ll
total 548M
-rw-rw-r-- 1 sgd sgd  47M Oct  9  2022 Athaliana.motifs_vs_regions.scores.feather
-rw-rw-r-- 1 sgd sgd  67M Oct  9  2022 Athaliana.regions_vs_motifs.rankings.feather
-rw-rw-r-- 1 sgd sgd  73M Oct  9  2022 Athaliana.regions_vs_motifs.scores.feather
-rw-r--r-- 1 sgd sgd 361M Oct  9 17:07 Ath_promoter5k.fa
-rw-rw-r-- 1 sgd sgd  11K Oct  9 18:16 Ath_TF.txt
-rw-rw-r-- 1 sgd sgd  864 Oct  9  2022 convert.py
-rw-r--r-- 1 sgd sgd 169K Oct  9 16:27 JASPAR_plantTFDB_Ath.jaspar
-rw-r--r-- 1 sgd sgd  44K Oct  9 16:25 JASPAR_plantTFDB_Ath.tbl
drwxrwxr-x 2 sgd sgd  663 Oct  9 16:29 motif_cb_format

But according to the protocol, there should be four files.

So I am wondering whether you can give me some advice :)
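
For reference, the file names in the listing above follow a `<prefix>.<layout>.<kind>.feather` pattern. The sketch below is a minimal, hypothetical helper (not part of create_cisTarget_databases) that reports which of the four combinations are absent for a given prefix. It assumes the fourth file would be named `Athaliana.motifs_vs_regions.rankings.feather`, which is inferred from the naming pattern of the other three files rather than confirmed by the tool.

```python
from pathlib import Path

# Assumption: the four databases are every combination of these layouts and
# kinds, inferred from the three file names shown in the directory listing.
LAYOUTS = ("motifs_vs_regions", "regions_vs_motifs")
KINDS = ("scores", "rankings")


def expected_feather_names(prefix):
    """Return the four file names that the naming pattern implies for a prefix."""
    return [
        f"{prefix}.{layout}.{kind}.feather"
        for layout in LAYOUTS
        for kind in KINDS
    ]


def missing_feather_files(directory, prefix):
    """Return the expected file names that are not present in the directory."""
    directory = Path(directory)
    return [
        name
        for name in expected_feather_names(prefix)
        if not (directory / name).is_file()
    ]
```

Calling `missing_feather_files(".", "Athaliana")` in the output directory would return the names that were not written.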

Here are my related files:

sgd@localhost ~/newReference/annoation/Athaliana/motif/cisTarget_Ath
$ head Ath_promoter5k.fa 
>AT1G01010
CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAATCTTTAAATCCTACATCCATG
AATCCCTAAATACCTAATTCCCTAAACCCGAAACCGGTTTCTCTGGTTGAAAATCATTGTGTATATAATGATAATTTTAT
CGTTTTTATGTAATTGCTTATTGTTGTGTGTAGATTTTTTAAAAATATCATTTGAGGTCAATACAAATCCTATTTCTTGT
GGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTATCTTCATTTGTTATATTGGATACAAGCTTTGCTACGATCTACATT
TGGGAATGTGAGTCTCTTATTGTAACCTTAGGGTTGGTTTATCTCAAGAATCTTATTAATTGTTTGGACTGTTTATGTTT
GGACATTTATTGTCATTCTTACTCCTTTGTGGAAATGTTTGTTCTATCAATTTATCTTTTGTGGGAAAATTATTTAGTTG
TAGGGATGAAGTCTTTCTTCGTTGTTGTTACGCTTGTCATCTCATCTCTCAATGATATGGGATGGTCCTTTAGCATTTAT
TCTGAAGTTCTTCTGCTTGATGATTTTATCCTTAGCCAAAAGGATTGGTGGTTTGAAGACACATCATATCAAAAAAGCTA
TCGCCTCGACGATGCTCTATTTCTATCCTTGTAGCACACATTTTGGCACTCAAAAAAGTATTTTTAGATGTTTGTTTTGC

sgd@localhost ~/newReference/annoation/Athaliana/motif/cisTarget_Ath
$ head Ath_TF.txt 
MA0001.2_AGL3
MA0005.2_AG
MA0008.3_HAT5
MA0110.3_ATHB-5
MA0121.1_ARR10
MA0548.2_AGL15
MA0549.1_BZR2
MA0550.2_BZR1
MA0551.1_HY5
MA0552.1_PIF1

sgd@localhost ~/newReference/annoation/Athaliana/motif/cisTarget_Ath
$ ls motif_cb_format/* | head
motif_cb_format/MA0001.2_AGL3.cb
motif_cb_format/MA0005.2_AG.cb
motif_cb_format/MA0008.3_HAT5.cb
motif_cb_format/MA0110.3_ATHB-5.cb
motif_cb_format/MA0121.1_ARR10.cb
motif_cb_format/MA0548.2_AGL15.cb
motif_cb_format/MA0549.1_BZR2.cb
motif_cb_format/MA0550.2_BZR1.cb
motif_cb_format/MA0551.1_HY5.cb
motif_cb_format/MA0552.1_PIF1.cb
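
Judging by the two listings above, the motif IDs in the file passed with `-m` pair up by name with `<motif_id>.cb` files in the directory passed with `-M`. A minimal sketch of a consistency check under that assumption follows; `read_motif_ids` and `motifs_without_cb_file` are hypothetical helper names, not functions from create_cisTarget_databases.

```python
from pathlib import Path


def read_motif_ids(motif_list_path):
    """Read one motif ID per line, skipping blank lines."""
    with open(motif_list_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def motifs_without_cb_file(motif_list_path, motif_dir):
    """Return motif IDs that have no '<motif_id>.cb' file in motif_dir.

    Assumption: each motif is stored as '<motif_id>.cb', as in the
    'ls motif_cb_format' output shown in this issue.
    """
    motif_dir = Path(motif_dir)
    return [
        motif_id
        for motif_id in read_motif_ids(motif_list_path)
        if not (motif_dir / f"{motif_id}.cb").is_file()
    ]
```

With the files from this issue, `motifs_without_cb_file("Ath_TF.txt", "motif_cb_format")` would list any motif ID that lacks a Cluster-Buster file.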

shangguandong1996 commented 1 year ago

According to commit https://github.com/aertslab/create_cisTarget_databases/commit/c589c1e5564f8b7f7c931e1d7b5e2e79bfbc0e5f, it seems that the fourth file is no longer written:

Skip writing cisTarget rankings database in motifs or tracks vs regions or genes format as it can take a very long time if there are a lot of regions.

shangguandong1996 commented 1 year ago

But if I only have three feather files, it seems that I will hit this error when running pySCENIC:

ValueError: cisTarget database "../related_info/cisTarget_Ath/Athaliana.motifs_vs_genes.scores.feather" has the wrong type. The transposed version is needed.

shangguandong1996 commented 1 year ago

Sorry, it was my mistake. It seems that pySCENIC only needs genes_vs_motifs.rankings.feather, according to https://github.com/aertslab/create_cisTarget_databases/issues/15#issuecomment-968062262
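
To illustrate the distinction: the error above was raised for a file ending in `motifs_vs_genes.scores.feather`, while the linked comment points to `genes_vs_motifs.rankings.feather`. The sketch below is a hypothetical pre-flight check on file names only. `is_pyscenic_rankings_db` is not a pySCENIC function, and the default accepted layout is an assumption taken from this thread, not pySCENIC's own validation logic.

```python
def is_pyscenic_rankings_db(filename, layouts=("genes_vs_motifs",)):
    """Return True if the name ends in '<layout>.rankings.feather'.

    Assumption: only the layouts passed in 'layouts' are acceptable; the
    default comes from the file name mentioned in this thread.
    """
    return any(
        filename.endswith(f".{layout}.rankings.feather") for layout in layouts
    )
```

The `layouts` parameter can be widened if other layouts turn out to be accepted in a given setup.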

Lidw2020 commented 1 year ago

Thank you very much for sharing. I successfully ran the scripts on Arabidopsis to create the *.feather files, but I do not know how to build the motif2TF database (JASPAR_plantTFDB_Ath.tbl). Could you please share the method with me? Thank you!