aertslab / create_cisTarget_databases

Create cisTarget databases

Clarification of instructions #1

Closed: ewitt1093 closed this issue 3 years ago

ewitt1093 commented 3 years ago

Hello, I am working through the instructions to create a custom cisTarget database. Here are the files I have:

- Drosophila melanogaster whole-genome FASTA
- FASTA for the genes/features of interest
- separate bigWig files from ENCODE for the TFs listed in https://resources.aertslab.org/cistarget/track2tf/encode_modERN_20190621__ChIP_seq.drosophila_melanogaster.dm6.track_to_tf_in_motif_to_tf_format.tsv
- the JASPAR motifs from https://zlab.bu.edu/clover/jaspar2005core
- a Cluster-Buster output file for the JASPAR motifs against the Drosophila melanogaster whole-genome FASTA, which looks like this:

```
>2L (23513712 bp)

CLUSTER 1 Location: 20704906 to 20706440 Score: 23.7 MA0073: 6.23 MA0015: 4.03 MA0074: 2.75 MA0052: 1.64 MA0096: 1.21 MA0082: 1.06 MA0043: 1.03 MA0068: 1.02 MA0025: 0.764
```

I tried running create_cistarget_databases.py with -M pointing to the directory containing the cbust output file and -m set to the path of the cbust output. It failed with `Error: Cluster-Buster motif filename "/rugpfs/fs0/zhao_lab/scratch/ewitt/witt/singlecell/clusterbuster/cluster-buster/>2L (23513712 bp).cb" does not exist for motif >2L (23513712 bp).`

I also tried passing just the JASPAR motif matrix file to -m. That failed with `Error: Cluster-Buster motif filename ">MA0100 c-MYB_1 TRP-CLUSTER.cb" does not exist for motif >MA0100 c-MYB_1 TRP-CLUSTER.`

Can you please help me understand how to properly format the inputs for this process? Thank you very much.

tropfenameimer commented 3 years ago

Hi @ewitt1093, can you please post the full command you used? It seems to me that the options got mixed up: the script is looking for a motif called '>2L ...', which is the name of a chromosome.

The option -M should point to a directory containing .cb files, e.g. `-M /path/to/motif_dir/`:

```
ls /path/to/motif_dir/
jaspar__MA0150.2.cb
jaspar__MA0151.1.cb
jaspar__MA0152.1.cb
jaspar__MA0153.2.cb
```

The option -m should point to a file containing a list of motif names, e.g. `-m motif_names.lst`:

```
cat motif_names.lst
jaspar__MA0150.2
jaspar__MA0151.1
jaspar__MA0152.1
jaspar__MA0153.2
```

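
Both error messages suggest that the script builds the path `<-M directory>/<motif name>.cb` for every name listed in the -m file. That would explain why a chromosome header such as `>2L (23513712 bp)` was treated as a motif name. The Python below is a minimal sketch that checks this pairing before a long run. It assumes that path convention and is not part of create_cisTarget_databases. The helper names `read_motif_names` and `missing_motif_files` are hypothetical.

```python
import os
import sys


def read_motif_names(motif_list_path):
    """Return the non-empty, stripped lines of a motif list (one name per line)."""
    with open(motif_list_path) as fh:
        return [line.strip() for line in fh if line.strip()]


def missing_motif_files(motif_dir, motif_names):
    """Return the motif names that have no '<motif_dir>/<name>.cb' file.

    Assumption: the expected file name is the motif name plus '.cb',
    inferred from the error messages in this thread.
    """
    return [
        name
        for name in motif_names
        if not os.path.isfile(os.path.join(motif_dir, name + ".cb"))
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_motifs.py /path/to/motif_dir/ motif_names.lst
    motif_dir, motif_list = sys.argv[1], sys.argv[2]
    missing = missing_motif_files(motif_dir, read_motif_names(motif_list))
    for name in missing:
        print("missing:", os.path.join(motif_dir, name + ".cb"))
    # Non-zero exit status if any expected .cb file is absent.
    sys.exit(1 if missing else 0)
```

Run it as `python check_motifs.py /path/to/motif_dir/ motif_names.lst` (the script name is also hypothetical). It prints each expected .cb path that is missing and exits with status 1 if there are any.
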
ewitt1093 commented 3 years ago

Ah, I see the problem: my output from cbust is a single file. When you run cbust, how do you split the output into one file per motif instead of one big file?

ewitt1093 commented 3 years ago

When I ran cbust, I used this command: `cbust [file with matrices of JASPAR motifs from clusterbuster website] [dmel-all-chromosome-r6.15.fasta] > output.cb`

tropfenameimer commented 3 years ago

Hi @ewitt1093, you shouldn't have to run cbust yourself; create_cistarget_motif_databases.py does that for you. So the *.cb files contain the motifs in Cluster-Buster format, not the cbust scores.
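
The script runs cbust itself, so -M needs one Cluster-Buster matrix file per motif, not scan results. Your JASPAR matrices may all be in one combined file, like the matrix file passed to cbust in the command earlier in this thread. In that case, the Python below is a minimal sketch for splitting it. It rests on three assumptions:

- each motif starts with a `>` header line;
- the motif name is the first token after `>`, so `>MA0100 c-MYB_1 TRP-CLUSTER` becomes `MA0100`;
- the tool only relies on the file name matching the name in the -m list.

Both functions are hypothetical. So is the optional `prefix` parameter, which is there for matching a naming scheme such as `jaspar__MA0150.2`.

```python
import os


def split_cbust_motifs(combined_path, out_dir, prefix=""):
    """Split a multi-motif Cluster-Buster matrix file into one '<name>.cb' per motif.

    Assumptions (not confirmed by the tool's documentation): every motif
    starts with a '>' header line, the motif name is the first
    whitespace-separated token after '>', and all lines up to the next
    header belong to that motif. Returns the motif names in file order.
    """
    os.makedirs(out_dir, exist_ok=True)
    motifs = {}  # motif name -> list of lines, header included
    order = []
    current = None
    with open(combined_path) as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                if not fields:
                    raise ValueError(f"empty motif header: {line!r}")
                current = prefix + fields[0]
                if current in motifs:
                    raise ValueError(f"duplicate motif name: {current}")
                motifs[current] = [line]
                order.append(current)
            elif current is not None:
                motifs[current].append(line)
            # Lines before the first header are ignored.
    for name in order:
        # Header lines are copied unchanged into each per-motif file.
        with open(os.path.join(out_dir, name + ".cb"), "w") as out:
            out.writelines(motifs[name])
    return order


def write_motif_list(names, list_path):
    """Write one motif name per line, the list format expected by -m."""
    with open(list_path, "w") as fh:
        fh.write("".join(name + "\n" for name in names))
```

You can then pass the output directory and the written list as `-M` and `-m`. The sketch copies header lines unchanged. If the tool turns out to expect the header to match the file name, rewriting the header line is the place to adjust it.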

ewitt1093 commented 3 years ago

Oh, thanks so much for the clarification! My other question is: which command do I use to build a cisTarget database from bigWig files?

tropfenameimer commented 3 years ago

Unfortunately, building databases from bigWig files hasn't been implemented yet.

ewitt1093 commented 3 years ago

All right, I'll calculate motifs on my own first. Thanks for the help!