aertslab / create_cisTarget_databases

Create cisTarget databases

Can the input motifs be in the PWM format? #19

Open kerenzhou062 opened 2 years ago

kerenzhou062 commented 2 years ago

Hi, can the input motifs be in PWM format, like the output of the HOMER program (motif1.motif)? Please see the example below:

>TGCATG 1-TGCATG,BestGuess:hsa-miR-4262 MIMAT0016894 Homo sapiens miR-4262 Targets (miRBase)(0.647) 5.179177    -34261.033795   0   T:22354.0(48.77%),B:2355.3(5.54%),P:1e-14879
0.001   0.044   0.001   0.954
0.001   0.001   0.997   0.001
0.001   0.997   0.001   0.001
0.997   0.001   0.001   0.001
0.001   0.001   0.001   0.997
0.001   0.001   0.997   0.001

Best,

Keren

ghuls commented 2 years ago

The motifs need to be in Cluster-Buster format.

The following shell function converts a HOMER motif file to Cluster-Buster format (put one HOMER motif per file):

homer_to_clusterbuster () {
    local homer_motif_file="${1}";
    awk -F '\t' -v 'OFS=\t' '{ if ($1 ~ />/) { print $1 } else if (NF == 4) { print $1 * 100, $2 * 100, $3 * 100, $4 * 100; } }' "${homer_motif_file}";
}
$ cat /tmp/motif.homer 
>TGCATG 1-TGCATG,BestGuess:hsa-miR-4262 MIMAT0016894 Homo sapiens miR-4262 Targets (miRBase)(0.647) 5.179177    -34261.033795   0   T:22354.0(48.77%),B:2355.3(5.54%),P:1e-14879
0.001   0.044   0.001   0.954
0.001   0.001   0.997   0.001
0.001   0.997   0.001   0.001
0.997   0.001   0.001   0.001
0.001   0.001   0.001   0.997
0.001   0.001   0.997   0.001

$ homer_to_clusterbuster /tmp/motif.homer 
>TGCATG
0.1	4.4	0.1	95.4
0.1	0.1	99.7	0.1
0.1	99.7	0.1	0.1
99.7	0.1	0.1	0.1
0.1	0.1	0.1	99.7
0.1	0.1	99.7	0.1
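
Since the function above expects one HOMER motif per file, a file containing several motifs would need to be split first. The following is a minimal sketch of how that could be done, not part of create_cisTarget_databases itself. The function name `split_homer_to_clusterbuster`, the use of the first header field as the motif ID, and the `.cb` file extension are all assumptions made for illustration; adjust them to match your own motif list.

```shell
# Hypothetical helper (not part of create_cisTarget_databases):
# split a tab-separated, multi-motif HOMER file into one Cluster-Buster
# file per motif. Assumes the first header field (without ">") is usable
# as the motif ID and that ".cb" is the wanted file extension.
split_homer_to_clusterbuster () {
    local homer_motifs_file="${1}";
    local output_dir="${2}";

    mkdir -p "${output_dir}";

    awk -F '\t' -v 'OFS=\t' -v output_dir="${output_dir}" '
        /^>/ {
            # A header line starts a new motif: close the previous output file.
            if (out != "") { close(out); }

            # Motif ID = first field without the leading ">".
            motif_id = substr($1, 2);

            # Replace characters that are awkward in file names.
            gsub(/[^A-Za-z0-9._-]/, "_", motif_id);

            out = output_dir "/" motif_id ".cb";
            print (">" motif_id) > out;
            next;
        }
        NF == 4 && out != "" {
            # Scale probabilities (0-1) to percentages, as in homer_to_clusterbuster.
            print $1 * 100, $2 * 100, $3 * 100, $4 * 100 > out;
        }
    ' "${homer_motifs_file}";
}
```

For example, `split_homer_to_clusterbuster /tmp/motifs.homer /tmp/motifs_cb` would write `/tmp/motifs_cb/TGCATG.cb` for the motif shown above. Note that HOMER motifs sharing the same first header field would overwrite each other in this sketch, so the IDs should be made unique beforehand.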

kerenzhou062 commented 2 years ago

Thank you for your explanation!

Best,

Keren

ghuls commented 1 year ago

Our SCENIC+ public motif collection is now available: https://resources.aertslab.org/cistarget/motif_collections/