aertslab / create_cisTarget_databases

Create cisTarget databases

cbust error #13

Closed by mcsimenc 2 years ago

mcsimenc commented 2 years ago

When I run

create_cistarget_motif_databases.py -f Athaliana.Col-0.HPIv01_10k.promoters.fasta  \
                                    -M motifs/ \
                                    -m motifs_names_list \
                                    -o Athaliana_DAPseq_motifs \
                                    -t 28

it reports

Error: Non-zero exit status for: '/home/msimenc/software/anaconda3/bin/cbust -f 4 -c 0.0 -r 10000 -b 0 -t 1 /home/msimenc/analysis/scrnaseq/scenic/create_cisTarget_database/motifs/AP2EREBP_tnt.ERF104_col_a_m1.cb /home/msimenc/analysis/scrnaseq/scenic/create_cisTarget_database/Athaliana.Col-0.HPIv01_10k.promoters.fasta'

for every motif.

I tried running the cbust command on its own but get:

Sequence should be at least 1 bp long.

However, running this cbust command using -f 0 or -f 1 results in output.

I opened an issue at the cluster buster repository:

https://github.com/weng-lab/cluster-buster/issues/5

Any help would be appreciated!

ghuls commented 2 years ago

Did you create the fasta file with: https://github.com/aertslab/create_cisTarget_databases/blob/master/create_fasta_with_padded_bg_from_bed.sh

Can you post the output of the following 2 commands?

file /home/msimenc/analysis/scrnaseq/scenic/create_cisTarget_database/motifs/AP2EREBP_tnt.ERF104_col_a_m1.cb /home/msimenc/analysis/scrnaseq/scenic/create_cisTarget_database/Athaliana.Col-0.HPIv01_10k.promoters.fasta

head -n30 /home/msimenc/analysis/scrnaseq/scenic/create_cisTarget_database/motifs/AP2EREBP_tnt.ERF104_col_a_m1.cb /home/msimenc/analysis/scrnaseq/scenic/create_cisTarget_database/Athaliana.Col-0.HPIv01_10k.promoters.fasta

mcsimenc commented 2 years ago

Thanks for the reply. I didn't use create_fasta_with_padded_bg_from_bed.sh to make the fasta but I will try it.

Here are the outputs:

(create_cistarget_databases) [msimenc@KIWI create_cisTarget_database]$ file /home/msimenc/analysis/scrnaseq/scenic/create_cisTarget_database/motifs/AP2EREBP_tnt.ERF104_col_a_m1.cb /home/msimenc/analysis/scrnaseq/scenic/create_cisTarget_database/Athaliana.Col-0.HPIv01_10k.promoters.fasta
/home/msimenc/analysis/scrnaseq/scenic/create_cisTarget_database/motifs/AP2EREBP_tnt.ERF104_col_a_m1.cb:     ASCII text
/home/msimenc/analysis/scrnaseq/scenic/create_cisTarget_database/Athaliana.Col-0.HPIv01_10k.promoters.fasta: ASCII text
(create_cistarget_databases) [msimenc@KIWI create_cisTarget_database]$ head -n30 /home/msimenc/analysis/scrnaseq/scenic/create_cisTarget_database/motifs/AP2EREBP_tnt.ERF104_col_a_m1.cb /home/msimenc/analysis/scrnaseq/scenic/create_cisTarget_database/Athaliana.Col-0.HPIv01_10k.promoters.fasta
==> /home/msimenc/analysis/scrnaseq/scenic/create_cisTarget_database/motifs/AP2EREBP_tnt.ERF104_col_a_m1.cb <==
>AP2EREBP_tnt.ERF104_col_a_m1
0.271095        0.172352        0.348294        0.208259
0.145422        0.569120        0.104129        0.181329
0.174147        0.554758        0.100539        0.170557
0.296230        0.109515        0.371634        0.222621
0.120287        0.526032        0.100539        0.253142
0.154399        0.651706        0.059246        0.134650
0.224417        0.089767        0.382406        0.303411
0.039497        0.662478        0.136445        0.161580
0.147217        0.833034        0.010772        0.008977
0.064632        0.000000        0.890485        0.044883
0.000000        0.996409        0.003591        0.000000
0.000000        1.000000        0.000000        0.000000
0.048474        0.000000        0.946140        0.005386
0.003591        0.836625        0.000000        0.159785
0.007181        0.992819        0.000000        0.000000
0.384201        0.000000        0.504488        0.111311
0.071813        0.491921        0.091562        0.344704
0.233393        0.391382        0.107720        0.267504

==> /home/msimenc/analysis/scrnaseq/scenic/create_cisTarget_database/Athaliana.Col-0.HPIv01_10k.promoters.fasta <==
>AT1G01030.Araport11.447
TCACTCACTTTGTTAAAAGAATAATTCAGTGTCTGGACACTAAAATCTTCCAAAAACCCC
ATATACATATATGCTATTTCGATACTTATATTTATTTACTCAGCATAAAAAATATTAACC
ATGTATTCATAGTAAAATGTTTCATGTGATATCAAACCAGCGACAACAAAAGTATTATTC
CCCTCATTATGTTTGACTCCTATTATATTTTTATTTTAATTTTTTTCACTATCATCTTTC
TTGCAATGAAAGTCCCATATATTGGTCAACATTTCAAACCACTTGTTCTCTTTTATGTTT
TGGTAAGAGCTATCTTCTAAATTTATAATACGCATAAATTCAAAAGTAAAAGAAAATTTT
GGTCATGAATGTTGTTTAAGTCATTTGGAGATACGAAATCAAATCTCCTTGTAGATTTTG
TTTTTAGAATGTCGTTCCTTTTTCATCATCTTAGCTATATCTACAGCTATATATCCTATC
TTTAAACCTATATTATTTTTTCCTCTCTTCACCAAAGCCATGTTTTTTAGTTGTGGCGAA
AAATAAGAAATCCATACATCAACATATCGCTTTCGTTACCTTAAATTTTGGCTTGTTATG
AAGGCATGTCATAACGTTTCTAGTCACAACTCACAAGCATACCAACGACCATGATAAATC
CAAAAAGTAGAAACAATCTATTATCTAAACCCCCAAAAGACAAAAGAAAAAAGTAGAAAG
AAAAGGTAGGCAGAGATATAATGCTGGTTTTATTTGTTTGTTAAAAGATATTGCTATTTC
TGCCAATATTAAAACTTCACTTAGGAAGACTTGAACCTACCACACGTTAGTGACTAATGA
GAGCCACTAGATAATTGCATGCATCCCACACTAGTACTAATTTTCTAGGGATATTAGAGT
TTTCTAATCACCTACTTCCTACTATGTGTATGTTATCTACTGGCGTGGATGCTTTTAAAG
ATGTTACGTTATTATTTTGTTCGGTTTGGAAAACGGCTCAATCGTTATGAGTTCGTAAGA
CACATACATTGTTCCATGATAAAATGCAACCCCACGAACCATTTGCGACAAGCAAAACAA
CATGGTCAAAATTAAAAGCTAACAATTAGCCAGCGATTCAAAAAGTCAACCTTCTAGATG
GATTTAACAACATATCGATAGGATTCAAGATTAAAAATAAGCACACTCTTATTAATGTTA
AAAAACGAATGAGATGAAAATATTTGGCGTGTTCACACACATAATCTAGAAGACAGATTC
GAGTTGCTCTCCTTTGTTTTGCTTTGGGAGGGACCCATTATTACCGCCCAGCAGCTTCCC
AGCCTTCCTTTATAAGGCTTAATTTATATTTATTTAAATTTTATATGTTCTTCTATTATA
ATACTAAAAGGGGAATACAAATTTCTACAGAGGATGATATTCAATCCACGGTTCACCCAA
ACCGATTTTATAAAATTTATTATTAAATCTTTTTTAATTGTTAAATTGGTTTAAATCTGA
ACTCTGTTTACTTACATTGATTAAAATTCTAAACCATCATAAGTAAAAAATAATATGATT
AAGACTAATAAATCTTAATAGTTAATACTACTCGGTTTACTACATGAAATTTCATACCAT
CAATTGTTTTAATAATCTTTAAAATTGTTAGGACCGGTAAAACCATACCAATTAAACCGG
AGATCCATATTAATTTAATTAAGAAAATAAAAATAAAAGGAATAAATTGTCTTATTTAAA

ghuls commented 2 years ago

Do you have >seq_name lines which are not followed by a sequence?
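
One way to check for such records is a small script like the sketch below. It is not part of create_cisTarget_databases; the function name `find_empty_fasta_records` is made up for illustration, and it assumes a plain-text FASTA file where every record starts with a `>` line.

```python
from typing import Iterable, List


def find_empty_fasta_records(lines: Iterable[str]) -> List[str]:
    """Return the names of FASTA records whose header has no sequence after it.

    Hypothetical helper for illustration only.
    """
    empty = []
    name = None  # name of the record currently being read
    length = 0   # number of sequence characters seen for that record
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record: report it if it was empty.
            if name is not None and length == 0:
                empty.append(name)
            name = line[1:].split()[0] if len(line) > 1 else ""
            length = 0
        elif name is not None:
            length += len(line)
    # The last record has no following header, so check it separately.
    if name is not None and length == 0:
        empty.append(name)
    return empty


example = [">seq1", "ACGT", ">seq2", ">seq3", "TTGA"]
print(find_empty_fasta_records(example))  # prints ['seq2']
```

To run it on a real file, pass an open file handle instead of the list, for example `find_empty_fasta_records(open("promoters.fasta"))`, where the file name is a placeholder.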

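Also, rescale your Cluster-Buster matrices to 100, because by default Cluster-Buster adds a pseudocount of 0.375 to each matrix element. With frequencies that sum to 1 per row, as in your .cb file, that pseudocount would largely flatten the motif.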
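
A rough sketch of such a rescaling follows. It assumes that "to 100" means each matrix row should sum to 100, and that the file contains only `>` header lines and whitespace-separated numeric rows, like the .cb file shown above. The function name `rescale_cb_matrix` is hypothetical and not part of any tool mentioned here.

```python
from typing import Iterable, List


def rescale_cb_matrix(lines: Iterable[str], total: float = 100.0) -> List[str]:
    """Rescale each numeric row of a Cluster-Buster matrix so it sums to `total`.

    Hypothetical helper; assumes header lines start with '>' and every other
    non-blank line is a row of whitespace-separated numbers.
    """
    out = []
    for line in lines:
        stripped = line.strip()
        # Pass header and blank lines through unchanged.
        if not stripped or stripped.startswith(">"):
            out.append(stripped)
            continue
        values = [float(v) for v in stripped.split()]
        row_sum = sum(values)
        if row_sum == 0:
            # Leave an all-zero row as it is rather than divide by zero.
            out.append(stripped)
            continue
        scaled = [v * total / row_sum for v in values]
        out.append("\t".join(f"{v:.4f}" for v in scaled))
    return out


matrix = [">example_motif", "0.25 0.25 0.25 0.25", "0 1 0 0"]
for row in rescale_cb_matrix(matrix):
    print(row)
```

With row sums of 100, a pseudocount of 0.375 per element shifts each value only slightly, whereas on rows that sum to 1 it is larger than many of the entries themselves.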

mcsimenc commented 2 years ago

Thanks for the tip about scaling the matrices.

Yes, the problem was a sequence header without any sequence! I used samtools faidx to see the lengths of all sequences but it omitted the ones that were just headers. Thank you for your help!