gecrooks / weblogo

WebLogo 3: Sequence Logos redrawn
weblogo.threeplusone.com

datatype format descriptions? #84

Closed: hepcat72 closed this issue 6 years ago

hepcat72 commented 6 years ago

I have base position frequency files that look like this:

WMB1    1       A:16631 C:14695 G:177240        N:1     T:13618
WMB1    2       A:16694 C:172217        G:16523 T:16751
WMB1    3       A:163807        C:18140 G:20611 T:19627
WMB1    4       A:16750 C:13852 G:178811        T:12772
WMB1    5       A:17119 a:3472  C:168226        G:16196 T:17172
WMB1    6       A:986   a:218713        C:911   G:877   T:698
WMB1    7       A:17846 C:16234 G:172700        N:2     T:15403
WMB1    8       A:15834 C:172280        G:15528 N:1     T:18542
WMB1    9       A:22542 C:18527 G:162259        N:4     T:18853

from which I generated MEME motif files (to run ceqlogo), which look like this (different data):

MEME version 4
ALPHABET= ACGT
strands: +
MOTIF NMB1
letter-probability matrix: alength= 4 w= 5
0.0415133953509286 0.0384038684037128 0.882168298390067 0.0379144378552911
0.0375564444859509 0.908735902930184 0.0233006988653167 0.030406953718548
0.821032715751535 0.0504909005695035 0.0431840310745162 0.085292352604445
0.883870063971859 0.0344988006357409 0.0566822834788654 0.0249488519135351
0.664751020803464 0.0782002037283323 0.124689348882175 0.132359426586029
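For reference, the conversion described above (per-position counts to a MEME letter-probability matrix) could look roughly like the sketch below. It is not the script actually used here. The function names `parse_counts` and `to_meme` are hypothetical, and it assumes that rows arrive in position order and that only the uppercase letters A, C, G and T are kept, so `N` and lowercase `a` counts are dropped before normalizing.

```python
ALPHABET = "ACGT"  # assumption: letters outside this set (N, lowercase a) are ignored


def parse_counts(lines):
    """Parse rows like 'WMB1  1  A:16631  C:14695 ...' into {motif: [{letter: count}, ...]}."""
    motifs = {}
    for line in lines:
        fields = line.split()  # splits on any run of tabs or spaces
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        name = fields[0]
        counts = {}
        for item in fields[2:]:
            letter, value = item.split(":")
            counts[letter] = int(value)
        # assumption: rows for a motif are already sorted by position
        motifs.setdefault(name, []).append(counts)
    return motifs


def to_meme(name, rows, alphabet=ALPHABET):
    """Render one motif in the minimal MEME layout shown above."""
    out = [
        "MEME version 4",
        f"ALPHABET= {alphabet}",
        "strands: +",
        f"MOTIF {name}",
        f"letter-probability matrix: alength= {len(alphabet)} w= {len(rows)}",
    ]
    for counts in rows:
        kept = [counts.get(letter, 0) for letter in alphabet]
        total = sum(kept)
        if total == 0:
            raise ValueError(f"no counts for alphabet {alphabet} in motif {name}")
        out.append(" ".join(f"{count / total:.6f}" for count in kept))
    return "\n".join(out)


sample = [
    "WMB1\t1\tA:16631\tC:14695\tG:177240\tN:1\tT:13618",
    "WMB1\t2\tA:16694\tC:172217\tG:16523\tT:16751",
]
for motif_name, motif_rows in parse_counts(sample).items():
    print(to_meme(motif_name, motif_rows))
```

Dropping `N` and `a` is only one possible choice. Folding lowercase `a` into `A`, for example, would give different probabilities.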

But I cannot find a description of any of the formats listed in the usage for --datatype. Are either of these formats one of the formats accepted by --datatype?

gecrooks commented 6 years ago

weblogo --help lists the supported formats. Neither of these formats is currently supported, although WebLogo will read transfac position weight matrix files.
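As a rough illustration of what that conversion might involve, the sketch below writes per-position counts out as a TRANSFAC-style matrix. The layout is an assumption based on the common TRANSFAC convention: an `ID` line, a `PO` header naming the columns, one numbered row of counts per position, then `XX` and `//`. It should be checked against the parsing code before relying on it. The function name `to_transfac` is hypothetical, and letters outside A, C, G and T are simply dropped.

```python
def to_transfac(name, rows, alphabet="ACGT"):
    """Render per-position count dicts as a TRANSFAC-style count matrix (assumed layout)."""
    lines = [f"ID {name}", "PO " + " ".join(alphabet)]
    for position, counts in enumerate(rows, start=1):
        # letters missing from a row count as 0; letters outside the alphabet are ignored
        values = " ".join(str(counts.get(letter, 0)) for letter in alphabet)
        lines.append(f"{position:02d} {values}")
    lines += ["XX", "//"]
    return "\n".join(lines) + "\n"


# Counts taken from the first two rows of the frequency file quoted above.
rows = [
    {"A": 16631, "C": 14695, "G": 177240, "N": 1, "T": 13618},
    {"A": 16694, "C": 172217, "G": 16523, "T": 16751},
]
print(to_transfac("WMB1", rows), end="")
```

If the layout matches what the parser expects, the resulting file would be the kind of input meant for `--datatype transfac`.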

hepcat72 commented 6 years ago

OK. I'm still missing something though. I saw the list of formats for --datatype: clustal, fasta, msf, genbank, nbrf, nexus, phylip, stockholm, intelligenetics, table, array, transfac. Where can I find the format descriptions, so that I can transform my data into one of those formats? E.g., what does the table format look like? Is it tab-delimited? What are the rows? What are the columns? Does it take counts or percentages? Where is this information? Are there examples of the formats somewhere that I overlooked?

gecrooks commented 6 years ago

Descriptions of the different formats can be found in the parsing code.

https://github.com/WebLogo/weblogo/tree/master/corebio/seq_io