amueller / word_cloud

A little word cloud generator in Python
https://amueller.github.io/word_cloud
MIT License

Tikz output #493

Open Chimrod opened 5 years ago

Chimrod commented 5 years ago

Hi! Your project is a lot of fun!

Here is a TikZ output (quick and dirty; the data is printed to stdout):

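    def to_tikz(self):
        # Each layout_ entry unpacks to (word, count), font size, position, orientation, color.
        for (word, count), font_size, position, orientation, color in self.layout_:
            if orientation is None:
                # Horizontal word
                angle = 0
                anchor = "north west"
            else:
                # Rotated word
                angle = 90
                anchor = "north east"

            # position[1] is used as x and position[0] as y; y is negated because
            # the TikZ y axis points up. "\\selectfont" avoids the invalid "\s" escape.
            print("\t\\node[anchor={anchor},rotate={angle},font=\\fontsize{{ {size}mm}}{{ {size}mm}}\\selectfont] at ({pos_x},{pos_y}) {{ {text} }};".format(
                angle=angle,
                anchor=anchor,
                pos_x=position[1]/10,
                pos_y=-position[0]/10,
                text=word,
                size=font_size))
        return None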
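
As a rough illustration of what the function above emits, here is a standalone sketch that applies the same formatting to a single hand-written layout entry. The helper name `tikz_node` and the entry values are made up for the example; only the tuple shape follows `layout_` as it is unpacked above.

```python
def tikz_node(entry):
    """Format one layout entry as a TikZ node (hypothetical helper)."""
    (word, count), font_size, position, orientation, color = entry
    if orientation is None:
        angle, anchor = 0, "north west"
    else:
        angle, anchor = 90, "north east"
    return (
        "\\node[anchor={anchor},rotate={angle},"
        "font=\\fontsize{{{size}mm}}{{{size}mm}}\\selectfont] "
        "at ({pos_x},{pos_y}) {{{text}}};"
    ).format(
        anchor=anchor,
        angle=angle,
        size=font_size,
        pos_x=position[1] / 10,   # second component is used as x
        pos_y=-position[0] / 10,  # first component is used as y, flipped for TikZ
        text=word,
    )


# Made-up entry: a horizontal word at position (120, 40) with font size 30.
example = (("python", 1.0), 30, (120, 40), None, "black")
print(tikz_node(example))
# prints: \node[anchor=north west,rotate=0,font=\fontsize{30mm}{30mm}\selectfont] at (4.0,-12.0) {python};
```

Returning the string instead of printing it makes the output easier to write to a file or to test.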

You can now generate the PDF with LaTeX:

    \documentclass{standalone}

    \usepackage{tikz}
    \usetikzlibrary{positioning, backgrounds}
    \usepackage{fontspec}
    \setmainfont{Droid Sans Mono}

    \usepackage{xcolor}

    \begin{document}
     {
      \color{olive!45}
      \begin{tikzpicture}[background rectangle/.style={fill=black}, show background rectangle]
       \input{YOUR_OUTPUT}
      \end{tikzpicture}
     }
    \end{document}

document.pdf

Thanks :)

amueller commented 5 years ago

Awesome! Could you also use the SVG? That seems maybe a bit easier, and I'd probably be more willing to include SVG rather than TikZ. https://github.com/amueller/word_cloud/issues/58#issuecomment-485048380

Chimrod commented 5 years ago

Is pdf2svg a no-go? :-)

I've used this code to generate the first page of a bigger document. As there are no tools for generating this kind of presentation with LaTeX, I think there is a need for one, even if you're not interested yourself (and providing a direct TikZ output is not incompatible with an SVG one!).

I do not know the SVG format, and I do not know whether it has an equivalent of TikZ's anchor system, which makes it easy to declare a node's position on the page.

amueller commented 5 years ago

I would rather go the other direction as svg is more widely used than tikz. It's a general vector graphic format.

Though your code is really small so I guess a PR would be good as well :)

Chimrod commented 5 years ago

Here is my homework for you :

    from xml.sax.saxutils import escape

    def to_svg(self, file):
        with open(file, 'w') as output:
            # Canvas size and font family are hard-coded here.
            output.write("<svg viewBox='0 0 1200 1200' xmlns='http://www.w3.org/2000/svg'>\n")
            for (word, count), font_size, position, orientation, color in self.layout_:
                if orientation is None:
                    angle = 0
                    anchor = "start"
                else:
                    angle = -90
                    anchor = "end"

                # dy shifts the text down so (x, y) acts as the top of the word;
                # escape() protects characters such as & and < inside the word.
                output.write("\t<text x='{pos_x}' y='{pos_y}' dominant-baseline='alphabetic' dy='1.5ex' text-anchor='{anchor}' transform='rotate({angle} {pos_x},{pos_y})' font-family='Arial' font-size='{size}'>{text}</text>\n".format(
                    angle=angle,
                    anchor=anchor,
                    pos_x=position[1],
                    pos_y=position[0],
                    text=escape(word),
                    size=font_size))
            output.write("</svg>\n")
        return None

This is not perfect: I can't figure out how to define the font name in the SVG file, and you have to check the canvas size (both are hard-coded in the code above).

The key is not the dominant-baseline property, as I first thought, but the font height. There is no way to get the exact north position, so I've applied a shift of 1.5ex (the dy attribute).

It's almost perfect. You can see two examples with the Arial and Times New Roman fonts (my laptop runs Windows); I checked the result in Edge, IE, and Firefox, and it looks the same in all of them.

You still have some work to do, but the tricky part seems done now.
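
One possible way to get rid of the two hard-coded values mentioned above is to pass them in. The sketch below is a standalone variant, not word_cloud's own SVG export: the function name `layout_to_svg` and the parameters `width`, `height` and `font_family` are invented for the example, and it takes a plain list shaped like `layout_` so that it runs without the library. With a real WordCloud object, the canvas size would presumably come from the width and height it was created with.

```python
from xml.sax.saxutils import escape, quoteattr


def layout_to_svg(layout, width, height, font_family="Arial"):
    """Render layout entries as an SVG string (hypothetical helper)."""
    lines = [
        "<svg viewBox='0 0 {w} {h}' xmlns='http://www.w3.org/2000/svg'>".format(
            w=width, h=height)
    ]
    for (word, count), font_size, position, orientation, color in layout:
        if orientation is None:
            angle, anchor = 0, "start"
        else:
            angle, anchor = -90, "end"
        lines.append(
            "\t<text x='{x}' y='{y}' dy='1.5ex' text-anchor='{anchor}' "
            "transform='rotate({angle} {x},{y})' font-family={family} "
            "font-size='{size}'>{text}</text>".format(
                x=position[1], y=position[0], anchor=anchor, angle=angle,
                family=quoteattr(font_family),  # adds the surrounding quotes
                size=font_size,
                text=escape(word)))             # escapes &, < and >
    lines.append("</svg>")
    return "\n".join(lines) + "\n"


# Made-up layout with a word that needs XML escaping.
layout = [(("R&D", 1.0), 40, (10, 20), None, "black")]
svg = layout_to_svg(layout, width=400, height=200, font_family="Times New Roman")
print(svg)
```

The font still has to be available to whatever renders the SVG; this only makes the name configurable.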

hstolte commented 4 years ago

Thank you @Chimrod! This was exactly what I was looking for...

I made some small adjustments and thought I'd share them:

  1. The fontspec package apparently requires XeTeX. This version works with pdfTeX.
  2. After simply removing fontspec the layout was messy. I set the font to Times New Roman. To use this function on Windows, call WordCloud with font_path=r'C:\Windows\Fonts\times.ttf'
  3. If your words contain LaTeX special characters, they need to be escaped.

    from io import StringIO

    # LaTeX replacements for characters that are special in text mode.
    # A plain "\\" prefix is not enough for backslash, ^ and ~.
    LATEX_ESCAPES = {
        "\\": r"\textbackslash{}",
        "^": r"\textasciicircum{}",
        "~": r"\textasciitilde{}",
        "%": r"\%", "&": r"\&", "#": r"\#", "_": r"\_",
        "{": r"\{", "}": r"\}", "$": r"\$",
    }

    def latex_clean_single_word(word):
        # Map each character once, so inserted backslashes and braces are not escaped again.
        return "".join(LATEX_ESCAPES.get(char, char) for char in word)

    def wordcloud_to_tikz(wordcloud):
        stream = StringIO()
        stream.write(r'''
    \documentclass{standalone}
    \usepackage{tikz}
    \usepackage{times}
    \usetikzlibrary{positioning, backgrounds}
    \usepackage{xcolor}

    \begin{document}
        { \fontfamily{ptm}\selectfont

            \begin{tikzpicture}[background rectangle/.style={fill=white}, show background rectangle]
        ''')
        for (word, count), font_size, position, orientation, color in wordcloud.layout_:
            if orientation is None:
                angle = 0
                anchor = "north west"
            else:
                angle = 90
                anchor = "north east"

            # One node per word, one per line; the color is fixed to black here.
            stream.write(
                "\t\\node[anchor={anchor},rotate={angle},text={color},font=\\fontsize{{{size}mm}}{{{size}mm}}\\selectfont] at ({pos_x},{pos_y}) {{{text}}};\n".format(
                    angle=angle,
                    anchor=anchor,
                    pos_x=position[1] / 10,
                    pos_y=-position[0] / 10,
                    text=latex_clean_single_word(word),
                    size=font_size,
                    color='black'))
        stream.write(r'''
        \end{tikzpicture}
     }
    \end{document}
        ''')
        return stream.getvalue()

ChristofKaufmann commented 4 years ago

I was also looking for a TikZ output compatible with LuaLaTeX, so including this code would certainly be useful for more people. I don't think it needs to be compatible with pdfLaTeX, since that does not support arbitrary fonts easily. With a different font the letters might collide!
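
A sketch of what a LuaLaTeX-oriented wrapper could look like, assuming the node lines have already been generated and escaped by one of the functions earlier in this thread. The helper name `lualatex_document` and its parameters are hypothetical; the only point is that the font given to `\setmainfont` should be the one the word cloud was laid out with, so that the measured word boxes still fit.

```python
def lualatex_document(nodes, font_name, background="white"):
    """Wrap TikZ node lines in a standalone document that uses fontspec.

    nodes: iterable of already formatted and escaped node lines.
    font_name: a font LuaLaTeX can find, ideally the one used for the layout.
    """
    header = "\n".join([
        r"\documentclass{standalone}",
        r"\usepackage{fontspec}",
        r"\setmainfont{" + font_name + "}",
        r"\usepackage{tikz}",
        r"\usetikzlibrary{backgrounds}",
        r"\begin{document}",
        r"\begin{tikzpicture}[background rectangle/.style={fill="
        + background + "}, show background rectangle]",
    ])
    footer = "\n".join([r"\end{tikzpicture}", r"\end{document}"])
    return header + "\n" + "\n".join(nodes) + "\n" + footer + "\n"


# Made-up node line, just to show the shape of the result.
doc = lualatex_document(
    [r"\node[anchor=north west] at (0,0) {hello};"],
    font_name="Droid Sans Mono",
)
print(doc)
```

The resulting file would need to be compiled with LuaLaTeX or XeLaTeX, since fontspec does not work with pdfLaTeX.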

amueller commented 4 years ago

@ChristofKaufmann is there an issue with using the SVG? If there's a big benefit of LaTeX over SVG we can include it, but it wasn't obvious to me. SVG has been merged in the meantime.

ChristofKaufmann commented 4 years ago

@amueller yes, it did not work for me when using embed_font=True. There was an error with the date format, but I could not quickly figure out what caused it or where the date came from. I think the error did not stem from word_cloud, but from either fontTools or some XML library it uses. The line https://github.com/amueller/word_cloud/blob/master/wordcloud/wordcloud.py#L866 threw the error. Should I look into it again?

amueller commented 4 years ago

@ChristofKaufmann that sounds like a bug so that might be good to reproduce if you feel like it. Maybe open a new issue?

ChristofKaufmann commented 4 years ago

@amueller okay, I opened a new issue and noticed that it only occurs in the Spyder IDE, which I hadn't noticed before. So it might rather be a bug in Spyder.

Anyway, regarding your question whether a to_tikz is useful: I think yes. There is no other LaTeX library that can do this, and LaTeX is used a lot. So people could use this to have a word cloud in native LuaLaTeX/TikZ. Even pandas' DataFrame has a to_latex method to write the DataFrame as a LaTeX table.

amueller commented 4 years ago

Good point, so PR welcome :)