Open Chimrod opened 5 years ago
Awesome! Could you also use the SVG? That seems maybe a bit easier, and I'd probably be more willing to include SVG rather than TikZ. https://github.com/amueller/word_cloud/issues/58#issuecomment-485048380
Is pdf2svg a no go ? :-)
I've used this code to generate the first page of a bigger document. As there are no tools for generating such a presentation with LaTeX, I think there is a need for one, even if you're not interested yourself (and providing a direct TikZ output is not incompatible with an SVG one!)
I do not know the SVG format, and I do not know whether it has an equivalent of the anchor system provided by TikZ, which makes it easy to declare a node's position on the page.
I would rather go the other direction as svg is more widely used than tikz. It's a general vector graphic format.
Though your code is really small so I guess a PR would be good as well :)
Here is my homework for you :
def to_svg(self, file):
    with open(file, 'w') as output:
        # The canvas size is hardcoded here.
        output.write("<svg viewBox='0 0 1200 1200' xmlns='http://www.w3.org/2000/svg'>\n")
        for (word, count), font_size, position, orientation, color in self.layout_:
            # Horizontal words have no orientation; all others are drawn rotated.
            if orientation is None:
                angle = 0
                anchor = "start"
            else:
                angle = -90
                anchor = "end"
            # position is (row, column), so x is position[1] and y is position[0].
            output.write("\t<text x='{pos_x}' y='{pos_y}' dominant-baseline='alphabetic' dy='1.5ex' text-anchor='{anchor}' transform='rotate({angle} {pos_x},{pos_y})' font-family='Arial' font-size='{size}'>{text}</text>\n".format(
                angle=angle,
                anchor=anchor,
                pos_x=position[1],
                pos_y=position[0],
                text=word,
                size=font_size))
        output.write("</svg>\n")
    return None
This is not perfect: I can't figure out how to define the font name in the SVG file, and you have to check the canvas size (both are hardcoded in the code above).
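One way the two hardcoded values could be lifted into parameters is sketched below. This is only an illustration under assumptions, not the code that was eventually merged: `layout_to_svg` and its `width`, `height`, and `font_family` parameters are hypothetical names, and the layout is assumed to have the same tuple shape as `layout_` in the snippet above. The word is also passed through `xml.sax.saxutils.escape` so that characters such as `&` or `<` do not break the file.

```python
from xml.sax.saxutils import escape


def layout_to_svg(layout, width, height, font_family="Arial"):
    """Return an SVG string for a word-cloud layout (hypothetical helper).

    `layout` is assumed to hold tuples shaped like the entries of `layout_`:
    ((word, count), font_size, (row, column), orientation, color).
    """
    lines = [
        '<svg viewBox="0 0 {w} {h}" xmlns="http://www.w3.org/2000/svg">'.format(
            w=width, h=height)
    ]
    for (word, _count), font_size, position, orientation, _color in layout:
        # Horizontal words have no orientation; all others are drawn rotated.
        if orientation is None:
            angle, anchor = 0, "start"
        else:
            angle, anchor = -90, "end"
        y, x = position  # position is (row, column)
        lines.append(
            '\t<text x="{x}" y="{y}" dy="1.5ex" text-anchor="{anchor}" '
            'transform="rotate({angle} {x},{y})" font-family="{font}" '
            'font-size="{size}">{text}</text>'.format(
                x=x, y=y, anchor=anchor, angle=angle,
                font=escape(font_family, {'"': "&quot;"}),
                size=font_size,
                text=escape(word)))  # escape &, < and > in the word
    lines.append("</svg>")
    return "\n".join(lines) + "\n"
```

With a fitted word cloud this could be called as `layout_to_svg(wc.layout_, wc.width, wc.height)`, assuming `wc` is the object whose `layout_` is used above and that it exposes its canvas size as `width` and `height`.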
The key is not in the dominant-baseline property as I first thought, but in the font height. There is no way to get the exact north position, so I've applied a shift of 1.5em.
It is almost perfect. You can see two examples with the Arial and Times New Roman fonts (my laptop runs Windows). I checked the result with Edge, IE, and Firefox, and it looks the same in all of them.
You still have some work to do, but the tricky part seems done now.
Thank you @Chimrod! This was exactly what I was looking for...
I made some small adjustments and thought I'd share them:
The fontspec package apparently requires XeTeX. This version works with pdfTeX.
Without fontspec the layout was messy, so I set the font to Times New Roman. To use this function on Windows, call WordCloud with font_path=r'C:\Windows\Fonts\times.ttf'
from io import StringIO


def latex_clean_single_word(word):
    # Prefix every LaTeX-sensitive character with a backslash.
    for latex_sensitive in ["\\", "%", "&", "^", "#", "_", "{", "}", "$"]:
        if latex_sensitive in word:
            word = word.replace(latex_sensitive, '\\' + latex_sensitive)
    return word


def wordcloud_to_tikz(wordcloud):
    stream = StringIO()
    stream.write(r'''
\documentclass{standalone}
\usepackage{tikz}
\usepackage{times}
\usetikzlibrary{positioning, backgrounds}
\usepackage{xcolor}
\begin{document}
{ \fontfamily{ptm}\selectfont
\begin{tikzpicture}[background rectangle/.style={fill=white}, show background rectangle]
''')
    for (word, count), font_size, position, orientation, color in wordcloud.layout_:
        # Horizontal words have no orientation; all others are rotated by 90 degrees.
        if orientation is None:
            angle = 0
            anchor = "north west"
        else:
            angle = 90
            anchor = "north east"
        # position is (row, column); the y axis is flipped because TikZ's y axis points up.
        stream.write(
            "\t\\node[anchor={anchor},rotate={angle},text={color},font=\\fontsize{{{size}mm}}{{{size}mm}}\\selectfont] at ({pos_x},{pos_y}) {{{text}}};".format(
                angle=angle,
                anchor=anchor,
                pos_x=position[1] / 10,
                pos_y=-position[0] / 10,
                text=latex_clean_single_word(word),
                size=font_size,
                color='black'))  # the layout color is ignored; every word is black
    stream.write(r'''
\end{tikzpicture}
}
\end{document}
''')
    return stream.getvalue()
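A note on the escaping helper: prefixing a backslash works for characters like % or &, but in LaTeX `\\` is a line break and `\^` is an accent command, so words containing a backslash or a caret may not render as intended. A possible single-pass alternative is sketched below. `latex_escape` and `_LATEX_ESCAPES` are hypothetical names, and the inclusion of `~` is my own addition; this is a sketch, not tested against a full LaTeX run.

```python
# Map each LaTeX-sensitive character to a replacement in a single pass,
# so that inserted braces and backslashes are never escaped a second time.
_LATEX_ESCAPES = str.maketrans({
    "\\": r"\textbackslash{}",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
    "%": r"\%",
    "&": r"\&",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
})


def latex_escape(word):
    """Return `word` with LaTeX-sensitive characters replaced (hypothetical helper)."""
    return word.translate(_LATEX_ESCAPES)
```

It could stand in for latex_clean_single_word in the `text=` argument above.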
I was also looking for a TikZ output compatible with LuaLaTeX, so including this code would certainly be useful for more people. I don't think we need it to be compatible with pdfLaTeX, since it does not support fonts easily. With a different font the letters might collide!
@ChristofKaufmann is there an issue with using the SVG? If there's a big benefit of LaTeX over SVG we can include it, but it wasn't obvious to me. SVG has been merged in the meantime.
@amueller yes, it did not work for me when using embed_font=True. There was an error with the date format, but I could not quickly figure out what caused the error or where the date came from. I think the error did not stem from word_cloud, but either from fontTools or from some XML library it uses. The line https://github.com/amueller/word_cloud/blob/master/wordcloud/wordcloud.py#L866 threw an error. Should I look into it again?
@ChristofKaufmann that sounds like a bug so that might be good to reproduce if you feel like it. Maybe open a new issue?
@amueller okay, I opened a new issue and noticed that it only occurs in the Spyder IDE, which I hadn't noticed before. So it might rather be a bug in Spyder.
Anyway, regarding your question whether a to_tikz is useful: I think yes. There is no other LaTeX library which can do that, and LaTeX is used a lot. So people could use this to have a word cloud in native LuaLaTeX/TikZ. Even Pandas' DataFrame has a to_latex method to write the DataFrame as a LaTeX table.
Good point, so PR welcome :)
Hi! Your project is very fun!
Here is a TikZ output (quick and dirty; the data is printed to stdout)
You can now generate the PDF with LaTeX:
document.pdf
Thanks :)