Phlya / adjustText

A small library for automatic adjustment of text positions in matplotlib plots to minimize overlaps.
https://adjusttext.readthedocs.io/
MIT License

Why do labels move that are not overlapping? #50

Closed kimgerdes closed 6 years ago

kimgerdes commented 6 years ago

Is it possible to stop the adjust_text function from moving labels when no other label overlaps with them? Here is my plot that I'd love to unclutter: image What I get is this: image Why is the "Arabic" label moving? Here is the code to reproduce what I'm doing (wrong):

# Imports needed to run this snippet
import pandas as pd
import matplotlib.pyplot as plt
from adjustText import adjust_text

d={'Afrikaans': 1.93, 'Amharic': 44.56, 'AncientGreek': 33.06, 'Arabic': 65.9, 'Armenian': 20.16, 'Bambara': 0.13, 'Basque': 20.4, 'Belarusian': 26.28, 'Breton': 53.21, 'Bulgarian': 25.77, 'Buryat': 0.4, 'Cantonese': 4.4, 'Catalan': 19.14, 'Chinese': 0.19, 'Coptic': 11.67, 'Croatian': 24.72, 'Czech': 36.6, 'Danish': 16.38, 'Dutch': 21.72, 'English': 4.9, 'Erzya': 40.76, 'Estonian': 36.45, 'Faroese': 14.19, 'Finnish': 17.88, 'French': 4.67, 'Galician': 17.52, 'German': 21.45, 'Gothic': 34.23, 'Greek': 34.27, 'Hebrew': 28.75, 'Hindi': 1.4, 'Hungarian': 27.91, 'Indonesian': 2.6, 'Irish': 87.93, 'Italian': 22.75, 'Japanese': 0.0, 'Kazakh': 0.89, 'Komi': 19.34, 'Korean': 0.35, 'Kurmanji': 0.61, 'Latin': 27.5, 'Latvian': 24.22, 'Lithuanian': 28.8, 'Maltese': 7.26, 'Marathi': 2.64, 'Naija': 2.29, 'NorthSami': 21.18, 'Norwegian': 19.43, 'OldChurchSlavonic': 37.51, 'OldFrench': 20.14, 'Persian': 0.99, 'Polish': 30.55, 'Portuguese': 12.84, 'Romanian': 29.0, 'Russian': 29.15, 'Sanskrit': 20.09, 'Serbian': 24.1, 'Slovak': 33.18, 'Slovenian': 31.72, 'Spanish': 19.09, 'Swedish': 18.84, 'SwedishSign': 19.23, 'Tagalog': 98.18, 'Tamil': 2.95, 'Telugu': 0.85, 'Thai': 0.06, 'Turkish': 6.38, 'Ukrainian': 26.38, 'UpperSorbian': 22.03, 'Urdu': 0.74, 'Uyghur': 3.58, 'Vietnamese': 1.78}
df = pd.Series(d)
fig, aa = plt.subplots(figsize=(10, 2.5))
aa.axes.get_yaxis().set_visible(False)
plt.ylim(-2,0.2)
plt.xlim(-2,102)
aa.scatter( df, [0 for _ in df], alpha=0.5, edgecolors='none') 
aa.spines['left'].set_visible(False)
aa.spines['right'].set_visible(False)
aa.spines['bottom'].set_visible(False)
aa.xaxis.set_label_position('top') 
aa.xaxis.set_ticks_position('top')
plt.tight_layout()
texts=[]
for label, x in zip(df.index, df):
    texts+=[aa.text(x,-.3,label, fontsize=8,  horizontalalignment='center', verticalalignment='top',rotation=90)] 
adjust_text(texts, autoalign='y', only_move={'points':'y', 'text':'y'})
plt.show()

What I'd like is that the labels move only down, and only if they are overlapping. Is there a way to do that? I've experimented with quite a few parameters (expand_align, force_text, force_points, ...) for a few hours, and I don't seem to get the gist of these parameters.

Here's approximately what I'd like to obtain: Labels moved down if necessary, overlapping only if not enough room is available for all the labels: image (The data is the percentage of subjects to the right of their verb, if anyone cares :) )

Phlya commented 6 years ago

OK, so... Just recently I was trying to do a similar thing and realized how complicated it is. First, Arabic moves up because of autoalignment, it seems, for some reason... Maybe because if x and y are not specified the algorithm repels texts from their original positions, which is not ideal in this case (but a good default with a regular 2D scatter plot to avoid hiding the data), and this should be made optional. So a solution here would be to add the coordinates from the plot as x and y arguments - this should improve the result anyway, since the texts will be less prone to moving up.

Another complication is that your alignment setting doesn't really do anything, because adjust_text sets them to center by default to avoid preferential movement in a particular direction. But you can set it in the function to the same values as you do now when creating texts, and this will work then.

So this code produces this:

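# Imports needed to run this snippet
import pandas as pd
import matplotlib.pyplot as plt
from adjustText import adjust_text

d={'Afrikaans': 1.93, 'Amharic': 44.56, 'AncientGreek': 33.06, 'Arabic': 65.9, 'Armenian': 20.16,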
   'Bambara': 0.13, 'Basque': 20.4, 'Belarusian': 26.28, 'Breton': 53.21, 'Bulgarian': 25.77, 'Buryat': 0.4,
   'Cantonese': 4.4, 'Catalan': 19.14, 'Chinese': 0.19, 'Coptic': 11.67, 'Croatian': 24.72, 'Czech': 36.6,
   'Danish': 16.38, 'Dutch': 21.72, 'English': 4.9, 'Erzya': 40.76, 'Estonian': 36.45,
   'Faroese': 14.19, 'Finnish': 17.88, 'French': 4.67,
   'Galician': 17.52, 'German': 21.45, 'Gothic': 34.23, 'Greek': 34.27,
   'Hebrew': 28.75, 'Hindi': 1.4, 'Hungarian': 27.91, 'Indonesian': 2.6,
   'Irish': 87.93, 'Italian': 22.75, 'Japanese': 0.0, 'Kazakh': 0.89, 'Komi': 19.34, 'Korean': 0.35, 'Kurmanji': 0.61,
   'Latin': 27.5, 'Latvian': 24.22, 'Lithuanian': 28.8, 'Maltese': 7.26, 'Marathi': 2.64,
   'Naija': 2.29, 'NorthSami': 21.18, 'Norwegian': 19.43, 'OldChurchSlavonic': 37.51, 'OldFrench': 20.14,
   'Persian': 0.99, 'Polish': 30.55, 'Portuguese': 12.84, 'Romanian': 29.0, 'Russian': 29.15,
   'Sanskrit': 20.09, 'Serbian': 24.1, 'Slovak': 33.18, 'Slovenian': 31.72, 'Spanish': 19.09, 'Swedish': 18.84, 'SwedishSign': 19.23,
   'Tagalog': 98.18, 'Tamil': 2.95, 'Telugu': 0.85, 'Thai': 0.06, 'Turkish': 6.38,
   'Ukrainian': 26.38, 'UpperSorbian': 22.03, 'Urdu': 0.74, 'Uyghur': 3.58, 'Vietnamese': 1.78}
df = pd.Series(d)
fig, aa = plt.subplots(figsize=(10, 2.5))
aa.axes.get_yaxis().set_visible(False)
plt.ylim(-2,0.2)
plt.xlim(-2,102)
aa.scatter( df, [0 for _ in df], alpha=0.5, edgecolors='none') 
aa.spines['left'].set_visible(False)
aa.spines['right'].set_visible(False)
aa.spines['bottom'].set_visible(False)
aa.xaxis.set_label_position('top') 
aa.xaxis.set_ticks_position('top')
plt.tight_layout()
texts=[]
for label, x in zip(df.index, df):
    texts+=[aa.text(x,-.1,label, fontsize=8,  rotation=90)] 
adjust_text(texts,
            df, [0 for _ in df], ha='center', va='top',
            autoalign='', only_move={'points':'y', 'text':'y', 'objects':'y'},
           )
plt.show()

image

Still not ideal, but at least very few labels moved up. The problem is with the dense regions, where the texts just don't fit in the axes. Also, if you look closely, you can see that there is loads of space between them, which makes a lot of the overlaps unnecessary. That is because by default the algorithm expands the texts by 20% to avoid them being right next to each other, for better readability, but in this difficult case it might be better not to do this. We can also reduce the repelling force for them; this usually improves alignment by adjusting positions more slowly and carefully. So this code gives this:

adjust_text(texts,
            df, [0 for _ in df], ha='center', va='top', expand_text=(1, 1), force_text=(0, 0.1),
            autoalign='', only_move={'points':'y', 'text':'y', 'objects':'y'})
plt.show()

image Clearly, there is just not enough space on the plot to fit all the labels... But that is not the only reason it doesn't produce a perfect figure with no overlaps: even making the figure bigger in y doesn't help! That's because of a couple of issues I recently noticed, including one reported here, with how the algorithm decides when to stop. I have just fixed some stuff and pushed it here, so if you update your adjustText from GitHub and increase the vertical size of your figure (to 7 inches), everything will work perfectly even without adjusting expansion or force! I kept expand_text=(1, 1) to make the whole thing a little more compact, but it works with the default too.


adjust_text(texts, df, [0 for _ in df],
            ha='center', va='top', expand_text=(1, 1),
            autoalign='', only_move={'points':'y', 'text':'y'})

image
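To illustrate the expansion effect described above, here is a minimal sketch in plain Python. It is not adjustText's code: the helpers `expand` and `overlaps` are hypothetical, the boxes are made-up numbers, and scaling each box about its centre is an assumption about how an expansion factor such as expand_text=(1.2, 1.2) could behave.

```python
def expand(box, factor):
    """Scale a box (x0, y0, x1, y1) about its centre by (fx, fy).

    Hypothetical helper; scaling about the centre is an assumption.
    """
    x0, y0, x1, y1 = box
    fx, fy = factor
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w = (x1 - x0) / 2 * fx
    half_h = (y1 - y0) / 2 * fy
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def overlaps(a, b):
    """True if two axis-aligned boxes share some interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


# Two vertical labels side by side: each 1.0 wide, with a 0.1 gap between them.
left = (0.0, -3.0, 1.0, 0.0)
right = (1.1, -3.0, 2.1, 0.0)

plain = overlaps(expand(left, (1, 1)), expand(right, (1, 1)))
grown = overlaps(expand(left, (1.2, 1.2)), expand(right, (1.2, 1.2)))
print(plain, grown)
```

With no expansion the two boxes are treated as separate; once each is grown by 20% they count as overlapping and would be pushed apart, which is the extra whitespace visible in the dense regions.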

So thanks for stimulating me to fix this, and let me know if you have more questions! Also, since it's such a good example, do you mind if I add it to the Examples notebook here?

kimgerdes commented 6 years ago

Wow! Thanks a lot for that quick fix! It works nearly as perfectly as what you got: image but Thai and others stick out (or rather, the others moved down). What I did was get the new adjustText.py script, change the fig creation line to fig, aa = plt.subplots(figsize=(10, 8)) and replace the adjust_text command with the one you provided. I tried a few different fig sizes, too. There must still be something that I'm doing wrong :(

And, sure, you can put the example in your notebook. Maybe you could add that the data was extracted from universaldependencies.org v2.2 by Kim Gerdes (because there will be a publication using these graphs soon...). Thanks again for your interest in my problem :)

Phlya commented 6 years ago

Mmm, right. Just to be sure, this is the exact full code I use. Maybe I copied something wrong before...

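# Imports needed to run this snippet
import pandas as pd
import matplotlib.pyplot as plt
from adjustText import adjust_text

d={'Afrikaans': 1.93, 'Amharic': 44.56, 'AncientGreek': 33.06, 'Arabic': 65.9, 'Armenian': 20.16,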
   'Bambara': 0.13, 'Basque': 20.4, 'Belarusian': 26.28, 'Breton': 53.21, 'Bulgarian': 25.77, 'Buryat': 0.4,
   'Cantonese': 4.4, 'Catalan': 19.14, 'Chinese': 0.19, 'Coptic': 11.67, 'Croatian': 24.72, 'Czech': 36.6,
   'Danish': 16.38, 'Dutch': 21.72, 'English': 4.9, 'Erzya': 40.76, 'Estonian': 36.45,
   'Faroese': 14.19, 'Finnish': 17.88, 'French': 4.67,
   'Galician': 17.52, 'German': 21.45, 'Gothic': 34.23, 'Greek': 34.27,
   'Hebrew': 28.75, 'Hindi': 1.4, 'Hungarian': 27.91, 'Indonesian': 2.6,
   'Irish': 87.93, 'Italian': 22.75, 'Japanese': 0.0, 'Kazakh': 0.89, 'Komi': 19.34, 'Korean': 0.35, 'Kurmanji': 0.61,
   'Latin': 27.5, 'Latvian': 24.22, 'Lithuanian': 28.8, 'Maltese': 7.26, 'Marathi': 2.64,
   'Naija': 2.29, 'NorthSami': 21.18, 'Norwegian': 19.43, 'OldChurchSlavonic': 37.51, 'OldFrench': 20.14,
   'Persian': 0.99, 'Polish': 30.55, 'Portuguese': 12.84, 'Romanian': 29.0, 'Russian': 29.15,
   'Sanskrit': 20.09, 'Serbian': 24.1, 'Slovak': 33.18, 'Slovenian': 31.72, 'Spanish': 19.09, 'Swedish': 18.84, 'SwedishSign': 19.23,
   'Tagalog': 98.18, 'Tamil': 2.95, 'Telugu': 0.85, 'Thai': 0.06, 'Turkish': 6.38,
   'Ukrainian': 26.38, 'UpperSorbian': 22.03, 'Urdu': 0.74, 'Uyghur': 3.58, 'Vietnamese': 1.78}
df = pd.Series(d)
fig, aa = plt.subplots(figsize=(10, 7))
aa.axes.get_yaxis().set_visible(False)
plt.ylim(-10,0.2)
plt.xlim(-2,102)
aa.scatter( df, [0 for _ in df], alpha=0.5, edgecolors='none') 
aa.spines['left'].set_visible(False)
aa.spines['right'].set_visible(False)
aa.spines['bottom'].set_visible(False)
aa.xaxis.set_label_position('top') 
aa.xaxis.set_ticks_position('top')
plt.tight_layout()
texts=[]
for label, x in zip(df.index, df):
    texts+=[aa.text(x,-.1,label, fontsize=8,  rotation=90)] 

adjust_text(texts, df, [0 for _ in df],
            expand_text=(1, 1), ha='center', va='top',
            autoalign='', only_move={'points':'y', 'text':'y'})

And this is the output:

image

Phlya commented 6 years ago

And thanks, of course I can add that info! Let me know when it's published, I can add a citation! And I hope you mention adjustText in your paper :)

Phlya commented 6 years ago

For actual publication-quality figures, I recommend reducing force_text and increasing lim; it will then take ages to adjust everything, but it will get rid of all the whitespace between texts. For example, with force_text=0.5 after 277 iterations it produces this: image

kimgerdes commented 6 years ago

hello again, yes, i'd certainly include a reference to your algorithm in the paper!

here's the complete code, with colors by language group:

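# Imports needed to run this snippet
import pandas as pd
import matplotlib.pyplot as plt
from adjustText import adjust_text

d={'Afrikaans': 1.93, 'Amharic': 44.56, 'AncientGreek': 33.06, 'Arabic': 65.9, 'Armenian': 20.16,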
'Bambara': 0.13, 'Basque': 20.4, 'Belarusian': 26.28, 'Breton': 53.21, 'Bulgarian': 25.77, 'Buryat': 0.4,
'Cantonese': 4.4, 'Catalan': 19.14, 'Chinese': 0.19, 'Coptic': 11.67, 'Croatian': 24.72, 'Czech': 36.6,
'Danish': 16.38, 'Dutch': 21.72, 'English': 4.9, 'Erzya': 40.76, 'Estonian': 36.45,
'Faroese': 14.19, 'Finnish': 17.88, 'French': 4.67,
'Galician': 17.52, 'German': 21.45, 'Gothic': 34.23, 'Greek': 34.27,
'Hebrew': 28.75, 'Hindi': 1.4, 'Hungarian': 27.91, 'Indonesian': 2.6,
'Irish': 87.93, 'Italian': 22.75, 'Japanese': 0.0, 'Kazakh': 0.89, 'Komi': 19.34, 'Korean': 0.35, 'Kurmanji': 0.61,
'Latin': 27.5, 'Latvian': 24.22, 'Lithuanian': 28.8, 'Maltese': 7.26, 'Marathi': 2.64,
'Naija': 2.29, 'NorthSami': 21.18, 'Norwegian': 19.43, 'OldChurchSlavonic': 37.51, 'OldFrench': 20.14,
'Persian': 0.99, 'Polish': 30.55, 'Portuguese': 12.84, 'Romanian': 29.0, 'Russian': 29.15,
'Sanskrit': 20.09, 'Serbian': 24.1, 'Slovak': 33.18, 'Slovenian': 31.72, 'Spanish': 19.09, 'Swedish': 18.84, 'SwedishSign': 19.23,
'Tagalog': 98.18, 'Tamil': 2.95, 'Telugu': 0.85, 'Thai': 0.06, 'Turkish': 6.38,
'Ukrainian': 26.38, 'UpperSorbian': 22.03, 'Urdu': 0.74, 'Uyghur': 3.58, 'Vietnamese': 1.78}
langnameGroup={"AncientGreek":"Indo-European", "Arabic":"Semitic", "Basque":"isolate", "Belarusian":"Indo-European-Baltoslavic", "Bulgarian":"Indo-European-Baltoslavic", "Cantonese":"Sino-Austronesian", "Catalan":"Indo-European-Romance", "Chinese":"Sino-Austronesian", "Coptic":"Afroasiatic", "Croatian":"Indo-European-Baltoslavic", "Czech":"Indo-European-Baltoslavic", "Danish":"Indo-European-Germanic", "Dutch":"Indo-European-Germanic", "English":"Indo-European-Germanic", "Estonian":"Agglutinating", "Finnish":"Agglutinating", "French":"Indo-European-Romance", "Galician":"Indo-European-Romance", "German":"Indo-European-Germanic", "Gothic":"Indo-European-Germanic", "Greek":"Indo-European", "Hebrew":"Semitic", "Hindi":"Indo-European", "Hungarian":"Agglutinating", "Indonesian":"Sino-Austronesian", "Irish":"Indo-European", "Italian":"Indo-European-Romance", "Japanese":"Agglutinating", "Kazakh":"Agglutinating", "Korean":"Agglutinating", "Latin":"Indo-European-Romance", "Latvian":"Indo-European-Baltoslavic", "Lithuanian":"Indo-European-Baltoslavic", "Norwegian":"Indo-European-Germanic", "OldChurchSlavonic":"Indo-European-Baltoslavic", "Persian":"Indo-European", "Polish":"Indo-European-Baltoslavic", "Portuguese":"Indo-European-Romance", "Romanian":"Indo-European-Romance", "Russian":"Indo-European-Baltoslavic", "Sanskrit":"Indo-European", "Slovak":"Indo-European-Baltoslavic", "Slovenian":"Indo-European-Baltoslavic", "Spanish":"Indo-European-Romance", "Swedish":"Indo-European-Germanic", "Tamil":"Dravidian", "Turkish":"Agglutinating", "Ukrainian":"Indo-European-Baltoslavic", "Urdu":"Indo-European", "Uyghur":"Agglutinating", "Vietnamese":"Sino-Austronesian",'Afrikaans':'Indo-European-Germanic', 'SwedishSign':'Indo-European-Germanic', 'Kurmanji':'Indo-European', 'NorthSami':'Agglutinating', 'UpperSorbian':"Indo-European-Baltoslavic", 'Buryat':'Agglutinating', 'Telugu':'Dravidian', 'Serbian':"Indo-European-Baltoslavic", 'Marathi':'Indo-European','Naija':"Indo-European-Germanic", 
"OldFrench":"Indo-European-Romance", "Maltese":"Semitic", "Thai":"Sino-Austronesian","Amharic":"Afroasiatic", 'Erzya': 'Agglutinating', 'Faroese':"Indo-European-Germanic", 'Tagalog':"Sino-Austronesian", 'Bambara':'Niger-Congo', 'Breton':"Indo-European", 'Armenian':"Indo-European", 'Komi': 'Agglutinating'}
groupColors={"Indo-European-Romance":'brown',"Indo-European-Baltoslavic":'purple',"Indo-European-Germanic":'olive',"Indo-European":'royalBlue',"Sino-Austronesian":'limeGreen', "Agglutinating":'red'}
df = pd.Series(d)
c=[groupColors.get(langnameGroup[label],'k') for label in df.index]
#df = dfPositive["subject"]
fig, aa = plt.subplots(figsize=(10, 7))
aa.axes.get_yaxis().set_visible(False)
plt.ylim(-10,0.2)
plt.xlim(-2,102)
aa.scatter( df, [0 for _ in df], c=c, alpha=0.5, edgecolors='none') 
aa.spines['left'].set_visible(False)
aa.spines['right'].set_visible(False)
aa.spines['bottom'].set_visible(False)
aa.xaxis.set_label_position('top') 
aa.xaxis.set_ticks_position('top')
plt.tight_layout()
texts=[]
for label, x in zip(df.index, df):
    texts+=[aa.text(x,-.1,label, color=groupColors.get(langnameGroup[label],'k'), fontsize=8,  rotation=90)] 
adjust_text(texts, df, [0 for _ in df],
    expand_text=(1, 1), ha='center', va='top',force_text=1.5,lim=277,
    autoalign='', only_move={'points':'y', 'text':'y'})

plt.show()

giving image

I've run the code to visualize a few different datasets, gerdes.fr/unidimscat.pdf (with force_text=.6). Most of the time it works well, but sometimes, when there are too many labels at the same point, the algorithm seems to lose hope; see for example the graph of VERB-object-PRON-direction. The graphs that I need are good enough, but if you want to look into the data, I can share it.

Actually, I originally wanted the labels at 45 degrees to make them more readable horizontally. The current algorithm transforms image into image. I have the impression that the bounding rectangle is used and that it's probably hard to solve. But maybe not :-) so I dare to ask whether you have an idea. If not, I'm completely happy with the current state. I also have another issue that I'll post, on bidimensional scatters...

Phlya commented 6 years ago

Wow, with colours it looks awesome! Slavic languages are very nicely grouped together!

So, the default limit of iterations is 100, which is usually sufficient for interactive work to produce something reasonable without waiting too long, but for difficult plots it might help to increase it. Also, the function actually returns the number of iterations used, so if you want to see whether this is the problem, you can just print it out... Also, for "perfect" plots you can set precision=0, although the default threshold is very small anyway.
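The interplay between the force, the iteration limit and the precision threshold can be illustrated with a toy loop. This is only a sketch, not adjustText's update rule: `relax` is a hypothetical function that stacks labels sharing one x position, and its `force`, `lim` and `precision` arguments merely borrow the names of the real parameters.

```python
def relax(tops, height, force=0.5, lim=100, precision=0.01):
    """Toy 1-D relaxation: push stacked labels down until they stop overlapping.

    tops      -- y of each label's top edge (all labels share one x position)
    height    -- label height, identical for every label
    force     -- fraction of the measured overlap removed per iteration
    lim       -- maximum number of iterations
    precision -- stop once the summed overlap is at or below this value
    Returns (final tops, number of iterations used).
    """
    tops = sorted(tops, reverse=True)
    for iteration in range(1, lim + 1):
        total_overlap = 0.0
        for i in range(1, len(tops)):
            overlap = height - (tops[i - 1] - tops[i])
            if overlap > 0:
                tops[i] -= force * overlap  # move the lower label down
                total_overlap += overlap
        if total_overlap <= precision:
            return tops, iteration
    return tops, lim  # ran out of iterations before converging


_, n_fast = relax([0.0, 0.0], 1.0, force=1.0)
_, n_slow = relax([0.0, 0.0], 1.0, force=0.5)
_, n_capped = relax([0.0, 0.0], 1.0, force=0.5, lim=5)
print(n_fast, n_slow, n_capped)
```

In this toy, halving the force multiplies the number of iterations needed, and a low `lim` stops the loop while some overlap remains. If the returned count equals the limit, the limit is the likely bottleneck, which is why printing the value returned by adjust_text is a useful check.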

Concerning 45 degree texts - yeah, bounding boxes are used, so it is not supposed to work as you might expect it to. However, playing with expand_text can produce OK plots when all texts have identical rotation, like here. E.g. setting expand_text=(0.75, 0.75) gives this, which is not too bad and slightly more compact than what you get otherwise: image

Support of rectangles not aligned with axes or of other shapes would be great, but would require some novel approach for finding intersections and so on...
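To see why bounding boxes behave badly for 45 degree texts, the size of the axis-aligned box around a rotated rectangle can be computed directly. This is a geometric sketch only; `rotated_bbox_size` is a hypothetical helper and the label dimensions are made up.

```python
import math


def rotated_bbox_size(width, height, angle_deg):
    """Width and height of the axis-aligned box enclosing a rotated rectangle."""
    angle = math.radians(angle_deg)
    c, s = abs(math.cos(angle)), abs(math.sin(angle))
    return width * c + height * s, width * s + height * c


w, h = 6.0, 1.0  # a long, thin label (arbitrary units)
for angle in (0, 45, 90):
    bw, bh = rotated_bbox_size(w, h, angle)
    ratio = (bw * bh) / (w * h)
    print(f"{angle:>2} deg: box {bw:.2f} x {bh:.2f}, {ratio:.2f}x the label area")
```

At 0 and 90 degrees the box matches the label, but at 45 degrees it covers roughly four times the label's area for these dimensions. Shrinking the box with an expansion factor below 1, as in expand_text=(0.75, 0.75), compensates for part of that empty space.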

kimgerdes commented 6 years ago

Ok, perfect. I think that the 90° version is more readable in the end, because the long distances between the 45° labels make it look as if their position carried meaning. Thanks a lot!

Phlya commented 6 years ago

@kimgerdes Hi, I was looking through this again (because I wanted to add an example to the notebook) and realized I didn't notice your link to a pdf with the plots. There are indeed a few issues where the texts completely overlap, and I think this is because they have exactly the same x and y coordinates - I think adding a tiny random shift in y position should solve this (you can set the random seed before it for reproducibility). Please let me know whether it helps, and share the data with one of these examples if you can, if it doesn't!
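A minimal sketch of the seeded random shift suggested above, using only the standard library. The helper name `jitter`, the scale of 0.001 and the seed are all assumptions chosen for illustration; the starting value -0.1 is the y position used for the texts earlier in this thread.

```python
import random


def jitter(values, scale=1e-3, seed=0):
    """Return the values with a small uniform random shift added to each.

    A dedicated, seeded generator keeps the result reproducible between runs.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-scale, scale) for v in values]


# Five labels that would otherwise start at exactly the same y position.
start_y = [-0.1] * 5
shifted_y = jitter(start_y)
print(shifted_y)
```

The shifted values could then be used as the initial y coordinate when creating each text, so that labels sharing the same x no longer start from identical positions.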

Also, sometimes there is simply not enough vertical space to put all the labels, not sure what can be done about that without just making all plots much taller...