Wrap - avoid splitting words

mkaulisch commented 3 weeks ago

Hi Asjad, I have an issue: the wrap option splits words into pieces. It would be great if you could add an option that keeps words together, or one that splits labels at a given character (such as a space). The options in Mead Over's linewrap program (see view net describe linewrap, from("http://digital.cgdev.org/doc/stata/MO/Misc")) seem very comprehensive and might be a good source of inspiration. My guess is that this affects all your packages that have a wrap option... Best regards, Marc

asjadnaqvi commented 3 weeks ago

Dear Marc, certainly! I myself want a feature like this. I will check out linewrap. There is also splitvarlabels, so there are some options out there. I will probably write a generic program for all the packages so they all can get upgraded simultaneously :)

mkaulisch commented 3 weeks ago

The -linewrap- help file suggests -tokenize- as a good starting point. At least for my use case, wrapping labels by words, that would work well.
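As a rough illustration of that suggestion, here is a minimal sketch of how -tokenize- breaks a single label into words. The local macro name label is just for illustration, and this is not taken from -linewrap- or from any of the packages discussed here.

```stata
* Hypothetical example: split one label into words with -tokenize-
local label "Mathematisch-Naturwissenschaftliche Fakultät"
tokenize "`label'"

* -tokenize- stores the words in the positional macros `1', `2', ...
* By default it parses on spaces only, so the hyphenated word stays whole.
local i = 1
while "``i''" != "" {
    display "word `i': ``i''"
    local ++i
}
```

Note that -tokenize- works on one string held in a macro, so applying it to a string variable would mean looping over observations.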

asjadnaqvi commented 3 weeks ago

tokenize is a low-level parser that makes sense for variables but not so much for labels. If you buy me coffees I will implement a better label wrapper today ;)

mkaulisch commented 3 weeks ago

Hope three coffees are enough for a good solution. ;-)

asjadnaqvi commented 3 weeks ago

The program is ready. I just need to either (a) incorporate it into the packages or (b) release it as a standalone program that the packages call. See examples below.

gen lab1 = "This is a long string that we want to split but we want to make sure that no word gets split in the process."

_labsplit lab1, wrap(12) gen(newlab1)            // respect word boundaries
_labsplit lab1, wrap(12) gen(newlab2) strict   // do a hard split at fixed characters

[Images: labsplit1, labsplit2]
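For readers who want to see what respecting word boundaries can look like in practice, here is a minimal sketch of a greedy word wrap on a string variable. It is not the code behind _labsplit. The variable names wrapped, linelen, and nwords are made up, and joining lines with char(10) is an assumption about how the line break is represented.

```stata
* Sketch only: greedy word-boundary wrap, not the actual _labsplit code
clear
set obs 1
gen lab1 = "This is a long string that we want to split but we want to make sure that no word gets split in the process."

local width = 12                      // target line width, as in wrap(12)

gen wrapped = ""                      // result; lines joined with char(10) (assumption)
gen linelen = 0                       // length of the line currently being built

gen nwords = wordcount(lab1)
summarize nwords, meanonly
local maxwords = r(max)

forvalues i = 1/`maxwords' {
    gen w = word(lab1, `i')           // empty once a label runs out of words
    gen byte first = linelen == 0     // nothing on the current line yet
    * break when the word plus a joining space would overflow the line
    gen byte brk = !first & (linelen + 1 + ustrlen(w) > `width')
    replace wrapped = wrapped + cond(first, "", cond(brk, char(10), " ")) + w if w != ""
    replace linelen = cond(first | brk, ustrlen(w), linelen + 1 + ustrlen(w)) if w != ""
    drop w first brk
}
drop nwords linelen
```

A word longer than the width is placed on its own line rather than being cut, which is the main difference from a hard split at fixed character positions. ustrlen() requires Stata 14 or later.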

mkaulisch commented 3 weeks ago

I am curious to see your solution - maybe one/two additional ideas:

Allow for splits after each word
Allow for conditional splits

Maybe you can test the program with my examples: "Juristische Fakultät", "Philosophische Fakultät", "Mathematisch-Naturwissenschaftliche Fakultät", "Medizinische Fakultät", "Wirtschaftswissenschaftliche Fakultät"

In some labels the first word alone is longer than other labels in full... I can see two scenarios: (a) always split after the first word, or (b) split after the first word conditionally, when the whole string is longer than or when the first word is longer than...
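The two scenarios could be sketched roughly as follows. This is only an illustration: the 25-character threshold, the variable names, and the use of char(10) as the line break are all assumptions, since no concrete cutoff is given above.

```stata
* Sketch only: split after the first word, always (a) or conditionally (b)
clear
set obs 2
gen mylab = ""
replace mylab = "Juristische Fakultät" in 1
replace mylab = "Mathematisch-Naturwissenschaftliche Fakultät" in 2

local maxlen = 25                       // hypothetical threshold, not from the thread

gen pos   = strpos(mylab, " ")          // byte position of the first space, 0 if none
gen first = substr(mylab, 1, pos - 1)   // first word
gen rest  = substr(mylab, pos + 1, .)   // everything after the first space

* (a) always split after the first word (labels without a space are left alone)
gen lab_a = cond(pos > 0, first + char(10) + rest, mylab)

* (b) split after the first word only when the whole label exceeds the threshold
gen lab_b = cond(pos > 0 & ustrlen(mylab) > `maxlen', first + char(10) + rest, mylab)

drop pos first rest
```

The condition in (b) could just as well test ustrlen(first) against a cutoff to cover the case where only the first word is long.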

mkaulisch commented 3 weeks ago

An additional idea that occurred to me while working on a -treemap- chart: the label wrapping could be restricted to the second level only... Another idea is that only the values shown with the labels are moved onto a second line...

asjadnaqvi commented 3 weeks ago

Set up data:

clear
set obs 5
gen x = 1
gen y = _n

gen mylab = ""
replace mylab = "Juristische Fakultät" in 1
replace mylab = "Philosophische Fakultät" in 2
replace mylab = "Mathematisch-Naturwissenschaftliche Fakultät" in 3
replace mylab = "Medizinische Fakultät" in 4
replace mylab = "Wirtschaftswissenschaftliche Fakultät" in 5

Use the new word(n) option to split on the nth word:

labsplit mylab, word(1) gen(mylab2)

Compare the two:

twoway (scatter y x, mlabel(mylab)  mlabsize(3))
twoway (scatter y x, mlabel(mylab2)  mlabsize(3))

[Images: labsplit4, labsplit5]

I will release it sometime over the weekend as an independent program that can be called by other programs.

asjadnaqvi commented 3 weeks ago

Moved to: https://github.com/asjadnaqvi/stata-graphfunctions

asjadnaqvi / stata-sunburst

Wrap - avoid splitting words #11