asjadnaqvi / stata-sankey

A Stata package for Sankey diagrams
MIT License


Installation | Syntax | Citation guidelines | Examples | Feedback | Change log




sankey v1.8

(22 Sep 2024)

This package allows users to draw Sankey plots in Stata. It is based on the Sankey Guide published on the Stata Guide on Medium in October 2021.

Installation

The package can be installed via SSC or GitHub. The GitHub version might be more recent due to bug fixes and feature updates, and may contain syntax improvements and changes in default values. See the version numbers below. The GitHub version is eventually published on SSC.

SSC (v1.74):

ssc install sankey, replace

GitHub (v1.8):

net install sankey, from("https://raw.githubusercontent.com/asjadnaqvi/stata-sankey/main/installation/") replace

The palettes and colrspace packages are required to run this command:

ssc install palettes, replace
ssc install colrspace, replace

Even if you have these packages installed, please check for updates: ado update, update.
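As a minimal sketch of the update check above, the following uses Stata's built-in which and ado update commands to see which version is installed and whether newer files are available. Restricting ado update to these three package names is an illustrative choice; running ado update, update as stated above covers all installed packages.

```stata
* Show the location and version header of the installed sankey command
which sankey

* List available updates for the three packages without installing anything
ado update sankey palettes colrspace

* Install any updates that were found
ado update sankey palettes colrspace, update
```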

If you want to make a clean figure, then it is advisable to load a clean scheme. There are several available and I personally use the following:

ssc install schemepack, replace
set scheme white_tableau  

You can also push the scheme directly into the graph using the scheme(schemename) option. See the help file for details or the example below.
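A short sketch of the scheme() option described above, assuming schemepack is installed and using the example dataset from the Examples section. The scheme applies only to this graph and leaves the session-wide setting unchanged.

```stata
* Load the example data shipped with the package repository
import excel using "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey_example2.xlsx?raw=true", clear first

* Apply the scheme to this graph only, without calling set scheme
sankey value, from(source) to(destination) by(layer) scheme(white_tableau)
```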

I also prefer narrow fonts in figures with long labels. You can change this as follows:

graph set window fontface "Arial Narrow"

Syntax

The syntax for the latest version is as follows:

sankey value [if] [in], from(var) to(var) [ by(var) palette(str) colorby(layer|level) colorvar(var) stock colorvarmiss(str) colorboxmiss(str) smooth(1-8) gap(num) 
        recenter(mid|bot|top) ctitles(list) ctgap(num) ctsize(num) ctposition(bot|top)
        ctcolor(str) labangle(str) labsize(str) labposition(str) labgap(str) showtotal labprop labscale(num) valsize(str) valcondition(num) format(str) valgap(str) 
        novalues valprop valscale(num) novalright novalleft nolabels 
        sort1(value|name[, reverse]) sort2(value|order[, reverse]) align fill lwidth(str) lcolor(str) alpha(num) offset(num) boxwidth(str) percent wrap(num) * ]

See the help file help sankey for details.

The most basic use is as follows:

sankey value, from(var1) to(var2) [by(level)]

where var1 and var2 are the source and destination variables, respectively, against which the value variable is plotted. The by() variable defines the levels and is optional since v1.72.
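To illustrate the basic call, here is a minimal sketch on a toy dataset. The variable names src, dst, and flow and the numbers are made up for illustration. Since by() is omitted, this assumes v1.72 or later.

```stata
* Hypothetical toy data: each row is one flow from a source to a destination
clear
input str10 src str10 dst flow
"A" "X" 10
"A" "Y" 5
"B" "X" 7
"B" "Y" 3
end

* Single-layer Sankey; by() is left out, which requires v1.72 or later
sankey flow, from(src) to(dst)
```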

Citation guidelines

Software packages take countless hours of programming, testing, and bug fixing. If you use this package, then a citation would be highly appreciated. Suggested citations:

in BibTeX

@software{sankey,
   author = {Naqvi, Asjad},
   title = {Stata package ``sankey''},
   url = {https://github.com/asjadnaqvi/stata-sankey},
   version = {1.8},
   date = {2024-09-22}
}

or simple text

Naqvi, A. (2024). Stata package "sankey" version 1.8. Release date 22 September 2024. https://github.com/asjadnaqvi/stata-sankey.

or see SSC citation (updated once a new version is submitted)

Examples

Get the example data from GitHub:

import excel using "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey_example2.xlsx?raw=true", clear first

Let's test the sankey command:

sankey value, from(source) to(destination) by(layer)

Smooth

sankey value, from(source) to(destination) by(layer) smooth(2)
sankey value, from(source) to(destination) by(layer) smooth(8)

Re-center

sankey value, from(source) to(destination) by(layer) recenter(bot)
sankey value, from(source) to(destination) by(layer) recenter(top)

Gaps

sankey value, from(source) to(destination) by(layer) gap(0)
sankey value, from(source) to(destination) by(layer) gap(20)

Values

sankey value, from(source) to(destination) by(layer) noval showtot

Sort (v1.6)

sankey value, from(source) to(destination) by(layer) sort1(name)
sankey value, from(source) to(destination) by(layer) sort1(value)
sankey value, from(source) to(destination) by(layer) sort1(value) sort2(value)
sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(value)
sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(value, reverse) 
sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(order) 
sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(order, reverse) 

Custom sorting on a value:

gen source2 = .
gen destination2 = .

foreach x in source destination {
    replace `x'2 = 1 if `x'=="Blog"
    replace `x'2 = 2 if `x'=="LinkedIn"
    replace `x'2 = 3 if `x'=="Twitter"
    replace `x'2 = 4 if `x'=="Direct"
    replace `x'2 = 5 if `x'=="App"
    replace `x'2 = 6 if `x'=="Medium"   
    replace `x'2 = 7 if `x'=="Website"
    replace `x'2 = 8 if `x'=="Homepage"
    replace `x'2 = 9 if `x'=="Total"
    replace `x'2 = 10 if `x'=="Google"
    replace `x'2 = 11 if `x'=="Facebook"
}

lab de labels 1 "Blog" 2 "LinkedIn" 3 "Twitter" 4 "Direct" 5 "App" 6 "Medium" 7 "Website" 8 "Homepage" 9 "Total" 10 "Google" 11 "Facebook", replace

lab val source2 labels
lab val destination2 labels

sankey value, from(source2) to(destination2) by(layer) 
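A possibly more compact way to build the same ordered variables is Stata's encode with an existing value label. This is a sketch, not the package author's method. It assumes the example data are loaded and that the label covers every category in source and destination. The names source3 and destination3 are hypothetical.

```stata
* Define the desired order once as a value label
lab de labels 1 "Blog" 2 "LinkedIn" 3 "Twitter" 4 "Direct" 5 "App" 6 "Medium" 7 "Website" 8 "Homepage" 9 "Total" 10 "Google" 11 "Facebook", replace

* encode reuses the label's string-to-code mapping; noextend stops with an
* error if a category is missing from the label instead of adding a new code
encode source,      generate(source3)      label(labels) noextend
encode destination, generate(destination3) label(labels) noextend

sankey value, from(source3) to(destination3) by(layer)
```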

boxwidth

sankey value, from(source) to(destination) by(layer) boxwid(5)

valcond

sankey value, from(source) to(destination) by(layer) valcond(200)
sankey value, from(source) to(destination) by(layer) valcond(300)

Palettes

sankey value, from(source) to(destination) by(layer) palette(CET C6)
sankey value, from(source) to(destination) by(layer) colorby(level)

color by variable (v1.4)

gen trace1 = 1 if source=="App"

sankey value, from(source) to(destination) by(layer) colorvar(trace1)
cap drop trace2
gen trace2 = .
replace trace2 = 1 if  source=="App" & destination=="App" & layer==0
replace trace2 = 2 if  source=="App" & destination=="App" & layer==1
replace trace2 = 3 if  source=="App" & destination=="App" & layer==2
replace trace2 = 4 if  source=="App" & destination=="Total" & layer==3

sankey value, from(source) to(destination) by(layer) colorvar(trace2)
sankey value, from(source) to(destination) by(layer) colorvar(trace2) palette(Oranges)
sankey value, from(source) to(destination) by(layer) colorvar(trace2) palette(Blues) ///
 colorvarmiss(gs13) colorboxmiss(gs13)
sankey value, from(source) to(destination) by(layer) colorvar(trace2) ///
palette(blue*0.1 blue*0.3 blue*0.5 blue*0.7) colorvarmiss(gs13) colorboxmiss(gs13)

column titles (v1.4)

sankey value, from(source) to(destination) by(layer) ctitles(Cat1 Cat2 Cat3 Cat4 Cat5)
sankey value, from(source) to(destination) by(layer) ctitles(Cat1 Cat2 Cat3 Cat4 Cat5) ctg(-100)
sankey value, from(source) to(destination) by(layer) ctitles("Cat 1" "Cat 2" "Cat 3" "Cat 4" "Cat 5") ctg(-100)
sankey value, from(source) to(destination) by(layer) ctitles("Cat 1" "Cat 2" "Cat 3" "Cat 4" "Cat 5") ctpos(top) ctg(100) recenter(top)

label rotation and offset

sankey value, from(source) to(destination) by(layer) noval showtot palette(CET C6) ///
    laba(0) labpos(3) labg(-1) offset(10)

hide values and labels (v1.5)

sankey value, from(source) to(destination) by(layer) novalleft
sankey value, from(source) to(destination) by(layer) novalright
sankey value, from(source) to(destination) by(layer) noval
sankey value, from(source) to(destination) by(layer) nolabels

proportional values and labels (v1.5)

sankey value, from(source) to(destination) by(layer) valprop vals(2) 
sankey value, from(source) to(destination) by(layer) labprop labs(2)

stocks (v1.6)

sankey value, from(source) to(destination) by(layer) stock

All together

sankey value, from(source) to(destination) by(layer) palette(CET C6) alpha(60) ///
    labs(2.5) laba(0) labpos(3) labg(-1) offset(5)  noval showtot ///
    ctitles("Cat 1" "Cat 2" "Cat 3" "Cat 4" "Cat 5") ctg(-100) cts(3) ///
    title("My sankey plot", size(6)) note("Made with the #sankey package.", size(2.2)) ///
    xsize(2) ysize(1)

Feedback

Please open an issue to report errors, suggest feature enhancements, or make other requests.

Change log

v1.8 (22 Sep 2024)

v1.74 (11 Jun 2024)

v1.73 (16 Mar 2024)

v1.72 (12 Feb 2024)

v1.71 (15 Jan 2024)

v1.7 (06 Nov 2023)

v1.61 (22 Jul 2023)

v1.6 (11 Jun 2023)

v1.51 (25 May 2023)

v1.5 (30 Apr 2023)

v1.4 (23 Apr 2023)

v1.31 (04 Apr 2023)

v1.3 (26 Feb 2023)

v1.21 (15 Feb 2023)

v1.2 (02 Feb 2023)

v1.1 (13 Dec 2022)

v1.0 (08 Dec 2022)