Open adamerose opened 3 years ago
I just spent a long while typing a bunch of stuff and github just lost the whole comment. Maybe I'll have the courage to start over in a few days. grr.
top 2 points:
Ok, here is the first part, covering what you already had in your spreadsheet. In the next few days I'll post the rest (all the other important chart types that are missing).
go) has Surface
go also has a 3d mesh object, but I can't think of why you would need this for EDA vs surface. About the only use case I can think of is displaying a 3d convex hull as part of clustering / anomaly detection.
Missing in 3d:
make_subplots would allow for this. Given that, the only difference between scatter and scatter 3d, and between line and line 3d, would be a z axis. Adding Z to scatter and line would remove the extra selection of plot, making for a simplified selection and better UX for the end user.
Some more observations:
go has a Sankey object
visu.ai, that was super hard, and it doesn't work with current plotly. Haven't had the time to revisit this yet.
All the polar plots in plotly express are "sort of" there. They don't allow multiple variables on r or theta, and that's basic functionality.
Speaking of polar, Smith charts are nice for RF / microwave analysis. Got pinged on twitter for feedback on that, apparently plotly is adding that to plotly express.
Other charts not covered in your spreadsheet that I've either used for business or toying with in the past 1-2 years:
1D heatmaps (counts over time, for example). Did this one with matplotlib, but one would think this should be trivial to do with plotly?
arc diagrams: see hand drawn image at top of https://blog.dionresearch.com/2020/03/ex-libris-of-data-scientist-part-v.html where arc diagrams are used to visualize sort algorithms. See also: https://chart-studio.plotly.com/~empet/13574/arc-diagram-of-star-wars-characters-that-interacted-in-the-force-awakens/#/
axis scatter graph (with reference plane and drop lines aka "lollipops" of distinct colors or dash style connecting the scatter points to the reference plane). Different from drop line scatter in that the reference plane can be anything, and has very simple ticks to remove distractions. It is typically used to detect inconsistencies.
bulletgraph (a variation on the bar and symbol chart by Stephen Few: https://www.perceptualedge.com/articles/misc/Bullet_Graph_Design_Spec.pdf )
correlation matrix with distribution on diagonal
(notched) box plots: are actually supported by the px box plot, maybe should be a UI selection (for confidence intervals)
Chernoff faces: not sure I've ever really used them for anything
control charts (or trend or run or Shewhart charts) for univariate and Hotelling control charts for multivariate. I've made a python package that does all of that: https://pypi.org/project/hotelling/ and it generates either matplotlib static or plotly figures, for ex: https://dionresearch.github.io/hotelling/hotelling%20cusum.html#Multivariate-control-chart-w/cusum (matplotlib deprecated some key functionality, working on adding it directly in next version of hotelling so I can unfreeze matplotlib)
decomposition graph (for time series analysis, breaks a composite signal into trend, cyclical, seasonal and random), see https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.STL.html (but no interactive support)
difference band: see Playfair's chart remake here: https://blog.dionresearch.com/2019/10/visualizations-explanatory-exploratory.html . I think cufflinks does that. I did mine with matplotlib, straightforward in plotly (minus annotations, obviously).
drop line scatter (3d have one drop line per observation, trilinear have 3 and 2d have 2), sometimes called "lollipop chart". Example: https://www.advsofteng.com/doc/cdpydoc/threedscatter2.htm
drop line graph - for multiple series in a time series graph, each time interval observations are connected together (or experiments or groups or cohorts). Don't have an example handy that I can share right now.
dual Y axis graph. Yeah, old school, but probably easiest way to establish visual correlation between signals. This is going to be a bear to do in plotly...
flow maps (ie. Minard). Don't know anything to do this with plotly or even python. @fbahoken has this project: https://github.com/gflowiz/arabesque
four-fold chart (a sector chart to represent 4 categories. ex: https://openi.nlm.nih.gov/detailedresult?img=PMC3919735_pone.0088106.g002&req=4 - also possible to do with squares)
fuzzygram - something I've come across reading Leland Wilkinson in the 90s, not seen much of it, but I am probably going to add this at some point to my stemgraphic python package given how I am covering most every distribution related charts in it.
MCA and 3D MCA - Prince (https://github.com/MaxHalford/prince) does some but not interactive
mosaic matrix (http://euclid.psych.yorku.ca/SCS/Papers/drew/drew.pdf)
pareto graph, for ex: https://github.com/tisimst/paretochart - but not updated in 8 years, not interactive
population pyramid (basically back to back histograms)
probability graph with distribution probability grid. Q-Q (usually normal distribution) and Q-V charts are specific examples of this, depending on cumulative frequency or expected value a X. ex: https://www.statsmodels.org/dev/generated/statsmodels.graphics.gofplots.qqplot.html
range bar graph (precursor to box-plot), some call this floating bar charts (https://peltiertech.com/floating-bars-excel-charts/)
Rootograms, hanging rootograms and suspended rootograms. Found in John Tukey's EDA (1977) and I've seen this in some scientific papers. Hanging rootogram in particular seems quite useful. Probably adding that to stemgraphic at some point.
sieve diagram. I don't remember what I wanted to say about this...
spectral plot, in continuous use in the literature for many decades (aka ridge contour or slice graph or even ridge plot) https://seaborn.pydata.org/examples/kde_ridgeplot.html
stem-and-leaf plots, stem-and-leaf heatmaps - hands down my favorite way to look at distribution, both at shape and detail level. Better to estimate distribution than box or violin plots. I ended up building my own package because there was nothing to do proper graphical representations of stem-and-leaf plots nor scale to very large data sets. http://stemgraphic.org/ Currently only matplotlib, but working on interactive version. Many of the other plots in stemgraphic already support plotly figures output.
trilinear (ternary) chart. Plotly express supports it for scatter: https://plotly.com/python-api-reference/generated/plotly.express.scatter_ternary.html and for line: https://plotly.com/python-api-reference/generated/plotly.express.line_ternary.html and they are for comparing 3 components of a whole (they have to add to 100% of what you want to look at...)
vector field graphs (2d and 3d) - plotly has a streamline figure that is basically it: https://plotly.com/python/streamline-plots/
Voronoi diagram. Plotly in R is possible: https://www.r-bloggers.com/2016/02/voronoi-diagrams-in-plotly-and-r/ so there should be a way to do this in python with plotly. Plenty of matplotlib implementations. Also this: https://chart-studio.plotly.com/~riddhiman/134.embed
And one more python library and a few plot types I have to mention: yellowbrick maintained by @bbengfort, @rebeccabilbro et al. It doesn't output interactive figures, just matplotlib, but I've been experimenting with a similar approach I used with my Hotelling package to add support for plotly figs to YB. Has things like:
All useful in EDA, and a whole lot of other things that are much more ML focused.
Ok, done for now...
BTW, on spectral plot, the example I linked was seaborn, and it's a distribution plot (kde ridge) but most of the time spectral plots are not for looking at distributions, but at frequency spectrums
Worked on plotly this week, sent a PR to fix matplotlylib so it works with current matplotlib (3.4.1) to generate interactive plots. This will help to enable several plots in pandasgui @adamerose. Downside is I didn't make any progress on pandasgui this week.
Just found this very interesting site, posting here for future reference
I've started compiling a list of all the useful plots I could think of with examples and whether it is available in PandasGUI:
https://docs.google.com/spreadsheets/d/1ORw1GWTent5NJmIf23Co7EPTBBjNyC1VCHfi9yHwf7A/edit#gid=0
If you have any feedback or knowledge to contribute, feel free to share here