ebeshero / DHClass-Hub

a repository to help introduce and orient students to the GitHub collaboration environment, and to support DH classes.
GNU Affero General Public License v3.0

Graphing and Validity: Discussion Assignment for Wed. 1/27 #113

Closed RJP43 closed 8 years ago

RJP43 commented 8 years ago

The Assignment

(reposted by Dr. B, who also repaired the links!) @RJP43 @brookestewart @nlottig94 @spadafour @mmm202 @jlm323 @laurenmcguigan @mjb232 It's much too easy to lie with numbers and graphs on digital humanities projects. Let's face it, we are Humanists and we don't do numbers. We will be posting links here to good and bad examples of graphing and of using digital visualizations to share qualitative information on DH projects. Please begin by reading "A short tour of bad graphs". Read it carefully, think about the advice it gives about graphs, and look closely at the examples and discussion of problematic visualizations of data.

Discussion points:

To start this out, here is an example of a bad graph on Nelson 1.0, from Fall 2014. What is bad about this graph, and why?

brookestewart commented 8 years ago

And here's an example of the graph from the Dickinson Project.

nlottig94 commented 8 years ago

Here is an example of a validity issue in a large representation of a network of data regarding the network of relationships within Six Degrees of Francis Bacon.

laurenmcguigan commented 8 years ago

I just have a question! Can someone explain to me what it means to adjust dollar amounts in a graph for inflation? Thank you!

laurenmcguigan commented 8 years ago

http://nelson.newtfire.org/DaubeSVG.html

This is the graph from the Nelson project. I found a few things wrong with it, let me know if you agree or disagree or if you have anything to add.

laurenmcguigan commented 8 years ago

piechart

CHECK OUT THIS PIE CHART. Thought you all might enjoy it.

mmm202 commented 8 years ago

@laurenmcguigan I agree with you about the number of occurrences in the Nelson project graph. It seems like there may need to be more values on the axis, because the bars don't seem to go much higher than 12.5. You can see that changes are happening in the graph, but it is hard to tell exactly what those changes are and to compare them. I believe that your pie chart is lacking some data since there is nothing in it! It made my day though.

mmm202 commented 8 years ago

capture This graph I believe has way too much information on it. Can anyone actually tell what it is even about??

mmm202 commented 8 years ago

image This graph has the information needed to make it easily readable. Does anyone see any flaws?

ghbondar commented 8 years ago

@laurenmcguigan Inflation here refers to the decrease in the value of the US Dollar over time. For example: a comfortable new car that costs $25,000-30,000 today might have cost around $3,000 back in 1973, more or less. So it is usually safe to assume that a given amount of money was worth more in the past than the same amount is today. For most of the past 20 years, inflation has been very low (prices increase 1-2% per year), but before then, inflation was much more significant (more than 5% per year). So, in short, to compare dollar amounts over time, changes in value due to inflation must be accounted for.
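To make the arithmetic concrete, here is a rough sketch in Python. It assumes one constant inflation rate for every year, which is a simplification: real adjustments normally use a published price index. The function names and the $100 starting amount are made up for illustration.

```python
def inflate(amount, annual_rate, years):
    """Carry a past price forward, compounding a constant yearly rate."""
    return amount * (1 + annual_rate) ** years


def deflate(amount, annual_rate, years):
    """Express a later price in the dollars of an earlier year."""
    return amount / (1 + annual_rate) ** years


# Hypothetical $100 purchase, carried forward 20 years at two different rates.
low_inflation = inflate(100, 0.02, 20)   # 2% per year
high_inflation = inflate(100, 0.05, 20)  # 5% per year

print(f"2% for 20 years: ${low_inflation:.2f}")
print(f"5% for 20 years: ${high_inflation:.2f}")
```

A graph that plots raw dollar amounts across decades mixes these different "sizes" of dollar, which is why the amounts should be converted to one reference year first.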

jlm323 commented 8 years ago

@mmm202 The second graph you posted is a lot easier to read. The first one is hard to conceptualize, and there are too many graphics that get in the way of actually understanding what it's trying to say. I think it is trying to show how many different countries go into producing Starbucks coffee and also how many sales fast food restaurants get, but it's just too confusing because there's too much there.

The second graph, while it's easier to read, doesn't have a title. Also the silhouettes of the children in the background might be distracting to some. I don't think it's conducive to the understanding of the graph. The intervals on the x-axis don't seem to increase like they should. They sort of go up at random numbers.

laurenmcguigan commented 8 years ago

graph_bad_7_0

jlm323 commented 8 years ago

exampleofabadgraphpiechart

The data in this graph would have been better suited to a line graph because this information deals with land use over time. Looking at it in two separate pie charts makes it harder to see the differences. Also, there are no values for any of the categories, forcing people to try to make sense of the angles of each piece of the pie chart.

nlottig94 commented 8 years ago

It's okay @laurenmcguigan! I'll pick another! image This graph is bad because it has WAY too much information on it, which makes it hard to read. Also, we know that these are individual grades, but we don't know what the grades are for or who each individual is. I also don't understand why the y-axis goes the whole way to 120%... This could be fixed by creating an average of all the grades. If the individual grades did need to be graphed, however, it would look better if the grades were just plotted as points rather than bars.

laurenmcguigan commented 8 years ago

badgraph

Here are two more examples of bad graphs. The way they use the bar to represent infections makes no sense. If they are stacked on top of each other, does that mean there are more or fewer? For example, in less developed regions, the HPV and Hepatitis B/C virus bars are the same size, but does one have more than the other because it is higher up on the scale?

The graph also does not have a title, and it may have been better as a line graph. Does anyone have any suggestions as to what would make this graph better? Do you think it could be a line graph, or would it be better as two separate bar graphs?

nlottig94 commented 8 years ago

image This graph is a very bad graph because it does not put anything into perspective. It looks more like a board game! It does have a legend, though, which makes it a little better than the last graph I posted. Also, the random graph-like lines make the graph look even worse!

laurenmcguigan commented 8 years ago

@jlm323 I agree with you on the "land use" chart! I also think it would be better to add percentages or numbers to the pieces to show how many there really are, or how many there are in comparison to one another. I think a line graph would solve that problem by adding the amount or percentage on the graph.

jlm323 commented 8 years ago

misleading-graph-2

The small graph on the right is misleading because it looks like The Times has much higher sales than the Daily Telegraph, but really it is the scale of the graph that makes them appear so different. The Times had 485k sales while the Daily Telegraph had 446k, which is only a 39k difference.
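As a quick check on how small that gap is, here is a short Python sketch using only the two sales figures quoted above (the variable names are just for illustration):

```python
times_sales = 485_000       # The Times, as quoted above
telegraph_sales = 446_000   # Daily Telegraph, as quoted above

difference = times_sales - telegraph_sales
relative_difference = difference / telegraph_sales

print(f"{difference:,} more copies, or {relative_difference:.1%} higher")
```

An honest pair of bars drawn from zero would differ in height by that same relative amount, so anything that looks much bigger is coming from the drawing, not the data.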

ebeshero commented 8 years ago

@jlm323 That's not just a bad graph (on sales of The Times vs The Telegraph)--it's a deliberately misleading visualization! By just slightly altering the width of one bar the publishers make it look to our eyes a lot shorter. That's lying with a picture, and it's surprising how often we see such ploys. (Great example!)

ebeshero commented 8 years ago

@laurenmcguigan hmmm. I'm looking at the stacked bar graphs of the various kinds of infection in less developed vs. more developed regions. I'm not sure it's misleading because we are looking at a number of infections on the y axis, and stacking the bars helps us to see how much of each kind of disease is part of the total number of infections. There's a convention in stacking bar graphs to stack the bars with the largest proportions on the bottom and the smallest ones on top, with the idea of helping the eye to compare relative proportions. I see your point that if you were to compare a total number of just the human papilloma virus between undeveloped vs. developed regions it's a little hard to compare, so my suggestion here might be to output a number of outbreaks of each kind inside the bar. Alternatively, you might set the percentage value of human papilloma in relation to each bar's stacked total if comparing the relative proportion of total outbreaks actually matters more than the literal number. I could see uses for each approach (actual number of infections of X kind in undeveloped regions, and percentage of X type of outbreak to the total in undeveloped regions).

But I do think a stacked bar graph is appropriate for showing proportion of a total. I am not sure you want a line graph, do you? Line graphs are good for expressing trends over time, so it would make sense to plot a line graph showing a reduction or increase in outbreaks in undeveloped regions over, say, a 50-year period. How were you thinking of plotting a line graph of the data here? (I might be missing something...)
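Here is a minimal sketch, in Python, of the two labeling options described above: the actual count for each segment, and that segment's share of its bar's total. The category counts are invented purely for illustration and are not taken from the graph.

```python
def shares_of_total(counts):
    """Return each category's fraction of the stacked bar's total."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


# Hypothetical counts for one stacked bar (NOT the real data from the graph).
less_developed = {"HPV": 300, "Hepatitis B/C": 300, "Other infections": 400}

for name, share in shares_of_total(less_developed).items():
    print(f"{name}: {less_developed[name]} infections, {share:.0%} of the bar")
```

Printing either value inside each segment saves the reader from having to estimate it from the segment's height.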

spadafour commented 8 years ago

Yeah, that Nelson graph is BAD. First off, the underlying data behind the graph makes the entire thing disingenuous; the idea was to tag things like positive, negative, sarcastic, etc. However, as @laurenmcguigan sort of pointed out, the coding was entirely subjective; the graph pulls in data from multiple articles coded by different people. Results would be different every time another person coded a document. Attempting to compare the books to the articles offers no real tangible insight.

Also noted was the poor labeling. Notice that most bars don't even reach the halfway point on the graph, meaning that the number of occurrences for each tag is basically a guess. Also, the halfway value for the number of occurrences was 12.5; how can we count half of an occurrence?

(sorry, @RJP43 ! Did you make that, or did Shane?)

spadafour commented 8 years ago

@mmm202 the average weight of an American female child is odd. The choice to section weight off into intervals of 6 (which isn't entirely consistent; one of the intervals is 7) is hard to read; why not section it off into intervals of 5?
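A small Python sketch of how one might check an axis for this kind of uneven spacing. The tick values below are hypothetical, chosen only to mimic "steps of 6 with one step of 7"; they are not read off the actual graph.

```python
def tick_steps(ticks):
    """Return the gap between each pair of neighboring axis ticks."""
    return [later - earlier for earlier, later in zip(ticks, ticks[1:])]


def is_evenly_spaced(ticks):
    """True when every gap between neighboring ticks is the same size."""
    return len(set(tick_steps(ticks))) <= 1


uneven_ticks = [0, 6, 12, 19, 25]     # hypothetical: one step of 7 among 6s
even_ticks = list(range(0, 30, 5))    # steps of 5 throughout

print(tick_steps(uneven_ticks), is_evenly_spaced(uneven_ticks))
print(tick_steps(even_ticks), is_evenly_spaced(even_ticks))
```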

spadafour commented 8 years ago

Here's a subreddit dedicated entirely to bad data visualizations: https://www.reddit.com/r/dataisugly/#page=1

spadafour commented 8 years ago

Here are some terrible Fox News examples! 0701-ff_unemploymentchart

They picked three arbitrary months and connected the unemployment rates with a straight red line. The red line represents nothing and is intentionally misleading: it is meant to make you think that unemployment has been on a steady incline. The months are not even equidistant.

bush_cuts2

Here, they truncated the y-axis, which greatly exaggerates the disparity between the two bars. In reality, one bar represents 35% and the other 39.6%, a difference of only 4.6 percentage points.
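To see how much a truncated axis can stretch a gap like this, here is a rough Python sketch. The two values (35 and 39.6) come from the comment above; the baseline of 34 is an assumption picked for illustration, not a value read off the chart.

```python
def apparent_ratio(small, large, baseline=0.0):
    """How many times taller the larger bar is drawn, given where the axis starts."""
    return (large - baseline) / (small - baseline)


honest = apparent_ratio(35.0, 39.6)                    # axis starts at 0
truncated = apparent_ratio(35.0, 39.6, baseline=34.0)  # assumed axis start

print(f"From zero: larger bar is {honest:.2f}x as tall")
print(f"From 34:   larger bar is {truncated:.2f}x as tall")
```

The closer the baseline sits to the smaller value, the more extreme the drawn ratio becomes, even though the underlying numbers never change.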

brookestewart commented 8 years ago

chartjunk Okay, I don't know who could have thought that this pie chart was a good idea. First, there's no title, so you can't even tell what it's about, or if it's even measuring anything. Second, it's REALLY hard to see where the actual sections are. The 3D aspect is very poorly done and would be better off if left 2D. Third, there is way too much information here, it's hard to tell what goes where. I'm sure there's more wrong with this, but those are the main problems I found.

brookestewart commented 8 years ago

dob This graph is also pretty confusing. The stacked bars are difficult to make sense of since the lighter bar isn't being measured from 0. You can easily compare it to the darker bar, but it would be difficult to figure out the exact number associated with the lighter bars. There's also quite a lot of information on the graph, which makes it hard to read. Also, I don't understand why there are gaps in the bottom two bars. The graph is just overall confusing to me. However, I do like how they did the background color - it makes it easy to see the overall level of water availability with the combined resources.

brookestewart commented 8 years ago

f2 large_-1024x684 Bonus! This is so awful. Someone help!

RJP43 commented 8 years ago

Wow, some of these are absolutely horrible.
In response to @spadafour: the Nelson 1.0 graph had both Shane's and my hands on it. I did the querying of the data, while he produced the graph (the JavaScript and SVG). Now I will say, in our defense, we were working under rough circumstances and the Nelson project was in its first year of development. It took branching out further and adding more editors for the project to move away from the original subjective nature of the markup. Also, the successful JavaScript was an accomplishment in itself. Finally, the graph was made with the intention that future project development would not only finish the titles and explanation of the graph, but also provide more articles and chapters. Having more data would have boosted the number of occurrences, and if I remember correctly Shane chose that max number to account for when the additional data would be added.
There is no doubt though that the graph is flawed immensely. Surely with more experience under our belts the Nelson team can produce better graphs... right? To answer that I would say yes and no. Yes, we won't make the same mistake twice with the subjective markup and lack of labeling; however, as the project is still in development, sometimes information that seems clear cut at the time of creating a visualization turns out not to be in the future. Let's take for example another visualization out of the Nelson project (this one from last semester). The following image is a table represented in one of the Nelson project texts that displays survey information: screenshot_1 There are several parts of this table that are confusing. The large amount of information clustered into such a small table makes it difficult to read and understand. So, last semester we worked to improve this table by turning it into a graph. As the semester went on we came to realize that the underlying markup we used to create the graph was skewed, because a portion of the table (the numbering system in the original table, referring to the order in which questions would be asked or skipped depending on participants' answers) was confusing and had been left out. This is a good example of how data can be skewed during the digitization process. In one sense, the new graph was a better representation of the data because it became clearer; yet the data is also incomplete until the question order is included in the new graph.

RJP43 commented 8 years ago

badgraph2 badgraph3

Pictures of little animals and candles do not work well as bar graphs. The sizing of the different little images is inconsistent. The purpose of the graphs is unclear. The little images are distracting. Does the number of images in a single column account for something? And what is the y-axis numbering/measuring? Individual sales seem unlikely, given the low numbers.

def. examples of bad graphs!

ghbondar commented 8 years ago

:-0= augh! Those sure are some terrible graphics, guys... nice job! Some observations in no particular order:

mjb232 commented 8 years ago

image So for this first graph there's a whole mess of issues here which mainly deal with bad UX 1) there are no labels! What exactly are we looking at? 2) Looking in the center is disorienting 3) They use very similar reds for two different pie charts (top right and bottom center) 4) Since the graph is so busy, the tiny green sliver in the top right is sort of obscured

While the last graph was more confusing and hard to look at, this one is less busy and simpler. However, this new graph struggles to convey its actual data: image 1) The actual average for each player is unclear: Emily's seems to be just above the 130 line, Hilde's seems to be somewhere in the middle of 110-115, and the imprecision of those two bars causes us to doubt the accuracy of Diana's, which seems to lie on 120 2) The Y axis should be labeled with some kind of title or unit, especially for those who know nothing about bowling (like me). Diana has 120 what? Points? Pins? Shots?

Pomilui commented 6 years ago

Here is one of those 3-D Pie Charts (ugh) where the viewer has to read the percentages on the side and try to compare them with the area of the slice. The 3-D effect makes it seem that the smaller parts won't fit into a circle if all of the slices are added together.
ted talk This is a particularly misleading one with 2 y-axes. If you want to show any kind of relationship between unemployment and being medically uninsured, you need 2 different graphs, each with its own line. Some quick math shows how superimposing these two lines on one chart is misleading: say the U.S. has a population of 350 million people. If the unemployment rate is 5.5%, or 0.055, then the line for unemployment would be at 19,250,000, far below the line for the medically uninsured in the 40 million range.
insurance
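The quick math above can be written out as a short Python sketch. The 350 million population and the 5.5% rate are the round figures used in the comment, and 40 million stands in for "the 40 million range"; none of these are official statistics.

```python
population = 350_000_000     # round figure used in the comment above
unemployment_rate = 0.055    # 5.5%
uninsured = 40_000_000       # stand-in for "the 40 million range"

# Put both series in the same unit (people)...
unemployed = population * unemployment_rate

# ...or both as a share of the population.
uninsured_share = uninsured / population

print(f"Unemployed: {unemployed:,.0f} people ({unemployment_rate:.1%})")
print(f"Uninsured:  {uninsured:,} people ({uninsured_share:.1%})")
```

Once both lines share a unit, they no longer sit where the two-axis chart places them, which is the point of the objection.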