EmmaSchwarz / computational-dostoevsky


Visualization for "Intersecting Words" #28

Open blueciren opened 4 years ago

blueciren commented 4 years ago

David, I'm sorry to ask, but could you help me create a visualization for the intersecting words? You suggested several variables, and I tried to make a bar graph. Then I realized that neither shows "who spoke the word first," which could be important for further analysis. Line graphs would work best (x axis for chapter, y axis for frequency, one line per speaker), but then I would need to create several graphs. That will take a lot of time (I am not sure I can finish it this week), and for some of the words it would not be effective. For example, the word "enemy" is repeated by the five characters 41 times, but the word "decorum" is repeated by the narrator and Goliadkin 26 and 15 times, respectively, and the word "dear" is repeated only once, by the double. If I put all of those into a single bar graph, the range of frequencies is large (from 1 to 26).

djbpitt commented 4 years ago

@blueciren I’ll be happy to help, but if I’ve understood correctly, you are asking first (and correctly so, since it doesn’t make sense to start coding until you know, at least preliminarily, what you want to produce) about selecting a visualization that is appropriate for the data and “the story you want to tell”. Let’s Zoom about this; send me a Kakao message and we can arrange a time today.

Creating several graphs may not require much more development time than creating one. The reason is that once you have the code that creates one graph, you can run a loop in which you change just the word you are tracking (you can make a list of those and loop over them), so the only additional code involves creating the loop. The main routine, which creates the graph, will be reused, each time with a different word, so you don’t need to write entirely different code for each graph.
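A minimal sketch of that structure, written in Python only for illustration (in XSLT the equivalent loop could be an `<xsl:for-each>` around an `<xsl:result-document>`). The function names `line_graph_svg` and `graphs_for_words`, the data shape (word -> speaker -> list of per-chapter counts), and the sample numbers are hypothetical placeholders, not project code or counts from the novel. The graph routine is written once, and the loop just calls it with a different word each time.

```python
from xml.sax.saxutils import escape, quoteattr

# A few stroke colors so that each speaker's line is distinguishable.
COLORS = ["black", "red", "blue", "green", "orange"]


def line_graph_svg(word, counts, width=300, height=150):
    """Return one SVG line graph for a single word.

    counts maps speaker -> list of per-chapter frequencies; every list must
    have the same length (one entry per chapter). x = chapter, y = frequency,
    one polyline per speaker.
    """
    n_chapters = len(next(iter(counts.values())))
    max_y = max(max(series) for series in counts.values()) or 1  # avoid dividing by 0
    x_step = width / max(n_chapters - 1, 1)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f"<title>{escape(word)}</title>",
    ]
    for j, (speaker, series) in enumerate(counts.items()):
        # SVG y grows downward, so subtract from height to put 0 at the bottom.
        points = " ".join(
            f"{i * x_step:.1f},{height - v / max_y * height:.1f}"
            for i, v in enumerate(series)
        )
        parts.append(
            f'<polyline data-speaker={quoteattr(speaker)} fill="none" '
            f'stroke="{COLORS[j % len(COLORS)]}" points="{points}"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


def graphs_for_words(data):
    """The loop: call the same graph routine once per word.

    data maps word -> counts (shaped as above). Returns filename -> SVG text.
    """
    return {f"{word}.svg": line_graph_svg(word, counts) for word, counts in data.items()}


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not counts from the novel.
    sample = {
        "enemy": {"narrator": [3, 5, 2], "Goliadkin": [1, 4, 6]},
        "decorum": {"narrator": [2, 0, 1], "Goliadkin": [0, 1, 3]},
    }
    for filename, svg in graphs_for_words(sample).items():
        print(filename, len(svg))
```

Writing each SVG string to its own file is one more short loop; the point is that adding a word to the list adds a graph without adding graph code.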

If you want to create multiple graphs, though, you have a few options about how to structure the output. You could create each of them as a stand-alone SVG file, since XSLT can create multiple output files with the <xsl:result-document> element. Or you could create one SVG output that includes multiple graphs, in which case you have to think about how you would like them arranged. (You can adjust the arrangement later with CSS, but especially if you have a lot of them, arranging them the way you want them to appear when you create them will save you time debugging the CSS positioning.) If you care about the exact values, you need the graphs to be large enough for the labels on the axes to show. If you just want to contrast the trends, though, you could use small multiples, as in the small-multiple line graphs I showed you from my XML Prague paper. Small multiples are underused because we’re afraid not to label our axes, and in general labeling axes is important, but when you care only about the trends, and not the values, and when comparison is important, small multiples can be a good choice. See https://en.wikipedia.org/wiki/Small_multiple and https://medium.com/nightingale/getting-started-with-small-multiples-an-underused-but-powerful-form-of-data-viz-3e0a8f8139dc.
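For the single-SVG option, here is a minimal sketch of a small-multiples layout, in Python for illustration rather than XSLT. It assumes a hypothetical data shape (word -> speaker -> list of per-chapter counts); the function names, panel sizes, gap, and column count are illustrative choices. Each word gets its own `<g>` translated into a grid cell and is labeled only with the word, with no axis labels, which is the small-multiples trade-off described above.

```python
from xml.sax.saxutils import escape, quoteattr

COLORS = ["black", "red", "blue", "green", "orange"]  # one color per speaker


def panel(counts, w, h):
    """Polylines for one word, scaled to fit a w x h panel (no axes or ticks)."""
    n = len(next(iter(counts.values())))
    max_y = max(max(s) for s in counts.values()) or 1  # per-panel scale
    step = w / max(n - 1, 1)
    lines = []
    for j, (speaker, series) in enumerate(counts.items()):
        pts = " ".join(f"{i * step:.1f},{h - v / max_y * h:.1f}" for i, v in enumerate(series))
        lines.append(
            f'<polyline data-speaker={quoteattr(speaker)} fill="none" '
            f'stroke="{COLORS[j % len(COLORS)]}" points="{pts}"/>'
        )
    return lines


def small_multiples_svg(data, cols=3, cell_w=120, cell_h=60, gap=20):
    """One SVG containing a grid of panels, one per word, filled row by row."""
    rows = -(-len(data) // cols)  # ceiling division
    width = cols * (cell_w + gap)
    height = rows * (cell_h + gap)
    out = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for k, (word, counts) in enumerate(data.items()):
        x = (k % cols) * (cell_w + gap)
        y = (k // cols) * (cell_h + gap) + gap  # leave room above each panel for its label
        out.append(f'<g transform="translate({x},{y})">')
        out.append(f'<text x="0" y="-5" font-size="10">{escape(word)}</text>')
        out.extend(panel(counts, cell_w, cell_h))
        out.append("</g>")
    out.append("</svg>")
    return "\n".join(out)
```

Each panel here is scaled to its own maximum, which shows the shape of each trend even when one word occurs 41 times and another only once; if you want the panels to be comparable in magnitude, use one maximum across all words instead. Because the positions are computed when the SVG is created, no CSS positioning is needed afterwards.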

I don’t know whether the following is helpful, but I wonder whether you might care not only about which words are said by one character and then repeated by another, but also about aggregated trends, that is, which characters are more likely to echo keywords by which other characters. If so, that sounds like a summary report, and you would have to decide how to quantify the information. For example, in addition to who says a keyword first, there also may be questions about how many times each character says the word, how widely separated the uses are, how concentrated they are, whether the uses of the same word by different characters are interwoven or separated, etc. I don’t yet know how best to visualize this because there are so many decisions to make first about which data is relevant, and perhaps this isn’t a question you want to ask anyway, but if this sounds like something you might want to do, we can talk about it when we Zoom.
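One hypothetical way to quantify a few of these questions, sketched in Python: for one keyword, given its occurrences as (position, speaker) pairs, report who says it first, how many times each character says it, the first and last position of each character's uses (a rough measure of spread), and how often consecutive uses switch speakers (a rough measure of interweaving). The aggregate step then counts, across keywords, how often a word first said by one character is later used by another. The measures, the position scale, and the function names `keyword_profile` and `echo_matrix` are assumptions for illustration; deciding which measures are relevant is still the open question.

```python
from collections import Counter


def keyword_profile(occurrences):
    """Summarize one keyword.

    occurrences: list of (position, speaker) pairs, where position is any
    increasing measure of place in the text (e.g., a running utterance index).
    """
    ordered = sorted(occurrences)  # text order
    speakers = [s for _, s in ordered]
    # spread: first and last position at which each speaker uses the word
    spread = {}
    for pos, s in ordered:
        first, _ = spread.get(s, (pos, pos))
        spread[s] = (first, pos)
    # switches: consecutive uses by different speakers (higher = more interwoven)
    switches = sum(1 for a, b in zip(speakers, speakers[1:]) if a != b)
    return {
        "first_speaker": speakers[0] if speakers else None,
        "counts": dict(Counter(speakers)),
        "spread": spread,
        "switches": switches,
    }


def echo_matrix(words):
    """Across keywords, count (first speaker, later speaker) pairs.

    words maps keyword -> occurrences. A pair (X, Y) is credited once per
    keyword that X says first and Y also uses somewhere in the text.
    """
    pairs = Counter()
    for occurrences in words.values():
        prof = keyword_profile(occurrences)
        first = prof["first_speaker"]
        for s in prof["counts"]:
            if s != first:
                pairs[(first, s)] += 1
    return pairs
```

A report or visualization could then be built from these numbers, for example a speaker-by-speaker grid from `echo_matrix`, once you know which of them matter for your argument.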