vertical visualisation: is this something?

mdingemanse commented 1 year ago

Craig, P., & Roa-Seïler, N. (2012). A Vertical Timeline Visualization for the Exploratory Analysis of Dialogue Data. 2012 16th International Conference on Information Visualisation, 68–73. doi: 10.1109/IV.2012.22

mdingemanse commented 1 year ago

Okay, so this is not hard at all given talkr-formatted data. Quick and dirty attempt in this commit (note that I'm branching off the unmerged branch of #12 to do this with the IFADV data):

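source("R/convplot.R")
# library() attaches one package per call, so load each separately
library(tidyverse)
library(ggthemes)
library(ggrepel)
load("data/ifadv.rda")

# datamode=TRUE so that we get the extract back as data to plot ourselves
extract <- convplot(data=ifadv, uid="dutch-04-091-125268", before=0, after=30000, datamode=TRUE)

extract |>
  ggplot(aes(x=participant_int, label=utterance, colour=participant, fill=participant)) +
  ggthemes::theme_tufte() +
  theme(legend.position = "none",
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) +
  coord_cartesian(xlim=c(0,10)) +   # leave room to the right for the text
  labs(y="time (ms)", x="") +
  scale_y_reverse() +               # time runs top to bottom
  # one rectangle per utterance, one column per participant
  geom_rect(aes(ymin=begin0, ymax=end0, xmin=participant_int-0.4, xmax=participant_int+0.4)) +
  # utterance text to the right, repelled vertically to limit overlap
  ggrepel::geom_text_repel(aes(y=begin0, x=participant_int),
            hjust=0, direction="y", nudge_x=ifelse(extract$participant_int==1, 2, 1))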

Basically, we map time to y and participant to x, then add the utterances via the label aesthetic and use ggrepel::geom_text_repel to keep them from overlapping too much. Lots of room for improvement (font, size, spacing, colours, etc.) but the idea is clear.

Missing relative to the example above: overall view of where in the conversation the plotted part is.
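
A rough sketch of what such an overview could look like, assuming talkr-style begin/end times in ms. The toy data, the `window` bounds and the object names are invented for illustration and are not part of talkr:

```r
library(ggplot2)

# Toy conversation (times in ms); values are invented for illustration.
set.seed(1)
conv <- data.frame(begin = seq(0, 290000, by = 5000))
conv$end <- conv$begin + runif(nrow(conv), 1000, 4500)
conv$participant_int <- rep(c(1, 2), length.out = nrow(conv))

# Hypothetical window matching the plotted extract (30 s starting at 120 s).
window <- data.frame(ymin = 120000, ymax = 150000)

overview <- ggplot(conv) +
  # Shade the part of the conversation shown in the detailed plot.
  geom_rect(data = window,
            aes(ymin = ymin, ymax = ymax, xmin = 0.5, xmax = 2.5),
            fill = "grey80") +
  # Whole conversation as thin bars, one column per participant.
  geom_rect(aes(ymin = begin, ymax = end,
                xmin = participant_int - 0.4, xmax = participant_int + 0.4,
                fill = factor(participant_int))) +
  scale_y_reverse() +
  theme_void() +
  theme(legend.position = "none")
```

A narrow strip like this could sit next to the detailed plot, for instance via a plot-composition package such as patchwork.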

mdingemanse commented 1 year ago

Other notes and ideas:

don't use lines but index numbers to establish the relation between text and annotation (= closer to line numbers in CA transcripts)
can easily map utterance_raw to label for corpora that have more CA-like transcripts
use outlines on geom_rect() to more clearly show multi-unit turns
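
A minimal sketch of the index-number and outline ideas, assuming the same column names as the extract above (begin0, end0, participant_int, utterance). The toy data and the `idx` column are made up for illustration:

```r
library(ggplot2)

# Toy extract in a talkr-like shape (times in ms); values are invented.
extract <- data.frame(
  begin0          = c(0, 1200, 2900, 3400),
  end0            = c(1000, 2500, 3300, 5000),
  participant_int = c(1, 2, 1, 2),
  utterance       = c("hi there", "hello", "so", "yes well")
)
# Running index, analogous to line numbers in a CA transcript.
extract$idx <- seq_len(nrow(extract))

p <- ggplot(extract) +
  # Outlined rectangles: colour sets the border, so adjacent units stay distinct.
  geom_rect(aes(ymin = begin0, ymax = end0,
                xmin = participant_int - 0.4, xmax = participant_int + 0.4,
                fill = factor(participant_int)),
            colour = "black") +
  # Index number inside each rectangle, at its vertical midpoint.
  geom_text(aes(x = participant_int, y = (begin0 + end0) / 2, label = idx),
            size = 3) +
  # Transcript column: the same index followed by the utterance text.
  geom_text(aes(x = 3, y = (begin0 + end0) / 2,
                label = paste(idx, utterance)),
            hjust = 0, size = 3) +
  coord_cartesian(xlim = c(0, 10)) +
  scale_y_reverse() +
  labs(y = "time (ms)", x = "") +
  theme(legend.position = "none")
```

Plain geom_text will overlap for densely packed utterances. One option might be to keep ggrepel::geom_text_repel for the transcript column and suppress the connecting lines with min.segment.length = Inf.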

Questions:

How to make useful timing indicators? This is the worst part of the Dialogue Explorer example: precise timecodes are rarely useful. CA has a better convention of marking only pauses >200ms, in a format like (0.2). We would need to add units of 'silence' to do this, though (hacky: a participant 'silence' whose only annotations are silences >200ms, with the CA-formatted pause length in utterance?)
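
A minimal base R sketch of the hacky option, assuming talkr-style columns begin and end (in ms), participant and utterance. The toy data and the helper name `add_silences` are hypothetical and not part of talkr:

```r
# Toy data in a talkr-like shape (times in ms); values are invented.
turns <- data.frame(
  begin       = c(0, 1200, 2900, 3400),
  end         = c(1000, 2500, 3300, 5000),
  participant = c("A", "B", "A", "B"),
  utterance   = c("hi there", "hello", "so", "yes well"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: add a 'silence' participant for gaps above a threshold.
add_silences <- function(d, min_gap = 200) {
  d <- d[order(d$begin), ]
  n <- nrow(d)
  if (n < 2) return(d)
  # Latest end time seen so far, so overlapping talk does not create fake gaps.
  prev_end   <- cummax(d$end)[-n]
  next_begin <- d$begin[-1]
  gap        <- next_begin - prev_end
  keep       <- gap > min_gap
  silences <- data.frame(
    begin       = prev_end[keep],
    end         = next_begin[keep],
    participant = rep("silence", sum(keep)),
    # CA-style pause notation in seconds, e.g. (0.4)
    utterance   = sprintf("(%.1f)", gap[keep] / 1000),
    stringsAsFactors = FALSE
  )
  out <- rbind(d, silences)
  out <- out[order(out$begin), ]
  rownames(out) <- NULL
  out
}

with_silences <- add_silences(turns)
```

With the toy data, only the 400 ms gap between the second and third turn exceeds the threshold, so a single 'silence' row with utterance "(0.4)" is added. The silence rows could then be drawn as labels without rectangles.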

elpaco-escience / talkr

vertical visualisation: is this something? #11