explorable-viz / fluid

Data-linked visualisations
http://f.luid.org
MIT License

`Text` visualisation type #1035

Closed RaoOfPhysics closed 2 months ago

RaoOfPhysics commented 3 months ago

See also:


@rolyp and I have had a few thoughts on this, and I’m attempting here to note some of those in a way that hopefully makes sense.

This was partly inspired by my doctoral thesis (a-ch.in/ty-a#research), where, rather than write numbers by hand, I “injected” the results of calculations straight into the text. For example, the source R Markdown file (.Rmd) in question had the following text+code:

The gender breakdown of the respondents[^gender] is listed in Table \@ref(tab:gender-breakdown), together with that for the total number of scientists in the CMS collaboration.[^missing-gender]
Therefore, `r round(cms_male_response_count/cms_total_male*100,2)`% of all male CMS scientists and `r round(cms_female_response_count/cms_total_female*100,2)`% of all female CMS scientists responded to the survey.
The differences in the response rates are significant, as determined by the Chi-squared test without the Yates’s correction (_p_-value: `r round(prop$p.value,4)`; 95% CI: `r round(prop$conf.int,4)`).

[^gender]: At the time of collecting these data, CMS only allowed its members the choice of one of two genders, based on their passports or national ID cards.
I am not aware if this policy has changed since then.

[^missing-gender]: One of the scientists had an empty gender field in the database.

The output looks like this:

[Screenshot, 2024-07-19: the rendered paragraph in the PDF of “Particle physics and public engagement a match made in minuscule matter”, CERN-THESIS-2022-306]
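To make the mechanism concrete, here is a minimal sketch in Python of the inline-injection idea: find each `r <expression>` chunk in the prose and replace it with the evaluated result. This is not how knitr does it internally; the names `render_inline` and `INLINE_CHUNK`, and the numbers in `env`, are all invented for illustration.

```python
import re

# Matches inline chunks of the form `r <expression>`, as in R Markdown prose.
INLINE_CHUNK = re.compile(r"`r ([^`]+)`")


def render_inline(source, env):
    """Replace each inline chunk with the value of its expression.

    Expressions are evaluated with eval() against `env` plus round() only.
    That is fine for a sketch but unsafe for untrusted input.
    """
    def substitute(match):
        value = eval(match.group(1), {"__builtins__": {}, "round": round}, env)
        return str(value)

    return INLINE_CHUNK.sub(substitute, source)


# Hypothetical counts, invented purely for illustration.
env = {"responded": 25, "total": 200}
text = "Therefore, `r round(responded / total * 100, 2)`% of members responded."
print(render_inline(text, env))  # Therefore, 12.5% of members responded.
```

The surrounding sentence stays hand-written; only the number is computed, which is exactly the limitation the proposal further down tries to go beyond.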

In another instance, I had three statistical tests run thrice to confirm whether the dataset in question was suitable for the intended analysis. Similar text would need to explain this each time. Rather than write the sentences and numbers by hand, I wrote a set of functions in R, one for each test, which would output the appropriate text depending on whether the test passed or failed, and would include the calculated numbers corresponding to the test:

# ====== Return statements for EFA ====== #

library(magrittr) # provides the %>% pipe used below

## KMO

report_kmo <- function(category) {
    structure <- parameters::check_factorstructure(category)
    KMO <- structure$KMO$MSA %>% round(2)
    # Choose the sentence by outcome; glue injects the computed value
    if (KMO < 0.5) {
        glue::glue("The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that factor analysis is likely to be inappropriate (KMO = {KMO}).")
    } else {
        glue::glue("The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that the data seem appropriate for factor analysis (KMO = {KMO}).")
    }
}

## Bartlett’s test for sphericity

report_sphericity <- function(category) {
    structure <- parameters::check_factorstructure(category)
    chisq <- structure$sphericity$chisq %>% round(2)
    dof <- structure$sphericity$dof
    p_val <- structure$sphericity$p
    p_formatted <- insight::format_p(p_val)
    if (p_val < 0.001) {
        glue::glue("Bartlett’s test of sphericity suggests that there is sufficient significant correlation in the data for factor analysis ($\\chi$^2^({dof}) = {chisq}, {p_formatted}).")
    } else {
        glue::glue("Bartlett’s test of sphericity suggests that there is not enough significant correlation in the data for factor analysis ($\\chi$^2^({dof}) = {chisq}, {p_formatted}).")
    }
}

## Cronbach’s alpha

report_alpha <- function(category, iterations = 50) {
    set.seed(19480717) # fixed seed so the bootstrapped interval is reproducible
    alpha_all <- psych::alpha(category, n.iter = iterations, check.keys = TRUE)
    alpha_df <- alpha_all$boot.ci %>% tibble::as_tibble()
    alpha_lower <- alpha_df[1,1] %>% round(2)
    alpha_median <- alpha_df[2,1] %>% round(2)
    alpha_upper <- alpha_df[3,1] %>% round(2)
    glue::glue("Cronbach’s $\\alpha$, based on {iterations} iterations, is {alpha_median} (lower: {alpha_lower}; upper: {alpha_upper}).")
}

Then, wherever I had to include the results, I added the following to the R Markdown file (.Rmd) of the analysis chapter:

```{r benefits-checks}
#| cache: TRUE
kmo_benefits <- report_kmo(benefits)
sphericity_benefits <- report_sphericity(benefits)
alpha_benefits <- report_alpha(benefits)
```

  1. `r kmo_benefits`
  2. `r sphericity_benefits`
  3. `r alpha_benefits`

The output of that particular bit of code appears in the PDF as follows:

[Screenshot, 2024-07-19: the rendered list in the PDF of “Particle physics and public engagement a match made in minuscule matter”, CERN-THESIS-2022-306]

(Other instances are on p. 82 and p. 86.)


What we are proposing is the following:

  1. A “text” type of “visualisation” that can be inserted into prose, as shown by the first example above. So, you could have backticks with the keyword fluid followed by the “visualisation” type and parameters, which would be converted into the calculated figure in the displayed HTML file: `fluid <some code>`. This will need a bit of Fluid code similar to the R package nombre, which converts from numbers to text and vice versa: https://nombre.rossellhayes.com/.
  2. The “text” visualisation needs to be linked to other visualisations where relevant: clicking on a data point in a visualisation should highlight all of the relevant sentences/paragraphs (possibly using <mark>). When specific points in the visualisation are selected, the text should adapt itself based on the underlying data. This can be used, for example, for dynamic captions, but also for longer paragraphs.
    • A related idea is to have linked visualisations “appear” in the sidebar, so users don’t have to scroll back and forth to and from the relevant image.
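
As a rough illustration of the linking in point 2, here is a small Python sketch in which each piece of text records which data points it was computed from, and a selection in a chart decides which pieces get wrapped in <mark>. Everything here is hypothetical: `TextFragment`, `render`, the ids and the sentences are made up, and the dependencies are declared by hand, whereas in Fluid they would presumably come from the language's own dependency tracking.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class TextFragment:
    """A piece of prose plus the ids of the data points it was computed from."""
    text: str
    depends_on: frozenset


def render(fragments, selected):
    """Render fragments as HTML, wrapping those linked to the selection in <mark>."""
    parts = []
    for fragment in fragments:
        body = escape(fragment.text)
        # A fragment is highlighted if it shares any data point with the selection.
        if fragment.depends_on & selected:
            parts.append(f"<mark>{body}</mark>")
        else:
            parts.append(body)
    return " ".join(parts)


# Hypothetical data point ids and sentences, for illustration only.
fragments = [
    TextFragment("Response rates differ by group.", frozenset({"bar-1", "bar-2"})),
    TextFragment("The first group is the largest.", frozenset({"bar-1"})),
    TextFragment("Data were collected in one survey.", frozenset()),
]

print(render(fragments, selected={"bar-2"}))
```

Selecting `bar-2` marks only the first sentence; an empty selection leaves the prose unmarked.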

RaoOfPhysics commented 3 months ago

Related to: https://github.com/explorable-viz/fluid-examples/discussions/10

RaoOfPhysics commented 3 months ago

Use AI instead of hand-crafting prose: https://github.com/explorable-viz/research-strategy/issues/165

rolyp commented 3 months ago

@JosephBond This is the new visualisation type we’ll need to add – I think it can be based on similar principles to our existing viz types, but simpler. Initially it won’t be easy to use, as you’ll need to have a specific uniquely named div for each Text element, but hopefully there will be some relatively easy things we can do to improve this.

JosephBond commented 3 months ago

Going to sketch some of my design ideas here over today and tomorrow.

JosephBond commented 3 months ago

Whilst in theory this is "simpler" than other visualizations, it's actually not simpler at all. We need either some sort of polymorphism in the DisplayText type, or a point at which we collect all the values that are present and convert them into strings. It's entirely unclear where to do this, because the way we pack/unpack types is really opaque.
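
To show the kind of thing I mean by converting values into strings, here is a tiny Python sketch using type-based dispatch. It says nothing about how our PureScript types are packed and unpacked; `display_text` and the two-decimal float formatting are assumptions made up for the example.

```python
from functools import singledispatch


@singledispatch
def display_text(value) -> str:
    """Fallback: refuse values we have no textual rendering for."""
    raise TypeError(f"no text rendering for {type(value).__name__}")


@display_text.register
def _(value: str) -> str:
    # Strings are already text.
    return value


@display_text.register
def _(value: int) -> str:
    return str(value)


@display_text.register
def _(value: float) -> str:
    # Two decimal places is an arbitrary choice for this sketch.
    return f"{value:.2f}"


print(" ".join(display_text(v) for v in ["KMO =", 0.81]))  # KMO = 0.81
```

The open question is where a conversion like this should live in our pipeline, not whether it can be written.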

RaoOfPhysics commented 3 months ago

From my private message to @JosephBond:


OK, so here’s my understanding. The inline chunks on their own are a bit more limited than what we’re trying to do. Most of the text is pre-written with the exception of what is being injected between the inline backticks.

I think these are the right bits of relevant code:

Not exactly sure how it works, but here is some explanation in the R Markdown book: https://bookdown.org/yihui/rmarkdown/r-code.html

Also worth poking at the Knitr book: https://yihui.org/knitr/

rolyp commented 3 months ago

@JosephBond I’ve had a very quick scan of the above. We can chat more later today, but perhaps we should conceptually separate “atomic visual elements” (visualisations and fragments of text) from the top-level document structure (which might be a sequence of such things, in a literate programming sort of style). For now, I think we can concentrate on the former. We already have visualisations which are computed from data, so we “just” need to add text computed from data. Then we can insert those text elements into existing HTML documents, just as we currently do with visualisations.

I think your List (Text + Val) type above is closer to the latter, and would allow the entire content of a web page to be a rendered Fluid value containing both text and graphics (the literate programming “document”). That’s an important perspective to keep in mind, but hopefully we can avoid having to design it right now.

rolyp commented 3 months ago

As an example adapted from @RaoOfPhysics’s report_sphericity code, suppose the index.html contained:

<p>Bartlett’s test of sphericity suggests that there is <div id="sig-123">?</div> correlation in ... 

We could then (naively) compute a Text element in Fluid using the following library function:

let sufficient n threshold =
    Text(if n >= threshold then "sufficient" else "insufficient")

or even:

let sufficient n threshold =
    let s = "sufficient" in Text(if n >= threshold then s else "in" ++ s)

which when plugged into the div with id sig-123, would be rendered as (selectable) text to create the final HTML.
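
To illustrate the plugging-in step outside Fluid, here is a rough Python sketch. `sufficient` mirrors the second Fluid definition above but returns a plain string; `plug_text` is a made-up helper, and the regex approach and the numbers passed in are assumptions chosen only to keep the example short.

```python
import re
from html import escape


def sufficient(n, threshold):
    """Python analogue of the Fluid function above, returning plain text."""
    s = "sufficient"
    return s if n >= threshold else "in" + s


def plug_text(html, element_id, text):
    """Replace the contents of the div with the given id.

    Assumes the div contains no nested tags, which holds for the "?" placeholder.
    """
    pattern = re.compile(
        r'(<div id="' + re.escape(element_id) + r'">)[^<]*(</div>)'
    )
    # A function replacement avoids backslash handling in the inserted text.
    return pattern.sub(lambda m: m.group(1) + escape(text) + m.group(2), html)


page = (
    "<p>Bartlett’s test of sphericity suggests that there is "
    '<div id="sig-123">?</div> correlation in ...'
)
# Hypothetical values for n and threshold.
print(plug_text(page, "sig-123", sufficient(0.2, 0.5)))
```

With these inputs the placeholder becomes “insufficient”; a real implementation would go through the DOM rather than string matching.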

rolyp commented 3 months ago

@JosephBond Started capturing a few subtasks in the issue body.

rolyp commented 3 months ago

@JosephBond Have extracted Examples to a new task, to give us a place to capture some initial ideas.