More improvements in the Describe > Tables dialogue

IDEMSInternational / R-Instat

A statistics software package powered by R

http://r-instat.org/

GNU General Public License v3.0

More improvements in the Describe > Tables dialogue #8236

Open rdstern opened 1 year ago

rdstern commented 1 year ago

@anastasia-mbithe and @lilyclements This is coming on well now that the reordering of the variables and the themes have been added.

I hope we can keep going on the improvements. Here are a few simple suggestions, plus some that might take a bit longer.

Here is a 4-way table with Ana's new Excel theme:

I really like that we have the Excel theme even though it is an example of a poor table.

Here is the dialogue for the table above, with Example 1 from the agriTutorial package - that you will need to install if you would like to use the same example.

Some simple suggestions first: a) Please check the order of the factors in the first receiver. I assumed that they would go in the order Replicate, then Management for the columns, followed by Nitrogen then Variety for the rows - that's to give the layout above, which corresponds to the textbook layout. I don't have a serious problem with your current order as long as there is a simple logic to explain it.

b) There is just a single summary. That's given as mean-yield in the table above. When there is only a single summary could we have an option to not give it at all? Or perhaps no option, but just don't display it? Or it becomes a default title or footnote?

c) I think the checkbox to Display Outer Margins now displays all margins - which is great. So (if I am right), delete the word Outer from the label.

d) Similarly simplify Display Summary-Variables as Rows to Display Summaries as Rows.

Now how could I dictate that all data in the table above are shown to one decimal? I think that may be easy through the gtsummary package.

e) If so, then add gtsummary. Then I think it is style_number in that package.

f) Investigate adding the gtsummary theme - which seems to have some sub-themes and also permits the tables in different languages!

g) What else becomes easy once gtsummary is used?

lilyclements commented 1 year ago

@rdstern to a - The order of the factors is coming in from the mmtable2 code.

We have two bits of code here - our summary_table code is first, and then the mmtable2 code is second.

# Code generated by the dialog, Frequency/Summary Tables
summary_table <- data_book$summary_table(data_name="rice",
                                         columns_to_summarise="yield",
                                         factors=c("management","Replicate","variety","nitrogen"),
                                         treat_columns_as_factor=FALSE, summaries=c("summary_mean"))
head(summary_table)

# A tibble: 6 x 6
  management Replicate variety nitrogen `summary-variable` value
  <fct>      <fct>     <fct>   <fct>    <chr>              <chr>
1 Minimum    R1        V1      0        mean__yield        3.32 
2 Minimum    R1        V1      50       mean__yield        3.19 
3 Minimum    R1        V1      80       mean__yield        5.47 
4 Minimum    R1        V1      110      mean__yield        4.25 
5 Minimum    R1        V1      140      mean__yield        3.13 
6 Minimum    R1        V2      0        mean__yield        6.1

The order in the summary_table data frame output follows the order that they are inputted into the code.

Then, for mmtable2, we again go in the order that they were inputted into the code. Here we run header_top_left and header_left_top. These (confusingly named!) functions decide whether the variable is placed as a column or a row.

As we change the Column Factors nud, the mmtable2::header_top_left is given for the factors as we work from the first factor down to the last factor.

E.g., if we only have one column factor, then we have the first factor (management) as header_top_left. The rest are header_left_top.

last_table <- (mmtable2::mmtable(data=summary_table, cells=value) +
                 mmtable2::header_top_left(variable='summary-variable') +
                 mmtable2::header_top_left(variable=management) +
                 mmtable2::header_left_top(variable=Replicate) +
                 mmtable2::header_left_top(variable=variety) +
                 mmtable2::header_left_top(variable=nitrogen))

If we only have three column factors, then we have the first three factors (management, Replicate, variety) as header_top_left. The rest are header_left_top.

last_table <- (mmtable2::mmtable(data=summary_table, cells=value) +
                 mmtable2::header_top_left(variable='summary-variable') +
                 mmtable2::header_top_left(variable=management) +
                 mmtable2::header_top_left(variable=Replicate) +
                 mmtable2::header_top_left(variable=variety) +
                 mmtable2::header_left_top(variable=nitrogen))

Does this make sense? This is what is currently happening, but I'm open to any suggestions on order for where bits are placed.
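
The rule described above (the first n factors become column headers, the rest become row headers) can be sketched in a few lines of base R. This is only an illustration: build_header_calls() and n_column_factors are hypothetical names, not part of R-Instat or mmtable2, and the function returns the calls as text rather than running them.

```r
# Hypothetical helper: first n_column_factors factors go on top (columns),
# the remaining factors go on the left (rows), in the order they were given.
build_header_calls <- function(factors, n_column_factors) {
  is_column <- seq_along(factors) <= n_column_factors
  header_fn <- ifelse(is_column, "header_top_left", "header_left_top")
  paste0("mmtable2::", header_fn, "(variable=", factors, ")")
}

factors <- c("management", "Replicate", "variety", "nitrogen")
calls_one <- build_header_calls(factors, n_column_factors = 1)
calls_three <- build_header_calls(factors, n_column_factors = 3)

# Print the header lines for the three-column-factor case
cat(calls_three, sep = " +\n")
```

Changing the order of the factors vector is then the only thing that changes where each factor lands in the table.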

lilyclements commented 1 year ago

@rdstern @anastasia-mbithe to b -

Good suggestion, and really simple to implement.

@anastasia-mbithe if we don't want to show something on a table, we just don't run that code. For the example @rdstern has given, this means we do not run mmtable2::header_top_left(variable='summary-variable') in the mmtable2 code.

So:

If treat_columns_as_factor = FALSE and we have only one summary and only one variable to summarise, we have only one factor level for summary-variable. This means that we do not want to run the line mmtable2::header_top_left(variable='summary-variable').
If treat_columns_as_factor = TRUE and we have only one summary, then we have only one factor level for summary, but still have multiple factor levels for columns_to_summarise. In this case, we do not run the line mmtable2::header_top_left(variable=summary).
If treat_columns_as_factor = TRUE and we have only one variable, then we have only one factor level for columns_to_summarise, but still have multiple factor levels for summary. In this case, we do not run the line mmtable2::header_top_left(variable=variable).
If treat_columns_as_factor = TRUE and we have only one summary and only one variable, then we have only one factor level for columns_to_summarise and only one factor level for summary. In this instance, we run neither mmtable2::header_top_left(variable=summary) nor mmtable2::header_top_left(variable=variable).
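
The four cases above can be written as one small decision function. This is a rough sketch only: headers_to_show() and its arguments n_summaries and n_variables are hypothetical names for illustration, and it just returns the header variables that would still be passed to mmtable2::header_top_left().

```r
# Hypothetical helper: which summary/variable headers are still worth showing?
headers_to_show <- function(treat_columns_as_factor, n_summaries, n_variables) {
  if (!treat_columns_as_factor) {
    # A single combined "summary-variable" column: only show it
    # when it has more than one level.
    if (n_summaries * n_variables > 1) "summary-variable" else character(0)
  } else {
    # Separate "summary" and "variable" columns: keep each one
    # only when it has more than one level.
    c("summary", "variable")[c(n_summaries > 1, n_variables > 1)]
  }
}

headers_to_show(FALSE, n_summaries = 1, n_variables = 1)  # nothing to show
headers_to_show(TRUE, n_summaries = 2, n_variables = 1)   # only the summary header
```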

# Example with Example 1 rice data that Roger used:

summary_table <- data_book$summary_table(data_name="rice",
                                         columns_to_summarise="yield",
                                         factors=c("management","Replicate","variety","nitrogen"),
                                         treat_columns_as_factor=FALSE, summaries=c("summary_mean"))

# Here, treat_columns_as_factor=FALSE and we have just one summary and column_to_summarise:
mmtable2::mmtable(data=summary_table, cells=value) +
                 mmtable2::header_top_left(variable=management) +
                 mmtable2::header_left_top(variable=Replicate) +
                 mmtable2::header_left_top(variable=variety) +
                 mmtable2::header_left_top(variable=nitrogen)

# That code above is what we want to run. We no longer want to run this:
mmtable2::mmtable(data=summary_table, cells=value) +
                 mmtable2::header_top_left(variable='summary-variable') +
                 mmtable2::header_top_left(variable=management) +
                 mmtable2::header_left_top(variable=Replicate) +
                 mmtable2::header_left_top(variable=variety) +
                 mmtable2::header_left_top(variable=nitrogen)

# If treat_columns_as_factor=TRUE and we have just one column_to_summarise, but we have multiple summaries, then we need to differentiate which summary is being run but not which column_to_summarise (since it is always the same column_to_summarise):
summary_table <- data_book$summary_table(data_name="rice",
                                         columns_to_summarise="yield",
                                         factors=c("management","Replicate","variety","nitrogen"),
                                         treat_columns_as_factor=TRUE, summaries=c("summary_mean", "summary_sum"))

last_table <- (mmtable2::mmtable(data=summary_table, cells=value) +
                 mmtable2::header_top_left(variable=summary) +    # We run = summary, but not = columns_to_summarise, because it will always be the same column_to_summarise
                 mmtable2::header_top_left(variable=management) +
                 mmtable2::header_left_top(variable=Replicate) +
                 mmtable2::header_left_top(variable=variety) +
                 mmtable2::header_left_top(variable=nitrogen))

Does this make sense?

lilyclements commented 1 year ago

e) signif_fig is a parameter in our summary_table function. However, I agree that we should have decimal places decided on the display end (i.e., mmtable2), not the calculation end (our function). That being said, mmtable2 sets all columns as characters. This has to be the case to have multiple column headers. As a result, it is not so simple to make these amendments. One option is that we save the summary_table object when we save the mmtable2 object. Then we can refer back to the mmtable2 object's corresponding summary_table object to make the changes. Logistically, I am not sure how this would impact timings, etc. That is maybe a conversation for me to have with @dannyparsons.

In the meantime, can we use the signif_fig parameter in our summary_table function?
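
As a rough illustration of formatting on the display end, the character value column could be reformatted to one decimal place before the table is built. This sketch uses base R's formatC() as a stand-in (it is not the gtsummary or mmtable2 route discussed above), and the small data frame is a toy copy of a few rows from the summary_table output shown earlier.

```r
# Toy stand-in for summary_table: note that value is stored as character.
summary_table <- data.frame(
  nitrogen = c("0", "50", "80", "140"),
  value = c("3.32", "3.19", "5.47", "3.13"),
  stringsAsFactors = FALSE
)

# Keep the original values and format a copy for display only.
display_table <- summary_table
display_table$value <- formatC(as.numeric(summary_table$value),
                               format = "f", digits = 1)
print(display_table)
```

Keeping summary_table untouched means the unrounded values are still available if the number of decimals is changed later.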

rdstern commented 1 year ago

@lilyclements and @anastasia-mbithe I am still keen on continuing the improvements in the format table sub-dialogue. Here is an example:

And could we work towards being able to do all the tables shown here, by Thomas Mock? That's a really nice article. I'm really keen to be able to promote great tables as well as great graphs for the presentation of climatic summaries, by the time we give our e-INAM course in June.

And here is an excellent video, which shows what we could do in RStudio. How easily could we do this in a script window in R-Instat, and could we do all that he does in the R-Instat sub-dialogue? I even wonder if we could use this video and example on Thanh's courses to illustrate how this all works in RStudio, and then add the same in R-Instat? Should we include the palmerpenguins package in R-Instat? I am always looking for interesting datasets!

lilyclements commented 1 year ago

@rdstern adding groups and spanners in the first article shared looks really great. I assume this isn't a priority for this week, but is definitely something to work towards. If we are happy for this to be looked at in a later week, I can write an issue on it?

Column Amendments

The following look like they are suitable for the columns tab:

cols_label - rename columns
cols_align - align the columns a certain way
cols_width - change the width of a column

We want to be able to rename several columns at once, but the number of columns differs from table to table. This means we can't have a "fixed" number of inputs as easily. I see two options for this, but really would be open to more suggestions:

1. Use a grid like in the "Rename Columns" dialog under the "Multiple" tab. I think I will have to write some R code to be able to align the new name output with the grid, and I think we would have to make changes to the grid.
2. Have two inputs: one where the user puts in the current column name that they want to change, and another where they put in the new label. This is much simpler to implement, so might be a good current solution. We can look at "upgrading" this later.

Option 2 would then run something like this (from Stack Overflow)

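library(magrittr)   # provides the %>% pipe used below

label <- c("cylinder", "horsepower")   # value from the "new column name"
columns <- c("cyl", "hp")   # value from the "(current) column name"
# Named list: names are the current column names, values are the new labels
cols_list <- as.list(label) %>% purrr::set_names(columns)
mtcars %>%
  gt::gt() %>%
  gt::cols_label(.list = cols_list)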

This can be the same for cols_align and cols_width.
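
A minimal sketch of the same idea for cols_align, assuming the dialog supplies the column names as a character vector. The names target_cols and alignment are hypothetical stand-ins for the dialog inputs; cols_width is not shown here.

```r
# Hypothetical dialog inputs
target_cols <- c("cyl", "hp")   # value from the "(current) column name" input
alignment <- "center"           # one of "left", "center", "right", "auto"

# all_of() lets us pass the column names as strings
aligned_table <- gt::cols_align(
  gt::gt(head(mtcars)),
  align = alignment,
  columns = tidyselect::all_of(target_cols)
)
aligned_table
```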

Colouring columns

We look at the data_color function described here. We can colour columns by their attributes (e.g. numerical), names, etc.

This colouring can go further and the rows can be coloured within a column. E.g., "colour everything > 20 in this column as red, otherwise as green". Given this, data_color is something for columns and rows - so where would it fit? Since this is about the data values, perhaps it fits in a third tab - a "data" tab. @rdstern what do you think? I might be overcomplicating it somewhat. Perhaps, for now, we have the options to colour a column by its name/attribute in the "Column" tab. In time, we can add additional options in a "data" tab.
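
As an illustration of the "> 20 is red, otherwise green" rule, here is a small sketch on mtcars. Note that it uses gt::tab_style() with gt::cells_body() rather than data_color, since the row condition can be written directly there; the choice of the mpg column is just an assumption for the example.

```r
tbl <- gt::gt(head(mtcars))

# Cells in mpg above 20 are filled red ...
tbl <- gt::tab_style(
  tbl,
  style = gt::cell_fill(color = "red"),
  locations = gt::cells_body(columns = mpg, rows = mpg > 20)
)

# ... and all other cells in mpg are filled green.
tbl <- gt::tab_style(
  tbl,
  style = gt::cell_fill(color = "green"),
  locations = gt::cells_body(columns = mpg, rows = mpg <= 20)
)
tbl
```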

Column Rearrangement

There's a set of functions related to rearranging/editing the column positions. We could have these under the "Column" tab in their own box? However, we might want to leave these for now to give time to conceptualise it a bit further.

cols_merge_range, cols_merge_n_pct, cols_merge_uncert - merging columns together to get a range, n (pct), or +/- uncertainty range
cols_hide - can hide a column
cols_move - can move a column
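
To give a feel for these, here is a short sketch on a made-up data frame (the station, tmin, tmax and notes columns are toy values for illustration only, not real data).

```r
# Toy data, invented purely for this example
ranges <- data.frame(
  station = c("A", "B"),
  tmin = c(12.1, 14.3),
  tmax = c(24.8, 27.5),
  notes = c("x", "y")
)

rearranged <- gt::gt(ranges)
# Merge tmin and tmax into a single "begin-end" range column
rearranged <- gt::cols_merge_range(rearranged, col_begin = tmin, col_end = tmax)
# Hide a column without removing it from the underlying data
rearranged <- gt::cols_hide(rearranged, columns = notes)
# Move the station column so that it comes after the merged range
rearranged <- gt::cols_move(rearranged, columns = station, after = tmin)
rearranged
```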

lilyclements commented 1 year ago

@rdstern perhaps actually instead of a "data" tab, like I said for some colour options, we have a "conditional formatting" tab.

https://themockup.blog/static/resources/gt-cookbook.html#conditional-formatting

lilyclements commented 1 year ago

I'm going through the chapters in gt-cookbook and seeing what ties in with our tables - what do we have, what can we add. It fits in with our different tabs somewhat:

[ ] Save output - We (will) have this in our Use Table to convert to LaTeX, HTML, etc. One of the sprint aims.
[ ] Grouping and Summary Groups - We do this already in our own system, so we don't need to worry about it here. There is groupname_col which fits in with the spanners part that would be great to include (see under "Create or Modify Parts").
[ ] Column Formatting - This is like our different axes options (scale_x_discrete, scale_x_date, etc) in ggplot2. We can look into introducing these for different column types at a later date. The difference here that is not in ggplot2 is that we have multiple columns of different types.
[ ] Create or Modify Parts - We do parts of this already, like creating headers. Adding spanners is something to do with the layout that I will explore (possibly after this sprint - it can be a "next sprint" task? Or sooner, once we have the Columns and Save Output bits sorted)
[x] Add Notes - Got this already in
[ ] Modify Columns - See two comments above where I talk about column tab (see #8450 as a start to this)
[x] Table Optimisation - We have this already - themes, font size, borders, etc.
[ ] Conditional Formatting - To add?

@rdstern if you are happy with this, I suggest these different "chapters" get their own issue - with groupname_col in Grouping and Summary Groups joining the Create or Modify Parts "issue". (Some already do have their own issue - like "Save Output").

rdstern commented 1 year ago

@lilyclements very happy with all this. Also with your simpler Option 2 above.

I get the impression that there might be a few bits of work that start - even finish? - in this sprint, but most will be a set of issues, some ready for work, and others provisional, and needing more thought.

I really like the idea of the conditional formatting as well as other colour options. They are there in Excel and it would be nice to match that - at least for some tables.

I would then hope that quite a bit can be included in the August release? However, the improvements to the similar plotting sub-dialog have taken from the start until now. So perhaps we will be looking at a longer time scale?

In parallel I still wonder whether putting summaries into a list column could also be considered, and those sparkline columns might then be a really simple additional feature. We could supply some lists to test that out, if it is part of your list? (I know I am a bit paranoid about this feature. It is just part of my general argument that graphs and tables are now nowhere near as distinct as they used to be.)
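
For anyone unfamiliar with list columns, a minimal base R sketch of the idea follows. The rice_toy values are invented for illustration, and the sparkline step itself is not shown; the point is only that each row of the summary can carry the full vector of values alongside its mean.

```r
# Toy data, invented for this example
rice_toy <- data.frame(
  variety = c("V1", "V1", "V1", "V2", "V2", "V2"),
  yield = c(3.32, 3.19, 5.47, 6.1, 5.9, 6.4)
)

# One vector of yields per variety
values_by_variety <- split(rice_toy$yield, rice_toy$variety)

summary_df <- data.frame(
  variety = names(values_by_variety),
  mean_yield = vapply(values_by_variety, mean, numeric(1))
)
# List column: each cell holds all the yields for that variety
summary_df$yield_values <- unname(values_by_variety)

str(summary_df)
```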

lilyclements commented 1 year ago

@rdstern thanks for this! I will spend some time today on looking into the sparkline columns to add them in.

rdstern commented 3 weeks ago

@lilyclements the new gt sub-dialog is now merged. That's in the new Describe > Tables > Presentation Table dialog - that's actually a Presentation of Data-frame dialog. (We are also adding the Table Options button to the Prepare > View Data dialog.)

Now the challenge is to re-instate the Table Options button in the Describe > Tables > Summaries dialog. I hope that is going to be easy? I note there is also a gtsummary package. I assume we may later want another dialog that includes that, but we initially want to use our own summary system.

The complication, from @Patowhiz, is that the gt code needs to draw on the data frame as well as the gt object. I hope that can just be our summary table data frame, but maybe not?

I hope that the gt object we save can also include the link to the summary data frame? In that case we could also make further use of the Table Options sub-dialog in the Use Table dialog? That would parallel the Use Graph dialog.

The Table Options button will also need to be re-instated into the Describe > One Variable > Summarise (Customised) and Describe > 2/3 Variables > Summarise dialogs. I assume we handle the general one first? (I say we, but am not sure what I can do here - it is the royal we!)