rdstern opened this issue 1 year ago
@rdstern, on point (a):
The order of the factors is coming from the mmtable2 code.
We have two bits of code here: our summary_table code comes first, and then the mmtable2 code.
# Code generated by the dialog, Frequency/Summary Tables
summary_table <- data_book$summary_table(data_name="rice",
columns_to_summarise="yield",
factors=c("management","Replicate","variety","nitrogen"),
treat_columns_as_factor=FALSE, summaries=c("summary_mean"))
head(summary_table)
# A tibble: 6 x 6
management Replicate variety nitrogen `summary-variable` value
<fct> <fct> <fct> <fct> <chr> <chr>
1 Minimum R1 V1 0 mean__yield 3.32
2 Minimum R1 V1 50 mean__yield 3.19
3 Minimum R1 V1 80 mean__yield 5.47
4 Minimum R1 V1 110 mean__yield 4.25
5 Minimum R1 V1 140 mean__yield 3.13
6 Minimum R1 V2 0 mean__yield 6.1
The order in the summary_table data frame output follows the order in which the factors are given in the code.
Then, for mmtable2, we again go in the order in which the factors were given in the code.
Here we run header_top_left and header_left_top. These (confusingly named!) functions decide whether a variable is placed as a column or a row.
As we change the number of Column Factors (the nud), mmtable2::header_top_left is given to the factors working from the first factor down to the last.
E.g., if we have only one column factor, then the first factor (management) gets header_top_left. The rest get header_left_top.
last_table <- (mmtable2::mmtable(data=summary_table, cells=value) +
mmtable2::header_top_left(variable='summary-variable') +
mmtable2::header_top_left(variable=management) +
mmtable2::header_left_top(variable=Replicate) +
mmtable2::header_left_top(variable=variety) +
mmtable2::header_left_top(variable=nitrogen))
If we have three column factors, then the first three factors (management, Replicate, variety) get header_top_left. The rest get header_left_top.
last_table <- (mmtable2::mmtable(data=summary_table, cells=value) +
mmtable2::header_top_left(variable='summary-variable') +
mmtable2::header_top_left(variable=management) +
mmtable2::header_top_left(variable=Replicate) +
mmtable2::header_top_left(variable=variety) +
mmtable2::header_left_top(variable=nitrogen))
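The placement rule above (the first n factors become column headers, the rest become row headers) can be sketched in base R. This is a minimal sketch under assumptions: mmtable_header_calls() is a hypothetical helper, not part of R-Instat or mmtable2, and it only builds the header lines as text, in the same way the dialog writes code into the script.

```r
# Hypothetical helper (not part of R-Instat or mmtable2): returns the mmtable2
# header lines as text, in the order described above.
mmtable_header_calls <- function(factors, n_column_factors,
                                 summary_header = "'summary-variable'") {
  stopifnot(n_column_factors >= 0, n_column_factors <= length(factors))
  # The first n_column_factors factors become column headers, the rest row headers
  placement <- ifelse(seq_along(factors) <= n_column_factors,
                      "header_top_left", "header_left_top")
  calls <- sprintf("mmtable2::%s(variable=%s)", placement, factors)
  # The summary-variable header, if any, always goes first as a column header
  if (!is.null(summary_header)) {
    calls <- c(sprintf("mmtable2::header_top_left(variable=%s)", summary_header),
               calls)
  }
  calls
}

factors <- c("management", "Replicate", "variety", "nitrogen")

# One column factor: management on top, the other three down the side
one_col <- mmtable_header_calls(factors, n_column_factors = 1)
# Three column factors: only nitrogen goes down the side
three_col <- mmtable_header_calls(factors, n_column_factors = 3)

# Join into one expression, matching the shape of the code shown above
cat(paste(c("mmtable2::mmtable(data=summary_table, cells=value)", one_col),
          collapse = " +\n  "), "\n")
```

With this rule, reordering the factors in the receiver is the only thing that changes which factors end up on top, which keeps the logic simple to explain.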
Does this make sense? This is what currently happens, but I'm open to any suggestions on the order in which things are placed.
@rdstern @anastasia-mbithe, on point (b):
Good suggestion, and really simple to implement.
@anastasia-mbithe if we don't want to show something in a table, we just don't run that code. For the example @rdstern has given, this means we do not run mmtable2::header_top_left(variable='summary-variable') in the mmtable2 code.
So:
- If treat_columns_as_factor = FALSE and we have only one summary and only one variable to summarise, then summary-variable has only one factor level. This means that we do not want to run the line mmtable2::header_top_left(variable='summary-variable').
- If treat_columns_as_factor = TRUE and we have only one summary, then summary has only one factor level, but columns_to_summarise still has multiple factor levels. In this case, we do not run the line mmtable2::header_top_left(variable=summary).
- If treat_columns_as_factor = TRUE and we have only one variable, then columns_to_summarise has only one factor level, but summary still has multiple factor levels. In this case, we do not run the line mmtable2::header_top_left(variable=variable).
- If treat_columns_as_factor = TRUE and we have only one summary and only one variable, then both columns_to_summarise and summary have only one factor level. In this instance, we do not run the lines mmtable2::header_top_left(variable=summary) + mmtable2::header_top_left(variable=variable).

# Example with Example 1 rice data that Roger used:
summary_table <- data_book$summary_table(data_name="rice",
columns_to_summarise="yield",
factors=c("management","Replicate","variety","nitrogen"),
treat_columns_as_factor=FALSE, summaries=c("summary_mean"))
# Here, treat_columns_as_factor=FALSE and we have just one summary and column_to_summarise:
mmtable2::mmtable(data=summary_table, cells=value) +
mmtable2::header_top_left(variable=management) +
mmtable2::header_left_top(variable=Replicate) +
mmtable2::header_left_top(variable=variety) +
mmtable2::header_left_top(variable=nitrogen)
# That code above is what we want to run. We no longer want to run this:
mmtable2::mmtable(data=summary_table, cells=value) +
mmtable2::header_top_left(variable='summary-variable') +
mmtable2::header_top_left(variable=management) +
mmtable2::header_left_top(variable=Replicate) +
mmtable2::header_left_top(variable=variety) +
mmtable2::header_left_top(variable=nitrogen)
# If treat_columns_as_factor=TRUE and we have just one column to summarise but multiple summaries, then we need to differentiate which summary is shown, but not which column is summarised (since it is always the same column):
summary_table <- data_book$summary_table(data_name="rice",
columns_to_summarise="yield",
factors=c("management","Replicate","variety","nitrogen"),
treat_columns_as_factor=TRUE, summaries=c("summary_mean", "summary_sum"))
last_table <- (mmtable2::mmtable(data=summary_table, cells=value) +
mmtable2::header_top_left(variable=summary) + # Include summary, but not the summarised column, because it is always the same column
mmtable2::header_top_left(variable=management) +
mmtable2::header_left_top(variable=Replicate) +
mmtable2::header_left_top(variable=variety) +
mmtable2::header_left_top(variable=nitrogen))
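The four cases above reduce to one rule: only add a header for a variable that has more than one level. A minimal base R sketch of that rule, assuming a hypothetical helper summary_header_calls() (not part of R-Instat) that returns the extra header lines as text; the column names summary, variable and 'summary-variable' are taken from the examples above.

```r
# Hypothetical helper (not part of R-Instat): which summary/variable header
# lines to add, following the four cases listed above.
summary_header_calls <- function(treat_columns_as_factor, n_summaries, n_variables) {
  if (!treat_columns_as_factor) {
    # Summaries and variables share one combined 'summary-variable' column
    # (e.g. mean__yield), with one level per summary/variable pair
    if (n_summaries * n_variables > 1) {
      return("mmtable2::header_top_left(variable='summary-variable')")
    }
    return(character(0))
  }
  calls <- character(0)
  # Separate summary and variable columns: keep each only if it has > 1 level
  if (n_summaries > 1) {
    calls <- c(calls, "mmtable2::header_top_left(variable=summary)")
  }
  if (n_variables > 1) {
    calls <- c(calls, "mmtable2::header_top_left(variable=variable)")
  }
  calls
}

# First rice example: FALSE, one summary, one variable, so no extra header
summary_header_calls(FALSE, n_summaries = 1, n_variables = 1)
# Second rice example: TRUE, two summaries (mean, sum), one variable (yield)
summary_header_calls(TRUE, n_summaries = 2, n_variables = 1)
```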
Does this make sense?
e) signif_fig is a parameter in our summary_table function. However, I agree that we should decide the number of decimal places on the display end (i.e., mmtable2), not the calculation end (our function).
That being said, mmtable2 sets all columns as characters. This has to be the case to have multiple column headers. As a result, it is not so simple to make these amendments.
One option is that we save the summary_table object when we save the mmtable2 object. Then we can refer back to the mmtable2 object's corresponding summary_table object to make the changes. Logistically, we would need to check how this affects timings, etc. Maybe a conversation for me to have with @dannyparsons.
In the meantime, can we use the signif_fig parameter in our summary_table function?
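A minimal sketch of the "save the summary data alongside the table" option, under assumptions: make_presentation_table() is a hypothetical helper (not part of R-Instat or mmtable2) that keeps the summary data unchanged and makes a rounded, character copy for display. The rounding uses base R formatC(), not an mmtable2 feature, and the example values are the first three mean yields printed in the summary_table output above.

```r
# Hypothetical helper (not part of R-Instat or mmtable2): keep the summary data
# unchanged and make a display copy with values rounded to `decimals` places.
make_presentation_table <- function(summary_data, decimals = NULL) {
  display_data <- summary_data
  if (!is.null(decimals)) {
    # formatC() keeps trailing zeros, e.g. 6.1 shown to 2 places is "6.10"
    display_data$value <- formatC(as.numeric(summary_data$value),
                                  format = "f", digits = decimals)
  }
  # Both copies are stored together, so the display can be re-made later
  list(summary_data = summary_data, display_data = display_data)
}

# The first three mean yields from the summary_table output above
summary_data <- data.frame(
  nitrogen = c("0", "50", "80"),
  value = c("3.32", "3.19", "5.47"),
  stringsAsFactors = FALSE
)
presentation <- make_presentation_table(summary_data, decimals = 1)
presentation$display_data$value
```

The display_data copy would then be passed as data= to mmtable2::mmtable(). Because summary_data is kept, changing the number of decimals later would not need the summaries to be recalculated.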
@lilyclements and @anastasia-mbithe I am still keen on continuing the improvements in the format table sub-dialogue. Here is an example:
And could we work towards being able to do all the tables shown here, by Thomas Mock? That's a really nice article. I'm really keen to be able to promote great tables as well as great graphs for the presentation of climatic summaries by the time we give our e-INAM course in June.
And here is an excellent video, which shows what we could do in RStudio. How easily could we do this in a script window in R-Instat, and could we do all that he does in the R-Instat sub-dialogue? I even wonder if we could use this video and example on Thanh's courses to illustrate how this all works in RStudio, and then show the same in R-Instat. Should we include the palmerpenguins package in R-Instat? I am always looking for interesting datasets!
@rdstern adding groups and spanners, as in the first article shared, looks really great. I assume this isn't a priority for this week, but it is definitely something to work towards. If we are happy for this to be looked at in a later week, I can write an issue on it?
Column Amendments
The following look suitable for the Columns tab:
- cols_label: rename columns
- cols_align: align the columns a certain way
- cols_width: change the width of a column

We want to change several column names at once, and the number of columns differs between tables. This means we can't as easily have a "fixed" number of inputs. I see two options for this, but would really be open to more suggestions:
Option 2 would then run something like this (from Stack Overflow)
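library(magrittr) # provides %>%
label <- c("cylinder", "horsepower") # values from the "new column name" receiver
columns <- c("cyl", "hp") # values from the "(current) column name" receiver
# Named list: names are the current column names, values are the new labels
cols_list <- purrr::set_names(as.list(label), columns)
mtcars %>%
  gt::gt() %>%
  gt::cols_label(.list = cols_list)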
A similar approach could be used for cols_align and cols_width.
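Option 2 relies on the two receivers lining up one-to-one. A minimal sketch of that check, assuming a hypothetical make_label_list() helper (not part of R-Instat or gt) that validates the receivers before building the named list for gt::cols_label(.list = ...); the gt step only runs if gt is installed.

```r
# Hypothetical helper (not part of R-Instat or gt): pair the "(current) column
# name" receiver with the "new column name" receiver, checking they line up.
make_label_list <- function(current_names, new_names, data = NULL) {
  if (length(current_names) != length(new_names)) {
    stop("Each current column name needs exactly one new name.")
  }
  if (!is.null(data)) {
    unknown <- setdiff(current_names, names(data))
    if (length(unknown) > 0) {
      stop("Unknown column(s): ", paste(unknown, collapse = ", "))
    }
  }
  # Named list: names are current columns, values are the new labels
  stats::setNames(as.list(new_names), current_names)
}

cols_list <- make_label_list(c("cyl", "hp"), c("cylinder", "horsepower"),
                             data = mtcars)

# Only build the gt table if gt is installed
if (requireNamespace("gt", quietly = TRUE)) {
  labelled <- gt::cols_label(gt::gt(mtcars), .list = cols_list)
}
```

Checking in the helper means a mismatch is reported as a clear message before any gt code runs, whatever the number of columns in a given table.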
Colouring columns
We can look at the data_color function described here.
We can colour columns by their attributes (e.g. numeric), names, etc.
This colouring can go further: the cells can be coloured row by row within a column. E.g., "colour everything > 20 in this column red, otherwise green". Given this, data_color is something for both columns and rows, so where would it fit? Since this is about the data values, perhaps it fits in a third tab: a "data" tab.
@rdstern what do you think? I might be overcomplicating it somewhat.
Perhaps, for now, we have the options to colour a column by its name/attribute in the "Column" tab. In time, we can add additional options in a "data" tab.
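A rule like "> 20 red, otherwise green" can be written in gt with gt::tab_style() and a row condition inside gt::cells_body(); data_color() instead maps values through a palette, so this sketch swaps in tab_style() for the threshold case. It uses mtcars rather than our summary tables, and threshold_colours() is a hypothetical helper (not part of R-Instat or gt) that previews which colour each value would get. The gt part only runs if gt is installed.

```r
# Hypothetical helper (not part of R-Instat or gt): the colour each value would
# get under a "greater than threshold" rule, e.g. for a preview in the dialog.
threshold_colours <- function(x, threshold, above = "red", otherwise = "green") {
  ifelse(x > threshold, above, otherwise)
}

top_cars <- head(mtcars)
threshold_colours(top_cars$mpg, threshold = 20)

# The same rule applied to a gt table, one tab_style() call per colour
if (requireNamespace("gt", quietly = TRUE)) {
  coloured <- gt::gt(top_cars)
  coloured <- gt::tab_style(coloured,
    style = gt::cell_fill(color = "red"),
    locations = gt::cells_body(columns = mpg, rows = mpg > 20))
  coloured <- gt::tab_style(coloured,
    style = gt::cell_fill(color = "green"),
    locations = gt::cells_body(columns = mpg, rows = mpg <= 20))
}
```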
Column Rearrangement
There's a set of functions related to rearranging/editing the column positions. We could have these in their own box under the "Column" tab? However, we might want to leave these for now to give time to conceptualise them a bit further.
- cols_merge_range, cols_merge_n_pct, cols_merge_uncert: merge columns together to get a range, n (pct), or +/- uncertainty range
- cols_hide: hide a column
- cols_move: move a column

@rdstern perhaps, instead of a "data" tab as I suggested for some colour options, we could have a "conditional formatting" tab.
https://themockup.blog/static/resources/gt-cookbook.html#conditional-formatting
I'm going through the chapters in the gt-cookbook and seeing what ties in with our tables: what we have, and what we can add. It fits in with our different tabs somewhat:
groupname_col, which fits in with the spanners part that would be great to include (see under "Create or Modify Parts").
scale_x_discrete, scale_x_date, etc.) in ggplot2. We can look into introducing these for different column types at a later date. The difference here, which is not in ggplot2, is that we have multiple columns of different types.

@rdstern if you are happy with this, I suggest these different "chapters" each get their own issue, with groupname_col in Grouping and Summary Groups joining the "Create or Modify Parts" issue. (Some already have their own issue, like "Save Output".)
@lilyclements very happy with all this. Also with your simpler Option 2 above.
I get the impression that a few bits of work might start (even finish?) in this sprint, but most will be a set of issues, some ready for work and others provisional and needing more thought.
I really like the idea of the conditional formatting as well as other colour options. They are there in Excel and it would be nice to match that - at least for some tables.
I would then hope that quite a bit can be included in the August release? However, the improvements to the similar plotting sub-dialog have taken from the start until now. So perhaps we will be looking at a longer time scale?
In parallel, I still wonder whether putting summaries into a list column could also be considered; those sparkline columns might then be a really simple additional feature. We could supply some lists to test that out, if it is part of your list? (I know I am a bit paranoid about this feature. It is just part of my general argument that graphs and tables are now nowhere near as distinct as they used to be.)
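A list column keeps a whole vector of values in each cell, one row per group, which is what a sparkline column would be drawn from. A minimal base R sketch, assuming the built-in PlantGrowth data as a stand-in for our summary data; rendering the sparklines would be a separate table-package step (for example, the gtExtras package has sparkline helpers), and which package R-Instat would use is not decided here.

```r
# A list column: one row per group, and a whole vector of values in each cell.
# PlantGrowth (built into R) stands in for our summary data here.
groups <- split(PlantGrowth$weight, PlantGrowth$group)

summaries <- data.frame(group = names(groups), stringsAsFactors = FALSE)
summaries$weights <- unname(groups)            # list column of numeric vectors
summaries$mean_weight <- vapply(summaries$weights, mean, numeric(1))
summaries$n <- lengths(summaries$weights)

str(summaries)
```

The ordinary summaries (mean_weight, n) sit next to the raw values, so the same data frame could feed both the usual table columns and a sparkline column.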
@rdstern thanks for this! I will spend some time today on looking into the sparkline columns to add them in.
@lilyclements the new gt sub-dialog is now merged. That's in the new Describe > Tables > Presentation Table dialog, which is actually a Presentation of Data-frame dialog. (We are also adding the Table Options button to the Prepare > View Data dialog.)
Now the challenge is to reinstate the Table Options button in the Describe > Tables > Summaries dialog. I hope that is going to be easy? I note there is also a gtsummary package. I assume we may later want another dialog that includes that, but initially we want to use our own summary system.
The complication, from @Patowhiz, is that the gt code needs to draw on the data frame as well as the gt object. I hope that can just be our summary table data frame, but maybe not?
I hope that the gt object we save can also include the link to the summary data frame? In that case we could also make further use of the Table Options sub-dialog in the Use Table dialog. That would parallel the Use Graph dialog.
The Table Options button will also need to be reinstated in the Describe > One Variable > Summarise (Customised) and Describe > 2/3 Variables > Summarise dialogs. I assume we handle the general one first? (I say we, but am not sure what I can do here; it is the royal we!)
@anastasia-mbithe and @lilyclements This is coming on well now that the reordering of the variables and the themes have been added.
I hope we can keep going on the improvements. Here are a few simple suggestions, plus some that might take a bit longer.
Here is a 4-way table with Ana's new Excel theme:
I really like that we have the Excel theme even though it is an example of a poor table.
Here is the dialogue for the table above, with Example 1 from the agriTutorial package, which you will need to install if you would like to use the same example.
Some simple suggestions first:
a) Please check the order of the factors in the first receiver. I assumed that they would go in the order Replicate, then Management for the columns, followed by Nitrogen then Variety for the rows. That gives the layout above, which corresponds to the textbook layout. I don't have a serious problem with your current order as long as there is a simple logic to explain it.
b) There is just a single summary. That's given as mean-yield in the table above. When there is only a single summary, could we have an option to not give it at all? Or perhaps no option, but just don't display it? Or it could become a default title or footnote?
c) I think the checkbox to Display Outer Margins now displays all margins, which is great. So (if I am right), delete the word Outer from the label.
d) Similarly, simplify Display Summary-Variables as Rows to Display Summaries as Rows.
Now how could I dictate that all data in the table above are shown to one decimal place? I think that may be easy through the gtsummary package.
e) If so, then add gtsummary. Then I think it is style_number in that package.
f) Investigate adding the gtsummary theme, which seems to have some sub-themes and also permits tables in different languages!
g) What else becomes easy once gtsummary is used?