IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Small improvement needed to the current Describe > Tables dialog #9010

Open rdstern opened 3 weeks ago

rdstern commented 3 weeks ago

The current dialog is already usable by @Patowhiz. Like the 2-variable summarise, it produces a gt object. This comment (and the next) are long explanatory comments! They confirm that all Patrick needs to do is continue his current work with a data frame, or gt( ) object, as his starting point. That includes the sub-dialog that parallels the graph sub-dialog system. There is a small amount of work for others (probably Roger, Antoine, Vitalis and Lily) so that differently structured gt objects can be produced for him.

He has been producing his own dialog. As discussed, this will become the View Data dialog when starting with a data frame (one that is in the right "shape" to be a gt object). It can also become the new Use Table dialog, because that will then be a dialog that modifies a gt table. So the existing (slightly modified) Tables dialog can quickly produce what we might call a default "exploratory table". Then you can choose whether to change that, via the Format Table button, into a more presentation-style table. Or look at it first, and then - if it looks promising - go to the Use Table dialog to make it a presentation table. Our "system" seems to be holding up, and I like the fact that we have options for users.

Here is an example of the current dialog:

image

Here is the code produced:

image

In the code, we note there are three stages:

a) Produce a summary - a data frame.
b) Pivot wider to produce an initial gt object.
c) Then we are at @Patowhiz's starting point - as agreed - so the Format Table button is the entry point to his sub-dialog - the same as the view table dialog - and we magically have a really nice gt formatted table.

Note that if this table is saved, then it is saved as a gt object. And that fits perfectly with the Use Table dialog that Ana was working on earlier.
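The stages described above can be sketched outside R-Instat as follows. This is only an illustration under stated assumptions: the `survey` data are made up, base `aggregate()` stands in for `data_book$summary_table`, and the title and number formatting are simply the kind of change a Format Table sub-dialog might apply.

```r
library(tidyr)
library(gt)

# Made-up survey data; the column names echo the thread's example.
survey <- data.frame(
  village = rep(c("A", "B"), each = 4),
  variety = rep(c("new", "old"), times = 4),
  yield   = c(30, 25, 34, 27, 41, 38, 45, 36)
)

# a) Produce a summary: a long data frame with one row per table cell.
#    (aggregate() is a stand-in here, not what R-Instat calls.)
summary_df <- aggregate(yield ~ village + variety, data = survey, FUN = mean)
names(summary_df)[names(summary_df) == "yield"] <- "value"

# b) Pivot wider and wrap in gt() to get a default "exploratory table".
exploratory <- gt(
  pivot_wider(summary_df, names_from = village, values_from = value)
)

# c) Formatting step, starting from the gt object.
presentation <- fmt_number(
  tab_header(exploratory, title = "Mean yield by variety and village"),
  columns = c("A", "B"),
  decimals = 1
)
```

Both `exploratory` and `presentation` are gt objects, so either could be the one that gets saved.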

I suggest 2 aspects need to be edited:

a) There is both store table and save table in the dialog.
Store Output saves the summary data frame. So, that's a new data frame as we get with the Prepare > Data Reshape > Column Summaries. Maybe we should say Store Summary? And the Save Table saves a gt object, just like it saves a graph object.

I have checked back to version 0.7.6 (when we used mmtable2 instead of gt). I confirm that storing the data frame does just that. And the saving was of an mmtable2 table, and is now of a gt object.

b) In the dialog we currently seem to only allow one factor (or the summaries) to go across, i.e. to be the pivot wider. The gt object allows more - that's what the spanners are for I think - so we need to redo that part of the dialog. Now the dialog already has good control of both the order of the factors, and also the order of the summaries. I suggest:

1) We don't need the Treat Summaries as a Further Factor checkbox. Instead we have Position of Summaries Factor followed by an up-down control. In the above example dialog it is disabled if there is only one Summary. So it is enabled here because there are 3 summaries. And the default of the up-down is 3, because there are 3 "factors" in the table, namely village, variety and summaries.

2) And we move it into the Display group

3) Then, instead of the 4 radio buttons in the Display box we have a single Number of Row Factors: label, with an up-down from 0 to 3 and default of 2 in the example above (namely the number of "ordinary" factors in the table).

That's then the same layout (all factors going down, and the variables going across), that you get with the column summaries!

@lilyclements what do you think?

If this works, then the one additional bit of work - very small I hope - is to generalise the pivot-wider bit in the code for the common situations where there are multiple column factors?
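One way the pivot-wider step might generalise to several column factors is sketched below. It is an assumption-laden illustration, not the R-Instat implementation: the toy `summary_long` data frame and the `column_factors` vector are hypothetical, and `"."` is chosen as separator only because the summary names already contain underscores.

```r
library(tidyr)
library(gt)

# Toy long-format summary; values are made up for illustration.
summary_long <- expand.grid(
  village = c("A", "B"),
  variety = c("new", "old"),
  summary = c("summary_mean", "summary_max"),
  stringsAsFactors = FALSE
)
summary_long$value <- seq_len(nrow(summary_long))

# Hypothetical choice: two factors go across instead of one.
column_factors <- c("village", "summary")

# Names from several columns are pasted together with names_sep.
summary_wide <- pivot_wider(
  summary_long,
  names_from = all_of(column_factors),
  values_from = value,
  names_sep = "."
)

# tab_spanner_delim() splits each combined name at the delimiter,
# so the leading part becomes a spanner above the column labels.
summary_gt <- tab_spanner_delim(gt(summary_wide), delim = ".")
```

With one remaining row factor (variety), the wide data frame has one row per variety and one column per village-by-summary combination.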

(PS: Once working in the ordinary Column Summaries then we should check about the drop unused levels checkbox here as well.)

I assume this should be quick and can then fit neatly with all the work @Patowhiz has been doing in the sub-dialog?

Soon we should design a new dialog that starts from the summaries, and also one that uses gtsummary instead of R-Instat summaries. But let's get these working first.

rdstern commented 3 weeks ago

Just to confirm the Version 0.7.6:

Here is the dialog:

image

Note the up-down for the column factors, as I am suggesting now, for the revised dialog.

This produces 2 outputs, namely:

image

And:

image

as the table - with those settings.

I realise my specification above is not general enough, because we might have multiple statistics for multiple variables. This was handled quite well in 0.7.6. That could be useful, because we therefore already have the R code for the second stage, i.e. a pivot wider, at least into an mmtable2, which I assume could equally be a gt table. Then we hand over to Patrick's part!

Here are multiple statistics (4) for multiple variables (3) as follows - with summary statistics as a variable:

image

This gives the following code:

# Dialog: Summary\Frequency Tables

summary_table <- data_book$summary_table(data_name="survey", columns_to_summarise=c("yield","fert","size"), factors=c("village","variety"), 
store_table=TRUE, treat_columns_as_factor=TRUE, j=1, summaries=c("summary_count_non_missing", "summary_min", "summary_mean", "summary_max"))

summary_table5 <- (mmtable2::mmtable(data=summary_table, cells=value) + mmtable2::header_top_left(variable=variable) + 
mmtable2::header_top_left(variable=summary) + mmtable2::header_top_left(variable=village) + mmtable2::header_left_top(variable=variety))

data_book$add_table(table_name="summary_table5", table=summary_table5, data_name="survey")
data_book$get_tables(data_name="survey", table_name="summary_table5")
rm(list=c("summary_table5", "summary_table"))

The first important line runs data_book$summary_table and this is the parallel of the first stage in the gt example scripts. It produces a summary data frame, because it includes store_table = TRUE. Here is that table:

image

It has 120 rows, namely 10 for the (non-zero) combinations of variety by village, times 12 for the combinations of 3 numeric variates by 4 summaries.
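The row count can be checked with a few lines of arithmetic. The figure of 10 non-empty village-by-variety combinations is taken from the comment above; everything else follows from the variables and summaries named in the call.

```r
variables <- c("yield", "fert", "size")
summaries <- c("summary_count_non_missing", "summary_min",
               "summary_mean", "summary_max")

# Every variable is paired with every summary: 3 x 4 = 12 combinations.
per_cell <- expand.grid(variable = variables, summary = summaries)

# Non-empty village-by-variety combinations, as stated in the thread.
n_cells <- 10

n_rows <- n_cells * nrow(per_cell)
```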

I am not sure this way of storing is quite general enough? There are 2 things different. The first is that we are using gt. The second, which is new, is that we have not thought properly about margins.

Then mmtable2 gives the table as follows:

image

Now to the new system and the closest I can get to the dialog above:

image

This gives the following code:

# Dialog: Frequency/Summary Tables

summary_table <- data_book$summary_table(data_name="survey", columns_to_summarise=c("yield","fert","size"), 
factors=c("village","variety"), store_table=TRUE, treat_columns_as_factor=TRUE, j=1, 
summaries=c("summary_count_non_missing", "summary_min", "summary_mean", "summary_max"))

summary_table4 <- (summary_table %>% pivot_wider(names_from=village, values_from=value) %>% gt::gt())

data_book$add_object(data_name="survey", object_name="summary_table4", object_type_label="table", object_format="html", object=summary_table4)
data_book$get_object_data(data_name="survey", object_name="summary_table4", as_file=TRUE)
rm(list=c("summary_table4", "summary_table"))

And the table as follows:

image

Finally, at this stage I examined both the newly discovered (to me anyway) addmargins function from the stats package and also what we do currently on margins. See the dialogs above. When I run the same example as above, with the margins option set to both, then the change is to the stored file. It now has 288 rows, and has added the extra levels to the factors - which is great. (And just the same as would be the case with the addmargins function if we had used it.)
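For readers who have not met it, here is a small base R sketch of what `addmargins()` does to a two-way table of counts. The `plots` data are made up, and this does not reproduce the R-Instat margins option; it only shows how margins turn up as an extra level of each factor once the table is back in long form.

```r
# Made-up counts of plots by village and variety.
plots <- data.frame(
  village = c("A", "A", "A", "B", "B", "C"),
  variety = c("new", "old", "old", "new", "new", "old")
)

counts <- xtabs(~ village + variety, data = plots)

# By default addmargins() appends a "Sum" level to every dimension.
with_margins <- addmargins(counts)

# In long form the margins are extra rows with the level "Sum".
long_with_margins <- as.data.frame(with_margins)
```

The 3 x 2 table becomes 4 x 3, so the long data frame grows from 6 to 12 rows.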

image

It is therefore clear that the data_book$summary_table function in R-Instat is already formidable. In the example above it provides the "outer margins" for the 4-factor table that is shown above. (There is no distinction here between what we might call ordinary factors (village and variety) and perhaps special factors (summary and variable).) It builds on David's experience in tables from the old Instat.

So, I propose we stick with this for now. We currently produce outer margins and can hand them to gt( ) so Patrick can ignore the issue completely for now. Later - once we have a fully working system - it could be good to be able to also offer to include addmargins into our system, as this produces inner margins. For now let's ignore margins and get on. @N-thony I suggest there is a small task for @Vitalis95 (or for you) with @lilyclements, and this can proceed in parallel with @Patowhiz finishing the task of formatting a gt table.

I specify that small task below. I also need to check the frequency tables option first. The above is for the (more complicated) summary tables.

N-thony commented 3 weeks ago

@Vitalis95 can you attack the small task here?

Vitalis95 commented 3 weeks ago

I suggest 2 aspects need to be edited:

a) There is both store table and save table in the dialog. Store Output saves the summary data frame. So, that's a new data frame as we get with the Prepare > Data Reshape > Column Summaries. Maybe we should say Store Summary? And the Save Table saves a gt object, just like it saves a graph object.

I have checked back to version 0.7.6 (when we used mmtable2 instead of gt). I confirm that storing the data frame does just that. And the saving was of an mmtable2 table, and is now of a gt object.

b) In the dialog we currently seem to only allow one factor (or the summaries) to go across, i.e. to be the pivot wider. The gt object allows more - that's what the spanners are for I think - so we need to redo that part of the dialog. Now the dialog already has good control of both the order of the factors, and also the order of the summaries. I suggest:

  1. We don't need the Treat Summaries as a Further Factor checkbox. Instead we have Position of Summaries Factor followed by an up-down control. In the above example dialog it is disabled if there is only one Summary. So it is enabled here because there are 3 summaries. And the default of the up-down is 3, because there are 3 "factors" in the table, namely village, variety and summaries.
  2. And we move it into the Display group
  3. Then, instead of the 4 radio buttons in the Display box we have a single Number of Row Factors: label, with an up-down from 0 to 3 and default of 2 in the example above (namely the number of "ordinary" factors in the table).

That's then the same layout (all factors going down, and the variables going across), that you get with the column summaries!

@lilyclements , what do you think of this?

rdstern commented 3 weeks ago

@Vitalis95 and @lilyclements look at the stored data - when there are 2 ordinary factors and there are also multiple variables and multiple summaries.

image

There are now 4 columns (as factors), namely 2 "ordinary" and variables (considered as another factor) and also summaries - considered as another factor. So, if there were 5 ordinary factors, then there would be six or seven factors in total.

So, with this situation I would like (in the Display section) a Row Factors control (the equivalent of the Column Factors control in Version 0.7.6). This has an up-down (as there) with 0 to 4 for the example here, and with 3, so (max - 1), as the default. So the default is one column factor in the example above and 3 row factors. There is a label Variables with an up-down from 1 to the Row Factors maximum. There is a checkbox - roughly from version 0.7.6 - with the label "Summaries as a Further Factor". This also has an up-down, I think from 1 to the same as the Row Factors maximum. If that checkbox is unticked then the up-down is hidden. (And the maximum in the example above is 3 rather than 4!) Finally, the Variables control is disabled if there is just one variable. Similarly for Summaries if there is just one summary.

@Vitalis95 and @lilyclements maybe check what will need to change in the command to send the data to a gt table. I hope that will be relatively easy. It can also wait - if need be - till @Patowhiz is further on the sub-dialog.

lilyclements commented 2 weeks ago

@rdstern I just want to check I understand. So you are suggesting:

  1. Under "Display" we have "Row Factors:" with a NUD. This NUD takes a minimum of 0 and a maximum of the "Number of Factors + 2" (but what if we don't "Treat Summaries as Further Factor"?). This NUD decides how many variables we rearrange to be wider, right?

  2. We have a "Variables:" label with a NUD from 1:Row Factors Maximum. Is the "Row Factors" maximum the value chosen in the "Row Factors" NUD (from 1)? What does this NUD do?

  3. I don't understand how "Treat Summaries as Further Factor" isn't a checkbox. I thought this was a boolean of TRUE/FALSE?

rdstern commented 2 weeks ago

1a. Under "Display" we have "Row Factors:" with a NUD. This NUD takes a minimum of 0 and a maximum of the "Number of Factors + 2" (but only if we check "Treat Summaries as a Further Factor", and that's only possible if there is more than one summary). If not, then the maximum for the NUD is "Number of Factors + 1".

1b. However, if there is only one variable and one summary, then it is the Number of Factors + 0! This NUD decides how many variables we arrange to be longer, so the others are rearranged to be wider.
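The rule for the Row Factors NUD limits can be written down as a small function. This is one reading of the rule, not code from R-Instat: the function name and arguments are hypothetical, it assumes the variables only count as a factor when there is more than one variable, and the default of (maximum - 1) is the one suggested earlier in the thread.

```r
# Hypothetical helper; names are illustrative and not part of R-Instat.
row_factors_nud_range <- function(n_factors, n_variables, n_summaries,
                                  summaries_as_factor = TRUE) {
  # Assumption: variables add a factor only when there are several of them.
  variable_is_factor <- n_variables > 1
  # Summaries add a factor only if the checkbox is ticked and there are several.
  summary_is_factor <- summaries_as_factor && n_summaries > 1

  maximum <- n_factors + variable_is_factor + summary_is_factor
  list(minimum = 0, maximum = maximum, default = max(maximum - 1, 0))
}

# Thread example: 2 ordinary factors, 3 variables, 4 summaries.
checked   <- row_factors_nud_range(2, 3, 4, summaries_as_factor = TRUE)
unchecked <- row_factors_nud_range(2, 3, 4, summaries_as_factor = FALSE)
single    <- row_factors_nud_range(2, 1, 1)
```

Here `checked$maximum` is 4, `unchecked$maximum` is 3, and `single$maximum` is 2, which matches the three cases described above.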

2. We have a "Variables:" label with a NUD from 1:Row Factors Maximum. Is the "Row Factors" maximum the value chosen in the "Row Factors" NUD (from 1)? What does this NUD do?

This is needed if there is either more than one summary, or more than 1 variable. It then gives the position of the Variable Factor in the table. For example, with the example above and 2 "real" factors, if the position of the variable is 1 or 2 then the variable values are rows, if 3 or 4, then columns.
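The effect of the position can be illustrated with a short sketch. It assumes the factors are held in display order and that the first `n_row_factors` of them go down the rows, with the rest going across; the helper name is hypothetical and the value of 2 row factors is chosen to match the example in the comment.

```r
# Hypothetical helper: split ordered table factors into rows and columns.
split_table_factors <- function(ordered_factors, n_row_factors) {
  stopifnot(n_row_factors >= 0, n_row_factors <= length(ordered_factors))
  list(
    rows    = head(ordered_factors, n_row_factors),
    columns = tail(ordered_factors, length(ordered_factors) - n_row_factors)
  )
}

# Variable factor in position 2: its values become rows.
layout_a <- split_table_factors(
  c("village", "variable", "variety", "summary"), n_row_factors = 2
)

# Variable factor in position 3: its values become columns.
layout_b <- split_table_factors(
  c("village", "variety", "variable", "summary"), n_row_factors = 2
)
```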

3. I don't understand how "Treat Summaries as Further Factor" isn't a checkbox. I thought this was a boolean of TRUE/FALSE?

It should have a checkbox. A difference from now is that I suggest it goes into the Display section of the dialog and is initially checked. If checked, we also need to give its position (rows or columns), so that's its NUD.

Vitalis95 commented 21 hours ago

@lilyclements , when you have time, please take a look at this. Thanks

lilyclements commented 19 hours ago

@rdstern thanks for this. This all makes sense to me. @Vitalis95 do you have any questions before proceeding?

Vitalis95 commented 19 hours ago

@lilyclements , should these changes be made to the summary_table function?