co-analysis / a11ytables

R package: generate best-practice stats spreadsheets for publication
https://co-analysis.github.io/a11ytables/
Other
42 stars 3 forks source link

Implement thousands comma-separators by default #51

Open matt-dray opened 2 years ago

matt-dray commented 2 years ago

There has been a request to:

...implement comma separated thousands format in the workbook, using openxlsx::createStyle(numFmt = “COMMA”) as per the guidance.

(Also, this is a reminder to look over the guidance in detail to extract all these kinds of cell-level features.)

matt-dray commented 2 years ago

This is kind of complex. The styling will occur when the column is full of numerics, but if it's character due to to a suppressed value, then the commas won't be shown.

Screenshot 2022-03-21 at 14 31 36

Probably means that the thousands-separator styling will have to be dynamically applied by identifying the rows that have numbers in (e.g. coerce them to numeric and see if a non-NA result emerges).

matt-dray commented 8 months ago

Note the options for {openxlsx} could also be set to allow for the comma separator, as mentioned in #69, but this still won't work with columns that contain a placeholder or other string content.

matt-dray commented 7 months ago

Consider applying scales::number() if it needs to be done manually in some way.

matt-dray commented 3 months ago

Current thoughts: create a function to dynamically construct a numFmt string based on the values provided in each column that's detected as numeric? So a column with numbers like 1234.56 would be interpreted as having a numFmt of "#,##0.00" and ultimately outputs 1,234.56 in the spreadsheet. (Note that we can't just use the "COMMA" preset because it appears to remove decimal places.) The approach assumes the user has provided the number of decimal places that they want in the final output, which contradicts #69 (retain full set of decimal-place values, but show only a certain number in the cell itself). So maybe the user provides an option to {a11ytables} that is the blanket 'number of decimal places for doubles in this whole workbook' (simpler, but not useful where differenc olumns or sheets need different numbers of decimal places), or otherwise provide a list of instructions for which columns need what number of decimals (erodes the 'automatic' nature of the package and trickier to implement a user interface for this; would have to do something like [{rapid.spreadsheets}]()).

To actually apply this approach, we'd need to replace the non-numeric values in a provided column with NA, e.g. replace a placeholder like [c] with NA_real_, then apply the numFmt, then put the placeholders back in. {shrthnd} is the obvious candidate to help with this. But the output spreadsheet will still end up as a text column, presumably. Experiments with {openxlsx2} suggest you can apply number formats over individual cells rather than whole columns, which means you can actually interact with values as though they're numbers (which you can't do when the column is converted to text because of text placeholders like [c]).