easystats / report

:scroll: :tada: Automated reporting of objects in R
https://easystats.github.io/report/
Other
694 stars 70 forks source link

text_short() etc. and current API thoughts? #153

Open bwiernik opened 3 years ago

bwiernik commented 3 years ago

It's not exactly clear to me what the current status of text_short() etc. are. They still appear in the pkgdown site, but are commented out in various places in the source. Has that API been abandoned?

One API thought I had looking at the various functions is that format() might be a good generic to use here because it is so extensible. Table/text/numbers only, as well as choosing specific parameters or quantities could controlled with arguments. Not saying to discard other existing approaches, but might be useful in lieu of an ever expanding list of specific functions?

DominiqueMakowski commented 3 years ago

text_short was rather deprecated in favour of the summary() generic report() %>% summary(). Where you would see format fitting? In lieu of summary?

One nice API in theory would be to have report() and then pipe it to format(interpet_p = FALSE, display_intercept = TRUE, show_this = TRUE) that would format the report according to specifications, but that seems impossible as there's hardly any modularity possible at the report level, currently the workaround for the summary() is to have both versions (long and short) prepared upfront by report() and then summary() just extracts the short version but doesn't do anything in itself.

Basically, one of the issue with report (and why it had several entire rewrites, see the discussion around the last one here) is the balance between modularity and specificity. In some cases, it would be easier to just have a report() function for each model that does everything specifically for it, but it's also hard to test and a leads to lot of semi-redundant and non-optimized code. But full-modularity with pieces that can be manipulated afterwards seems also impossible.

So currently we split the reports into several submodules (report_parameters, report_intercept, report_performance, etc.) [that we also export] which seem to make sense for a lot of models and share a lot of commonalities, but once the main report() assembles them that's it it's hard to preserve the piece-wise information. In fact, going back to the issue, each of this piece also makes the long and "summarized" version, which is arguably not the most efficient way of doing it (even though the pieces are relatively lightweight since there are just some text strings or tables) but I didn't find any better way of doing it.

One possibility would be to drop the summarized version altogether, but I realized (early on) when using proto-versions of reports that in my own work for instance if I have several similar models, having all the full info for all of them made the files super verbose and not digest with a lot of redundant info, so it was nice to have like one time the full model and then just the critical information (although I agree that what is "critical" is highly debatable)

~in any case~ I forgot what I wanted to say

bwiernik commented 3 years ago

The main things I was considering were (1) interpret_p or similar arguments to exclude p values and/or their interpretation and (2) a format to report only formatted numeric values without accompanying text (ala #86 and #108).

bwiernik commented 3 years ago

text_short was rather deprecated in favour of the summary() generic report() %>% summary().

The pkgdown site still has these functions documented: https://easystats.github.io/report/reference/text_long.html

IndrajeetPatil commented 3 years ago

https://easystats.github.io/report/reference/text_long.html

Oh wow, the report 1.0 website pieces are somehow still alive on the web.

Anyone knows how to build a cleaned website using GitHub-actions? Locally one can always use clean_site function, but I am not sure how to do this in actions-based workflow. We just need to do this once I guess.

bwiernik commented 3 years ago

I'm wondering what the purpose of storing static text_full and text values is, versus keeping the components as a list until print/format time.

So, for example:

format_report_coefficient <- function(rr, digits = 3, down_with_nhst = FALSE) {
    if (down_with_nhst) {
        rr$interpret_significance <- NA
        rr$p_value <- NA
    }
    with(
        rr,
        glue::glue(
            "The effect of {term} was",
            if (!is.na(interpret_significance)) " {interpret_significance}" else "",
            if (all(!is.na(interpret_significance), !is.na(interpret_es))) " and" else "",
            if (!is.na(interpret_es)) " {interpret_es}" else "",
            " (β = {format(es, digits = digits)}",
            if (!is.na(ci)) ", [{ci}% CI {format(ci_low, digits = digits)}, {format(ci_high, digits = digits)}]" else "",
            if (!is.na(p_value)) ", p = {format(p_value, digits = digits)}" else "",
            ")",
            "."
        )
    )
}

rr <- list(term = "predictor", interpret_significance = "statistically significant", interpret_es = "large", es = .90, ci = 99, ci_low = .85, ci_high = .95, p_value = .001)

format_report_coefficient(rr)
# The effect of predictor was statistically significant and large (ß = 0.9, [99% CI 0.85, 0.95], p = 0.001).

format_report_coefficient(rr, down_with_nhst = TRUE)
# The effect of predictor was large (ß = 0.9, [99% CI 0.85, 0.95]).

Here, the idea is that report() generates a list of components and then this function is what would glue the pieces together according to the specified parameters. This enables modularity for things like omitting significance or p values, reporting just formatted numeric results without textual interpretations, or even adjustment of things like number of digits, CI width, etc. at print time.

(Using glue() here is mostly to make things easier to write, but it also is what permits easy use of digit formatting, etc.)

bwiernik commented 3 years ago

Other formatting bits could include things like localization (e.g., US English "we fit a model" vs UK English "we fitted a model", perhaps eventually non-English translation).

DominiqueMakowski commented 3 years ago

That seems like a good idea, we could start experimenting/implementing such increased modularity little-by-little, starting in places where it's easy and "stable"?

though we might want to keep not having "glue" as a dependency (or we add it, after all, report does some heavy gluing throughout... @strengejacke ?).

Also, I don't think we'll be able to have ß symbols, since CRAN only allows ASCII symbols in the package (hence that discussion https://github.com/easystats/report/issues/97 and my proposal to spell out the symbol as beta as a replacement).

strengejacke commented 3 years ago

though we might want to keep not having "glue" as a dependency (or we add it, after all, report does some heavy gluing throughout...

Although glue has no further dependencies, I don't see the benefit, i.e. if using glue outweigh potential issues. I always have the example of "withr" and "rstanarm" in mind, which totally broke the package. Unless we really need specific functions, I'd say we stick to base R functions.

strengejacke commented 3 years ago

have ß symbols

Yes, I think so, too. I'm not sure if expression() or bquote() help, or if these are only working for graphics?

bwiernik commented 3 years ago

The way to use the beta symbol is to Unicode escape it. "\u03B2"

DominiqueMakowski commented 3 years ago

I'm worried that it will introduce a lot of \u03B2 = 42 when formatting is not done. Isn't easier to keep, at its core, beta = 42 and then eventually replace all of the beta = by `\u03B2 = during postprocessing/formatting/displaying?

bwiernik commented 3 years ago

If the printing environment can’t support unicode (such as R GUI on Windows with a US code page), it prints fallback characters, which for Greek letters are English characters (b, etc). There are some tests we could run to check the environment and unicode support, but they probably aren’t necessary.

If report is going to be used in, for example, latex or markdown report generation, it really needs to be able to include correct character formatting.

DominiqueMakowski commented 3 years ago

cool! Indeed we should test it a bit first but otherwise it seems like a good idea