Changing content of the output of `episoap` templates

CarmenTamayo commented 5 months ago

In the existing version of the transmissibility pipeline, the report template includes instructions for users on choosing parameters, explanations of epidemiological concepts, and descriptions of the outputs for readers of the generated reports.

After much discussion, and given the existence of other resources such as the how-to guides, tutorials, and case studies, we've agreed that the episoap templates should only describe the output for the reader, and that instructions for package users should go in separate vignettes in the package.

These changes are currently being added to the branch "tx_pipeline_update", which will be merged into main over the coming weeks.

Bisaloo commented 5 months ago

Adding from our past conversations for completeness:

If we believe this information is important and should still be somewhere, it should be converted as code comments rather than text.

CarmenTamayo commented 5 months ago

Thanks Hugo. To build on this point, the information currently in the templates that is not kept in the report's output will be included as either:

a) documentation of the pipelines in vignettes
b) code comments inside the R Markdown code chunks

CarmenTamayo commented 5 months ago

Regarding the description of the input dataset used in the template: there is currently a section that describes the example dataset (daily numbers of COVID-19 hospitalisations in England as of 24 October 2020).

However, when users work with their own data, this description will no longer be applicable to them, or to the readers of the report (e.g., other field epis, decision makers, members of the public, etc). Solutions could be:

a) To include the description of the example dataset as part of the vignette that describes the transmissibility pipeline instead
b) To provide a description of the example dataset by default, which is not included if the params$data_file is changed by the user to a different dataset
c) To include a template for users to describe their own data, e.g., "The data used in this report represents [description from user]. This dataset was obtained from [description from user]. The following columns are included in this dataset [name: description], [name: description]..."

These solutions aren't mutually exclusive
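
For reference, option b) could be sketched roughly as below. This is only an illustration: `default_data_file` and `describe_data()` are hypothetical names that do not exist in episoap, the default path is a placeholder, and `params` is mocked here because it is normally supplied by the R Markdown YAML header.

```r
# Hypothetical default value of params$data_file (placeholder, not the real path)
default_data_file <- "data/example_dataset.csv"

# Hypothetical helper: return the example-data description only when the
# default file is in use; otherwise return a placeholder for users to fill in.
describe_data <- function(data_file, default = default_data_file) {
  if (identical(data_file, default)) {
    paste(
      "To illustrate the different analyses, we use real data reporting daily",
      "numbers of COVID-19 hospitalisations in England as of 24 October 2020."
    )
  } else {
    "The data used to generate this report contains [enter description]."
  }
}

# Mocked params: in the template this list comes from the YAML header
params <- list(data_file = "data/my_own_data.csv")
cat(describe_data(params$data_file), "\n")
```

With this shape the report still renders without modification for the example data, and the COVID-19 text disappears as soon as a different file is supplied.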

Bisaloo commented 5 months ago

a) seems best. Possibly with c) if you like it.

The only potential downside with c) is that the report cannot (no longer?) be rendered directly, i.e. without users modifying the code.

CarmenTamayo commented 5 months ago

I see, although the report could be rendered with the covid data, the only thing is that it wouldn't be described...

The description could look like this:

"The data used to generate this report contains [enter description]. These data are stratified by [enter region level]. The data is available from [enter location].

The data file is named "file_name" and is located in the data/ folder.

Once imported into R, the dataset called dat_raw includes: <- here we only describe the columns that will be used in the analysis, and everything else would be extra information that could be added by the user if they want to

date: the date of [description] <- could be admission or onset
region: the region of [description]
n: number of new, confirmed params$disease_name cases"

(Currently this is the description of the data included in the report: To illustrate the different analyses, we use real data reporting daily numbers of COVID-19 hospitalisations in England as of 24 October 2020, broken down to the hospital and National Health Service (NHS) region level. The data is available online from the NHS England's website. The dataset analysed here is a simplified version, providing incidence of hospital admissions by NHS trust.

Once imported into R, the dataset called dat_raw includes:

date: the date of admission
region: the NHS region
org_name: the full name of the NHS trust
org_code: a short code for the NHS trust
n: number of new, confirmed COVID-19 cases admitted, including inpatients who tested positive on that day, and new admissions with a positive test)

Bisaloo commented 5 months ago

I don't have strong preferences. Happy to follow your informed opinion on this.

CarmenTamayo commented 5 months ago

> I don't have strong preferences. Happy to follow your informed opinion on this.

Okay, we can try this and see if any changes need to be made when receiving user feedback

CarmenTamayo commented 2 months ago

Building on this, in PR #144 I followed approach c), complemented by a), meaning that the text in the template would read like this:

"... we use data reporting daily numbers of [enter disease name] during [enter timeframe], stratified by [enter grouping variable name]"

This would mean that users always have to modify the text manually, as @Bisaloo mentioned. I realised that some or all of this could be avoided if we included inline code that renders either the params or objects such as group_var, but this only works if these variables are defined before the text itself (which currently isn't the case in the template).

Example of what I mean:

"... we use data reporting daily numbers of `r params$disease_name` during [enter timeframe], stratified by `r group_var`"

This way users would (mostly) only have to provide the parameters or specify the variables once. The approach still doesn't cover every case, though. For the [enter timeframe] bit, for example, we could remove it from the description, leave it for users to modify manually as it is now in the PR, or extract the date range from the "date" column in the dataset and assign it to objects in the same manner as group_var, for instance through:

min_date <- min(dat_raw$date)
max_date <- max(dat_raw$date)

"... we use data reporting daily numbers during the time period from `r min_date` to `r max_date`"

Hugo, it'd be great to hear your input on this and your opinion on the feasibility of this approach. Thank you!
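
To make the idea concrete, here is a minimal sketch of the date-range part, assuming a mocked `dat_raw` with illustrative values and a plain `disease_name` object standing in for `params$disease_name`. In the template the sentence would be written with inline R code rather than `sprintf()`, but the ordering constraint is the same: the objects must exist before the text that uses them.

```r
# Mocked dat_raw with illustrative values only
dat_raw <- data.frame(
  date = as.Date(c("2020-10-01", "2020-10-24", "2020-09-15")),
  n = c(3L, 5L, 2L)
)

# Stand-in for params$disease_name, which the YAML header would provide
disease_name <- "COVID-19"

# These must be defined before the descriptive text is rendered
min_date <- min(dat_raw$date)
max_date <- max(dat_raw$date)

# Build the sentence that the inline code would otherwise produce
data_sentence <- sprintf(
  "... we use data reporting daily numbers of %s during the time period from %s to %s",
  disease_name,
  format(min_date, "%Y-%m-%d"),
  format(max_date, "%Y-%m-%d")
)
cat(data_sentence, "\n")
```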

Bisaloo commented 2 months ago

This sounds good but also a little bit of a rabbit hole. For example, it may not always be daily numbers, which means we would have to adjust this as well, etc.

Let's keep it simple for now and we can keep refining later.

CarmenTamayo commented 2 months ago

I agree

> This sounds good but also a little bit of a rabbit hole. For example, it may not always be daily numbers, which means we would have to adjust this as well, etc.
>
> Let's keep it simple for now and we can keep refining later.

I agree. I think a big part of the problem would be solved if the variables were introduced just once rather than twice. At the moment the template first gives an overview of the variables included in dat_raw, and then there's a second part where the relevant variables are identified and described again. If the first part were dedicated only to importing the data, and maybe showing the first few rows of the dataset so readers have an idea of what it looks like, the description of the data could be done together with "Identifying key variables". Users would then only be asked to modify the template in one section (which they would most likely have had to do anyway, to indicate the names of the columns that correspond to group_var, count_var, etc.), and these objects, together with the params, could be used when rendering the document to describe the data to the reader.
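
A rough sketch of that single section, under some assumptions: `dat_raw` is mocked with illustrative values, `group_var` and `count_var` are the objects mentioned above, and `date_var`, `key_vars`, and `data_description` are hypothetical names introduced only for this example.

```r
# Mocked dat_raw with illustrative values only
dat_raw <- data.frame(
  date = as.Date(c("2020-10-01", "2020-10-02")),
  region = c("London", "Midlands"),
  n = c(10L, 7L)
)

# Show the first rows so readers can see what the data look like
print(head(dat_raw))

# "Identifying key variables": the one place users would edit
date_var  <- "date"    # hypothetical object name, by analogy with group_var
group_var <- "region"
count_var <- "n"

# Fail early if any named column is missing from the data
key_vars <- c(date_var, group_var, count_var)
missing_vars <- setdiff(key_vars, names(dat_raw))
if (length(missing_vars) > 0) {
  stop("Columns not found in dat_raw: ", toString(missing_vars))
}

# The description for the reader is assembled from those same objects
data_description <- sprintf(
  "The data are stratified by '%s', with dates in '%s' and case counts in '%s'.",
  group_var, date_var, count_var
)
cat(data_description, "\n")
```

Because the description is derived from the objects, changing a column name in this one section updates both the analysis and the text shown to the reader.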

Bisaloo commented 2 months ago

> I agree, I think a big part of the problem would be solved if the variables were introduced just once rather than twice, at the moment the template first gives an overview of the variables included in dat_raw, and then there's a second part where relevant variables are identified and described again. If the first part is only dedicated to importing the data and maybe just showing the heading of the dataset for readers to have an idea of what it looks like, and the description of the data is done together with "Identifying key variables", users will only be asked to modify the template in one section (which they would have had to do most likely to indicate the names of the columns that correspond with group_var, count_var, etc), and then these objects together with the params can be used when rendering the document to describe the data to the reader

Yes, this makes sense. Let's make this change.

CarmenTamayo commented 2 months ago

This has been addressed in #144

epiverse-trace / episoap

Changing content of the output of `episoap` templates #120