baumer-lab / fertile

creating optimal conditions for reproducibility
GNU General Public License v3.0
r reproducibility reproducible-research tidyverse workflow

Travis-CI Build Status | Codecov test coverage | Lifecycle: stable

fertile: creating optimal conditions for reproducibility

Tools to make achieving R project reproducibility easy!

Why fertile?

Sample Project

miceps is a variable containing the path to the directory that holds the sample project used in the examples below.

Easily Create Reproducibility Reports

proj_badges(miceps)

 

Run Reproducibility Checks

fertile contains 16 checks on different aspects of reproducibility.

Run them individually or in customizable groupings with proj_check(), proj_check_some(), or proj_check_badge():

# Individual check
has_well_commented_code(miceps)
#> ● Checking that code is adequately commented
#>    Problem: Suboptimally commented .R or .Rmd files found
#>    Solution: Add more comments to the files below. At least 10% of the lines should be comments.
#>    See for help: https://intelligea.wordpress.com/2013/06/30/inline-and-block-comments-in-r/
#> # A tibble: 1 x 2
#>   file_name                                                fraction_lines_comme…
#>   <chr>                                                                    <dbl>
#> 1 /var/folders/v6/f62qz88s0sd5n3yqw9d8sb300000gn/T/Rtmp3v…                  0.04
# Combined checks
proj_check_badge(miceps, "documentation")
#> ✓ Checking for clear build chain
#> ✓ Checking for README file(s) at root level
#> ● Checking that code is adequately commented
#>    Problem: Suboptimally commented .R or .Rmd files found
#>    Solution: Add more comments to the files below. At least 10% of the lines should be comments.
#>    See for help: https://intelligea.wordpress.com/2013/06/30/inline-and-block-comments-in-r/
#> # A tibble: 1 x 2
#>   file_name                                                fraction_lines_comme…
#>   <chr>                                                                    <dbl>
#> 1 /var/folders/v6/f62qz88s0sd5n3yqw9d8sb300000gn/T/Rtmp3v…                  0.04
#> ── Summary of fertile checks ─────────────────────────────── fertile 1.1.9003 ──
#> ✓ Reproducibility checks passed: 2
#> ● Reproducibility checks to work on: 1
#> ● Checking that code is adequately commented
#>    Problem: Suboptimally commented .R or .Rmd files found
#>    Solution: Add more comments to the files below. At least 10% of the lines should be comments.
#>    See for help: https://intelligea.wordpress.com/2013/06/30/inline-and-block-comments-in-r/
#> # A tibble: 1 x 2
#>   file_name                                                fraction_lines_comme…
#>   <chr>                                                                    <dbl>
#> 1 /var/folders/v6/f62qz88s0sd5n3yqw9d8sb300000gn/T/Rtmp3v…                  0.04
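
The 10% threshold reported above can be illustrated with a small base R sketch. The helper `comment_fraction()` is hypothetical and is not fertile's implementation; in particular, ignoring blank lines and treating any line that starts with `#` as a comment are assumptions made for this example.

```r
# Hypothetical helper (not part of fertile): estimate the share of
# comment lines among the non-blank lines of an R script.
comment_fraction <- function(lines) {
  # Assumption: blank lines are ignored so they do not dilute the ratio.
  non_blank <- lines[nzchar(trimws(lines))]
  if (length(non_blank) == 0) {
    return(NA_real_)
  }
  # Assumption: a comment line is one whose first non-space character is '#'.
  is_comment <- grepl("^\\s*#", non_blank)
  mean(is_comment)
}

example_lines <- c(
  "# Load the data",
  "mice <- read.csv('mice.csv')",
  "",
  "summary(mice)",
  "plot(mice)"
)

frac <- comment_fraction(example_lines)
frac          # 0.25: one comment line out of four non-blank lines
frac >= 0.10  # TRUE: this toy script would clear a 10% threshold
```

To apply the same idea to a file on disk, pass `readLines(path)` as the argument.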

Warnings For Potentially Non-Reproducible Commands

read_csv("/Users/audreybertin/Documents/fertile/project_miceps/mice.csv")
#> Checking for absolute paths...
#> Error: Detected absolute paths. Absolute paths are not reproducible and will likely only work on your computer. If you would like to continue anyway, please execute the following command: readr::read_csv('/Users/audreybertin/Documents/fertile/project_miceps/mice.csv')
setwd(miceps)
#> Error: setwd() is likely to break reproducibility. Use here::here() instead.
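
To make the idea behind these errors concrete, here is a minimal base R sketch of a wrapper that refuses absolute paths before delegating to the real reading function. Both `is_absolute_path()` and `checked_read_csv()` are hypothetical names introduced for illustration; they are not fertile's internals, and the pattern used to recognise absolute paths is an assumption.

```r
# Hypothetical helpers for illustration; these are not fertile's internals.

# Assumption: a path is absolute if it starts with "/", "~",
# or a Windows drive letter such as "C:\" or "C:/".
is_absolute_path <- function(path) {
  grepl("^(/|~|[A-Za-z]:[/\\\\])", path, perl = TRUE)
}

# Wrap a reader so that it stops on absolute paths and otherwise
# passes its arguments through to utils::read.csv().
checked_read_csv <- function(file, ...) {
  if (is_absolute_path(file)) {
    stop("Detected absolute path: ", file, call. = FALSE)
  }
  utils::read.csv(file, ...)
}

is_absolute_path("/Users/audreybertin/Documents/fertile/project_miceps/mice.csv")  # TRUE
is_absolute_path("mice.csv")                                                       # FALSE
```

Assuming mice.csv sits at the root of the project, a call such as `readr::read_csv(here::here("mice.csv"))` avoids both the hard-coded absolute path and the need for setwd().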

Several data-reading functions are built into fertile’s warning system.

Customize the warning system by adding functions to it:

# Add stats::write.ftable to the warning system

add_shim(func = "write.ftable", package = "stats")

Installation

You can install fertile from GitHub with:

# install.packages("remotes")
remotes::install_github("baumer-lab/fertile")

Citation

citation("fertile")
#> 
#> To cite fertile in publications use:
#> 
#>   Bertin AM, Baumer BS. Creating optimal conditions for reproducible
#>   data analysis in R with 'fertile'. Stat. 2021;10:e332.
#>   https://doi.org/10.1002/sta4.332
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {Creating optimal conditions for reproducible data analysis in R with 'fertile'},
#>     author = {Audrey M. Bertin and Benjamin S. Baumer},
#>     journal = {Stat},
#>     volume = {10},
#>     number = {1},
#>     pages = {e332},
#>     keywords = {quality control, statistical computing, statistical process control, teaching statistics},
#>     year = {2021},
#>     doi = {https://doi.org/10.1002/sta4.332},
#>     url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/sta4.332},
#>     eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/sta4.332},
#>     note = {e332 sta4.332},
#>     abstract = {The advancement of scientific knowledge increasingly depends on ensuring that data-driven research is reproducible: that two people with the same data obtain the same results. However, while the necessity of reproducibility is clear, there are significant behavioral and technical challenges that impede its widespread implementation and no clear consensus on standards of what constitutes reproducibility in published research. We present fertile, an R package that focuses on a series of common mistakes programmers make while conducting data science projects in R, primarily through the RStudio integrated development environment. fertile operates in two modes: proactively, to prevent reproducibility mistakes from happening in the first place, and retroactively, analyzing code that is already written for potential problems. Furthermore, fertile is designed to educate users on why their mistakes are problematic and how to fix them.},
#>   }

The fertile release at the time of publication for the above citation can be found here: https://github.com/baumer-lab/fertile/releases/tag/v1.0