Running RMarkdown dashboards from the command line in linux (and console messaging in Rmd)

HARPgroup / cbp6

Chesapeake Bay Program Phase 6 Model Suite

Running RMarkdown dashboards from the command line in linux (and console messaging in Rmd) #145

Open rburghol opened 5 years ago

rburghol commented 5 years ago

Overview

Mass hydro analysis metrics calculated to compare 2 model runs
- Step 1 is to do gage versus non gage for calibration/validation type analysis
Decouple analysis from PDF generation
- Flow metrics and images stored during analysis process
- pdf generated from stored images

Steps

Create analysis
- Store flow metrics in dh
- store images as image properties in dh
Create dashboard script that runs entirely from stored properties
- Stored flow metrics
- Stored images
Store pdf link as linked file property too on model ("pos-condition") property

Previous Work

I wanted to pass the river segment and runid into the dashboard scripts so I could run them from the command line in Linux, so I went exploring. First, passing arguments into a regular Rscript is simple using the commandArgs() function:

# load command line args
argst <- commandArgs(trailingOnly=T)

Then I check to load arguments if given. There may be a more elegant way, but for this example I just assumed that the user would know the first argument would be riverseg, and the 2nd (if supplied) would be runid:

if (length(argst) >= 1) {
  riv.seg=argst[1]
}
if (length(argst) >= 2) {
  run.id=argst[2]
}
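The positional-argument handling above can also be wrapped in a small function so that defaults are always defined. This is only a sketch: the function name `parse_dashboard_args` and the default values are assumptions for illustration, not part of the cbp6 code.

```r
# Hypothetical helper: map positional arguments to named settings,
# falling back to defaults when an argument is not supplied.
parse_dashboard_args <- function(args,
                                 defaults = list(riv.seg = "default", run.id = "0")) {
  settings <- defaults
  if (length(args) >= 1) settings$riv.seg <- args[1]  # 1st argument: river segment
  if (length(args) >= 2) settings$run.id <- args[2]   # 2nd argument (optional): run id
  settings
}

# When run through Rscript, the real arguments come from commandArgs().
settings <- parse_dashboard_args(commandArgs(trailingOnly = TRUE))

# message() writes to stderr, so it shows on the console rather than in the document.
message("riv.seg = ", settings$riv.seg, "; run.id = ", settings$run.id)
```

Note that command-line arguments always arrive as character strings, so a numeric run id would need an explicit as.numeric() if later code expects a number.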

Getting Rscript to run an Rmd is not quite the same as running a regular .R file. You have to explicitly call rmarkdown::render():

Rscript -e "rmarkdown::render('Working_Gage_Vs_Vahydro_Dashboard_2019.Rmd', clean=TRUE)" RU5_6030_0001 121

So, this part works! I did not succeed in running the whole script, but that was expected since there are a host of things to do there. But I did create a functioning example that creates a very simple PDF using command line argument input. See below:

Name: args.Rmd (creates a file args.pdf)
CMD: Rscript -e "rmarkdown::render('args.Rmd', clean=TRUE)" RU5_6030_0001 121

Code:

---
title: "Test"
header-includes:
  - \pagenumbering{gobble}
  - \usepackage{titling}
  - \pretitle{\begin{flushleft}}
  - \posttitle{\end{flushleft}}
output:
  pdf_document: default
  word_document: default
  html_document: default
params:
  token: token
  riv.seg: riv.seg
---

library(knitr)
#token = params$token
#riv.seg = params$riv.seg

argst <- commandArgs(trailingOnly=T)

print(paste('argst:',argst))

riv.seg <- 'default'
run.id <- 0

if (length(argst) >= 1) {
  riv.seg=argst[1]
}
if (length(argst) >= 2) {
  run.id=argst[2]
}

print(paste("riv.seg = ", riv.seg))
print(paste("run.id = ", run.id))

Try it out and check out the resulting pdf.

hdaniel7 commented 5 years ago

The current way we've been generating these one-off dashboard comparisons has been simply to select the "knit to PDF" button in Rstudio after changing the riv.seg, run.id, and site_number variables within the gage vs. vahydro or vahydro vs. vahydro Rmarkdown files. It's not the most savvy solution, but it's what we've settled on as good enough when making individual comparisons.

I was unaware that you could generate Rmarkdown files using command line scripts such as

Rscript -e "rmarkdown::render('Working_Gage_Vs_Vahydro_Dashboard_2019.Rmd', clean=TRUE)" RU5_6030_0001 121

Instead, the workaround I developed when we were generating dashboards en masse for previous analysis is shown in the R file Automated_Dashboard_Creator_2019.R. This script contains a function, automated_dashboard, which reads in a list of all river segments and then generates a pdf for each of these river segments from "Working_Dashboard_2019.Rmd" (a now-defunct markdown file which generated a cbp model scenario 1 vs. cbp model scenario 2 comparison back when that was the only type of comparison we were doing).

So, in short, my workaround was to render a pdf of the Rmarkdown within an R function which I would run from R on the deq2 server -- this is why those parameters were being passed into the Rmarkdown file via the params variable. This function is broken at the moment (and also useless for the type of analysis we're doing now). I could fix and expand it so that it is usable for our current analysis (VA Hydro model run vs VA Hydro model run), but your method of generating the dashboards seems far more useful. The only upside of the automated_dashboard function was the ability to generate all the dashboards at once, and that doesn't seem like a capability we need for our current analysis, which is more focused on specific river segments.
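For reference, rendering one PDF per river segment from inside an R function might look roughly like the sketch below. This is not the actual automated_dashboard function from Automated_Dashboard_Creator_2019.R; the name `render_all_segments` and its arguments are hypothetical, and it assumes the Rmd declares `riv.seg` (and anything in `base_params`) under `params` in its YAML header.

```r
# Hypothetical batch driver: render one PDF per river segment.
# `render_fn` defaults to rmarkdown::render but can be replaced with a stub for testing.
render_all_segments <- function(riv.segs, rmd, output_dir,
                                base_params = list(),
                                render_fn = rmarkdown::render) {
  vapply(riv.segs, function(seg) {
    # Override only the river segment; keep the other parameters as given.
    params <- utils::modifyList(base_params, list(riv.seg = seg))
    status <- tryCatch({
      render_fn(rmd, output_dir = output_dir,
                output_file = paste0(seg, ".pdf"), params = params)
      "ok"
    }, error = function(e) paste("failed:", conditionMessage(e)))
    message(seg, ": ", status)  # progress on the console, one line per segment
    status
  }, character(1))
}

# Example call (not run here; requires the rmarkdown package and the Rmd file):
# render_all_segments(c("JU3_7400_7510"), "Working_Dashboard_2019.Rmd", "out")
```

Catching errors per segment means one failing segment does not stop the rest of the batch; the returned named vector records which segments failed.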

If you agree, I'll try to troubleshoot some of the issues you've been having with generating these dashboards from the Linux command line so that that option is up and running.

hdaniel7 commented 4 years ago

For the new, modularized dashboard, I used the command

Rscript -e "rmarkdown::render('~/cbp6/code/Modularized_Dashboard.Rmd', params = list(riv.seg = 'JU3_7400_7510', dat.source1 = 'vahydro', dat.source2 = 'vahydro', start.date = '1984-01-01', end.date = '2000-12-31', github_link = '~', site = 'http://deq2.bse.vt.edu/d.dh', site.or.server = 'server', run.id1 = '120', run.id2 = '121', gage_number = '02018000', mod.phase1 = 'p6/p6_gb604', mod.scenario1 = 'CFBASE30Y20180615', mod.phase2 = 'p6/p6_gb604', mod.scenario2 = 'CBASE1808L55CY55R45P50R45P50Y'))"

to create the file "Modularized_Dashboard.pdf" within my "~/cbp6/code" directory. The issue I'm running into now is that I am unable to view this file.

First, I tried moving the generated pdf to the /out/dashboards directory so that I could view it from http://deq2.bse.vt.edu/p6/p6_gb604/out. I used the command

mv Modularized_Dashboard.pdf /opt/model/p6/p6_gb604/out/dashboards/Modularized_Dashboard.pdf

and got the error

mv: cannot create regular file '/opt/model/p6/p6_gb604/out/dashboards/Modularized_Dashboard.pdf': Permission denied

I also tried modifying the initial command used to generate the dashboard. For example, I added output_dir and output_file parameters as follows:

Rscript -e "rmarkdown::render('~/cbp6/code/Modularized_Dashboard.Rmd', output_dir = '/opt/model/p6/p6_gb604/out/dashboards', output_file = 'TEST.pdf', params = list(riv.seg = 'JU3_7400_7510', dat.source1 = 'vahydro', dat.source2 = 'vahydro', start.date = '1984-01-01', end.date = '2000-12-31', github_link = '~', site = 'http://deq2.bse.vt.edu/d.dh', site.or.server = 'server', run.id1 = '120', run.id2 = '121', gage_number = '02018000', mod.phase1 = 'p6/p6_gb604', mod.scenario1 = 'CFBASE30Y20180615', mod.phase2 = 'p6/p6_gb604', mod.scenario2 = 'CBASE1808L55CY55R45P50R45P50Y'))"

When running this command, I got the following error during the knitting process:

In dir.create(dirname(name), recursive = TRUE) : cannot create dir '/media/NAS/omdata/p6/out/dashboards/TEST_files', reason 'Permission denied'

It seems as if I've lost write permission to the /p6_gb604/out/ directory.

To check this, I tried creating a text file in the /out/ directory and my permission was denied yet again -- @rburghol, are these write permissions something you can look into and change? Thanks!
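The manual check described here (trying to create a file in the output directory) could also be done from R before starting a long render. The sketch below is an assumption about how one might do that; `can_write_to` is a hypothetical helper name, and it probes the directory by actually creating and removing a temporary file rather than relying on reported permission bits.

```r
# Hypothetical pre-flight check: can we create a file in `dir`?
can_write_to <- function(dir) {
  if (!dir.exists(dir)) return(FALSE)
  # Create a uniquely named probe file inside the target directory.
  probe <- tempfile(pattern = "write_test_", tmpdir = dir)
  ok <- suppressWarnings(file.create(probe))  # FALSE (with a warning) on failure
  if (ok) unlink(probe)                       # clean up the probe file
  ok
}

out_dir <- "/opt/model/p6/p6_gb604/out/dashboards"
if (!can_write_to(out_dir)) {
  message("No write access to ", out_dir, "; the render would fail with 'Permission denied'.")
}
```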

rburghol commented 4 years ago

Let me get with Denton on this. We have moved the storage for these to a remote server. The directory paths still look the same, but clearly this is a perms issue. Thanks for the heads up.

hdaniel7 commented 4 years ago

I can confirm that the permissions on the remote server appear to be cleared up -- I've just used the command

Rscript -e "rmarkdown::render('~/cbp6/code/Modularized_Dashboard.Rmd', output_dir = '/opt/model/p6/p6_gb604/out/dashboards', output_file = 'TEST.pdf', params = list(riv.seg = 'JU3_7400_7510', dat.source1 = 'vahydro', dat.source2 = 'vahydro', start.date = '1984-01-01', end.date = '2000-12-31', github_link = '~', site = 'http://deq2.bse.vt.edu/d.dh', site.or.server = 'server', run.id1 = '120', run.id2 = '121', gage_number = '02018000', mod.phase1 = 'p6/p6_gb604', mod.scenario1 = 'CFBASE30Y20180615', mod.phase2 = 'p6/p6_gb604', mod.scenario2 = 'CBASE1808L55CY55R45P50R45P50Y'))"

to create the .pdf dashboard "TEST.pdf" at this location.

hdaniel7 commented 4 years ago

I cleared up the IQR bug so I think the dashboards are now all good and ready to be generated -- here's an example dashboard I just generated, if you want to glance at one: http://deq2.bse.vt.edu/p6/p6_gb604/out/dashboards/run_120_vs_run_121/JU3_7400_7510.pdf

Here's an example of the code used to create this dashboard on the server:

Rscript -e "rmarkdown::render('~/cbp6/code/Modularized_Dashboard.Rmd', output_dir = '/opt/model/p6/p6_gb604/out/dashboards/run_120_vs_run_121', output_file = 'JU3_7400_7510.pdf', params = list(riv.seg = 'JU3_7400_7510', dat.source1 = 'vahydro', dat.source2 = 'vahydro', start.date = '1984-01-01', end.date = '2000-12-31', github_link = '~', site = 'http://deq2.bse.vt.edu/d.dh', site.or.server = 'server', run.id1 = '120', run.id2 = '121', gage_number = '00000000', mod.phase1 = 'p6/p6_gb604', mod.scenario1 = 'CFBASE30Y20180615', mod.phase2 = 'p6/p6_gb604', mod.scenario2 = 'CBASE1808L55CY55R45P50R45P50Y', cn1 = 'VA Hydro: Base', cn2 = 'VA Hydro: CC: Precip 50, Temp 50'))"
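The output location in the command above follows a pattern (a run_X_vs_run_Y directory and a file named after the river segment), so those pieces could be derived from a few inputs instead of typed by hand. The following is a sketch only: `build_dashboard_call` is a hypothetical helper, and it assumes the Rmd declares defaults in its YAML header for every parameter not set here.

```r
# Hypothetical helper: derive output_dir, output_file and the core params
# from the river segment and the two run ids.
build_dashboard_call <- function(riv.seg, run.id1, run.id2,
                                 out_root = "/opt/model/p6/p6_gb604/out/dashboards") {
  list(
    output_dir  = file.path(out_root, paste0("run_", run.id1, "_vs_run_", run.id2)),
    output_file = paste0(riv.seg, ".pdf"),
    params      = list(riv.seg = riv.seg,
                       run.id1 = as.character(run.id1),
                       run.id2 = as.character(run.id2))
  )
}

call_args <- build_dashboard_call("JU3_7400_7510", 120, 121)
message("Would write ", file.path(call_args$output_dir, call_args$output_file))

# To actually render (not run here; requires rmarkdown and the Rmd file):
# do.call(rmarkdown::render,
#         c(list(input = "~/cbp6/code/Modularized_Dashboard.Rmd"), call_args))
```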

hdaniel7 commented 4 years ago

For future reference, here's a recording of today's meeting talking about generating and analyzing the dashboards: https://drive.google.com/open?id=1jjAJlpkxBGB7f5QwOepwos3C756HlUjm

rburghol commented 4 years ago

PS2_5560_5100_linville_creek

riv.seg <- "PS2_5560_5100_linville_creek"
run.id1 <- 301
start.date <- '2000-10-01'
end.date <- '2002-12-31'
gage_number <- '01632082'
data1 <- vahydro_import_data_cfs(riv.seg, run.id1, token, site, start.date, end.date)
data2 <- gage_import_data_cfs(gage_number, start.date, end.date)

sudo -u www-data Rscript -e "rmarkdown::render('./cbp6/code/Modularized_Dashboard.Rmd', output_dir = '/var/www/html/tmp', output_file = 'PS2_5560_5100_linville_creek.pdf', params = list(riv.seg = 'PS2_5560_5100_linville_creek', dat.source1 = 'vahydro', dat.source2 = 'gage', start.date = '2000-10-01', end.date = '2002-12-31', github_link = '~', site = 'http://deq2.bse.vt.edu/d.dh', site.or.server = 'server', run.id1 = '601', run.id2 = '601', gage_number = '01632082', mod.phase1 = 'p6/p6_gb604', mod.scenario1 = 'CFBASE30Y20180615', mod.phase2 = 'p6/p6_gb604', mod.scenario2 = 'CBASE1808L55CY55R45P50R45P50Y', cn1 = 'VA Hydro: Base', cn2 = 'VA Hydro: CC: Precip 50, Temp 50'))"

Using Modularized_Dashboard_VAHydro.Rmd


riv.seg <- "PS2_5560_5100_linville_creek"
run.id1 <- 301
start.date <- '2000-10-01'
end.date <- '2002-12-31'
dat.source1 = 'gage'
dat.source2 = 'vahydro'
site = 'http://deq2.bse.vt.edu/d.dh'
site.or.server = 'server'
run.id1 = '601'
run.id2 = '601'
gage_number = '01632082'
mod.phase1 = 'p6/p6_gb604'
mod.scenario1 = 'CFBASE30Y20180615'
mod.phase2 = 'p6/p6_gb604'
mod.scenario2 = 'CBASE1808L55CY55R45P50R45P50Y'
gage.timespan.trimmed = FALSE
cn1 = 'VA Hydro: Base'
cn2 = 'VA Hydro: CC: Precip 50, Temp 50'
export_path_custom = FALSE

data1 <- vahydro_import_data_cfs(riv.seg, run.id1, token, site, start.date, end.date)
data2 <- gage_import_data_cfs(gage_number, start.date, end.date)

sudo -u www-data Rscript -e "rmarkdown::render('./cbp6/code/Modularized_Dashboard_VAHydro.Rmd', output_dir = '/var/www/html/tmp', output_file = 'PS2_5560_5100_linville_creek.pdf', params = list(riv.seg = 'PS2_5560_5100_linville_creek', dat.source1 = 'gage', dat.source2 = 'vahydro', start.date = '2000-10-01', end.date = '2002-12-31', site = 'http://deq2.bse.vt.edu/d.dh', site.or.server = 'server', run.id1 = '601', run.id2 = '601', gage_number = '01632082', mod.phase1 = 'p6/p6_gb604', mod.scenario1 = 'CFBASE30Y20180615', mod.phase2 = 'p6/p6_gb604', mod.scenario2 = 'CBASE1808L55CY55R45P50R45P50Y', gage.timespan.trimmed = FALSE, cn1 = 'USGS 01632082', cn2 = 'VA Hydro Current', export_path_custom = '/tmp/'))"

rburghol commented 4 years ago

Using new Modularized_Dashboard_Any.Rmd

sudo -u www-data Rscript -e "rmarkdown::render('./cbp6/code/Modularized_Dashboard_Any.Rmd', output_dir = '/var/www/html/tmp', output_file = 'PS2_5560_5100_linville_creek.pdf', params = list(riv.seg = 'PS2_5560_5100_linville_creek', dat.source1 = 'gage', dat.source2 = 'vahydro', start.date = '2000-10-01', end.date = '2002-12-31', site = 'http://deq2.bse.vt.edu/d.dh', site.or.server = 'server', run.id1 = '601', run.id2 = '601', gage_number = '01632082', mod.phase1 = 'p6/p6_gb604', mod.scenario1 = 'CFBASE30Y20180615', mod.phase2 = 'p6/p6_gb604', mod.scenario2 = 'CBASE1808L55CY55R45P50R45P50Y', gage.timespan.trimmed = FALSE, cn1 = 'USGS 01632082', cn2 = 'VA Hydro Current', export_path_custom = '/tmp/'))"