In-browser execution of analyses for studies with available data

eplebel commented 9 years ago

Given that Curate Science's mission is to facilitate and incentivize the independent evaluation and verification of published scientific findings, a key feature to implement is letting users verify the reproducibility of other researchers' study results within their browser. We want to make it as easy as possible for researchers to check the reproducibility of each other's results (i.e., reproduce the results reported in an article by executing the same analyses on the publicly available data), and then reward them for doing so. Hence, for studies that have available data, users will be able to analyze the data within the web application and then leave a comment stating that they endorse the results as in fact reproducible (which will activate a logo that visually indicates this: data-reproducible-icon).

Initial specs for this feature can be found here: https://dl.dropboxusercontent.com/u/227724/In-browser%20analyses%20SPECS.docx

For now, the two main features will allow logged-in users to:

Execute pre-existing .R files that have been linked for a study
Load .sav (SPSS) and .csv files in the R window as R data objects (so the user can start visualizing/understanding the data and be ready to verify the reproducibility of a study's results)

Here's what the UI will look like, using an RStudio Server implementation: in-browser-analyses website

This feature is very important because it adds a lot of utility for users interested in verifying others' results, which is a growing interest: it's far easier to re-analyze someone else's data directly in the browser than to download the data, load it manually, etc. This is consistent with the meta-science literature, which has argued that open science initiatives need an expected utility high enough to overcome the initial buy-in costs of engaging with the platform (Buttliere, 2014, http://journal.frontiersin.org/Journal/10.3389/fncom.2014.00082/full).

jonathon-love commented 9 years ago

hi @rubenarslan, just to fill you in a bit, this was my earlier suggestion:

although it would certainly be nice to perform the analysis in the browser, it would seem to me to be a lot less work, and easier for reviewers, to let reviewers download the data and use whatever tool they feel comfortable in. reviewers could then upload the R code, the SPSS syntax, or the JASP analysis file - then people could review the reviewer.

having said that, i certainly am sympathetic to @eplebel's desire to reduce the barriers to participation

eplebel commented 9 years ago

Reducing the barriers to participation is precisely the goal here. Users can indeed already download the data for local analyses, and we imagine that for many users this will be good enough and/or their preferred method. However, for some users (who may not have as much time and/or incentive to independently verify someone else's data), it is very valuable to be able to do these re-analyses quickly and easily, directly in the browser, without having to download the data and load it into statistical software.

jonathon-love commented 9 years ago

So although you can't use JASP within the web browser (a deliberate decision was made for it to not be a web application), one option would be to use the JASP statistics engine. JASP is made up of two parts, the user interface and the statistics engine. The UI part sends requests to the engine, asking it to perform the analyses. The engine runs the analysis, and passes the results back to the UI.

The analyses that the engine provides are nice, because:

a) they are neatly contained; (as opposed to most of the time in R, where doing a complete ANOVA [robustness tests, contrasts, test of assumptions, post hoc, plots, etc.] requires working with ~7 packages)

b) they are easy to use; (i don't want to do an RM ANOVA in R)

c) the results generated are JSON which map neatly to APA formatted HTML tables (the results panel to the right of JASP is all HTML and JS) that you can display in a web page

So the JASP statistics engine would be a component worth considering. It wouldn't be ready for your application as is; it would need some work to make it run outside of JASP. I would need to think about it.

One concern would be security; some of the analyses in the JASP engine are written in R, but I think that R is almost impossible to sandbox (compared to other languages) (which is why I would be very shy about any solution that allows for the execution of arbitrary R code - perhaps opencpu.org have solved this, but I have my doubts)

"R was primarily designed with the local user in mind, security restrictions and unpredictable behavior have not been considered a major concern in the design of the software."

"Sandboxing in this context is a somewhat informal term for creating an execution environment which limits capabilities of harmful and undesired behavior. As it turns out, R itself is not very suitable for implementing such access control policies"

from here: http://www.jstatsoft.org/v55/i07/paper

Of course, this leaves the user interface - which would need to be written from scratch in HTML, but you could copy the layout and design of JASP (design decisions take quite some time, so this is a big time saver!)

so that would be one approach.

alexkyllo commented 9 years ago

Here's the ticket: https://github.com/jeroenooms/opencpu/wiki/Script-or-function-execution

You can post any R script to the OpenCPU public server; it will run it and give you back links to temporary files showing the resulting console output, images, etc. I was able to get it to run your example script and give me a (temporary) link to the resulting plot. I should be able to use that to display the plot inside the CurateScience app.

Do you want to actually store the R scripts as text in the CurateScience database, or do you want the user to host them somewhere and provide a URL link?

eplebel commented 9 years ago

Great, that's so awesome!!!

For now, just a URL to the R scripts, but if we have time it'd be cool to be able to store R scripts as text in the CurateScience database!

alexkyllo commented 9 years ago

Here's an update on where I'm at right now:

Uploaded your example R script to AWS S3 so that I can link it to a study
Added a "Run R Script" button next to the download button
Wrote the JavaScript code to post your R script to the OpenCPU API and return the result (which is an array of temporary links to files containing the script's outputs)
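
To illustrate the last step, here is a minimal sketch of how the array of temporary links could be built from an OpenCPU response. It is not the code in the linked branch. It assumes that the body of a successful POST is a newline-separated list of output paths and that a plot can be requested as PNG by appending /png to its path. The function name parseOpenCpuOutputs, the base URL, and the session key in the usage note are hypothetical.

```javascript
// Hypothetical helper: groups the output paths returned by an OpenCPU POST.
// Assumption: the response body is a newline-separated list of paths such as
// "/ocpu/tmp/<session>/graphics/1".
function parseOpenCpuOutputs(body, baseUrl) {
  const outputs = { console: null, stdout: null, value: null, graphics: [], other: [] };
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes from the base URL

  const paths = body
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean); // ignore blank lines

  for (const path of paths) {
    const url = root + path;
    if (/\/console$/.test(path)) {
      outputs.console = url; // console transcript (input and output)
    } else if (/\/stdout$/.test(path)) {
      outputs.stdout = url; // printed output only
    } else if (/\/R\/\.val$/.test(path)) {
      outputs.value = url; // return value of the call
    } else if (/\/graphics\/\d+$/.test(path)) {
      outputs.graphics.push(url + "/png"); // assumption: PNG rendering via a /png suffix
    } else {
      outputs.other.push(url);
    }
  }
  return outputs;
}
```

For example, calling parseOpenCpuOutputs(responseText, "https://opencpu.example.org") on a response that lists one plot would give a single entry in graphics, which could be used as the src of an image in the page.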

I've spent 4 hours on this so far. You can see the code I'm working on here: https://github.com/ScienceCommons/www/compare/issue%23138

What I have left to do is add a modal window to the page that pops up and displays the script's console output and plot image. Then I think you'll have a good lightweight, working demo. Once that's ready I'll deploy to staging to show you. I'll be surprised if the whole thing takes me more than 10 hours (depending on how fancy you want it to look).

If you like where this is going, later I could change it to store the R script in the CurateScience database and allow users to edit the script and resubmit it to OpenCPU, to make it more interactive.

Screenshot: screenshot from 2015-08-30 15 07 01

eplebel commented 9 years ago

Wow, this is looking very cool! I'm quite excited!

alexkyllo commented 9 years ago

I put another 3 hours into this today, and it's pretty close to being ready. I implemented a modal window that pops up when you click the "Run R Script" button and shows a spinner while waiting for the OpenCPU API call to come back:

screenshot from 2015-09-06 20 59 19

Then when the API call returns it shows the scatterplot in the modal window: screenshot from 2015-09-06 21 01 26

Obviously there's a little more work to do--I need to do some CSS work to resize the modal window appropriately for the image, and add the R console output text to the modal as well. I believe I can have this ready for you to demo at your 9/14 talk.

eplebel commented 9 years ago

Very very cool! Is it possible at this point to push those changes to staging so I can play around with it a little bit?

alexkyllo commented 9 years ago

Ok, I worked on it a little more and pushed to staging so you can check it out.

Here's an example: https://curatescience-staging.herokuapp.com/beta/#/articles/335261

screenshot from 2015-09-08 21 50 52

A few notes:

The script is wrapped in identity(x = {}) because that's the way to get the OpenCPU server to accept arbitrary scripts as opposed to functions--by passing the script as an argument to the identity function. I will add code to hide this detail from the user.
I am planning to add a button to resubmit the script text to the server and get updated results.
I haven't written any error handling code, so I'm not sure what will happen if invalid R code is submitted or if we get an error back from the OpenCPU server.
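
To make the first and third notes concrete, here is a minimal sketch of how the identity(x = {}) wrapper could be added and hidden again, and how a response could be turned into a user-facing message. It is not the code on staging. The function names are hypothetical, and the status handling assumes that OpenCPU answers a successful POST with HTTP 201 and reports an error raised by R as HTTP 400 with the message in the response body.

```javascript
// Hypothetical helpers; none of these names come from the CurateScience code.

// Wrap raw R script text so it can be passed as the x argument of identity().
function wrapScript(script) {
  return "identity(x = {\n" + script.trim() + "\n})";
}

// Strip the wrapper again so the user only sees their own script.
// Text that is not wrapped is returned unchanged (apart from trimming).
function unwrapScript(text) {
  const match = /^\s*identity\(\s*x\s*=\s*\{([\s\S]*)\}\s*\)\s*$/.exec(text);
  return match ? match[1].trim() : text.trim();
}

// Turn an HTTP status and body into something the modal window could display.
// Assumption: 201 means the script ran, 400 means R raised an error.
function describeOpenCpuResult(status, body) {
  if (status === 201) {
    return { ok: true, message: "Script ran successfully" };
  }
  if (status === 400) {
    return { ok: false, message: "R error: " + body.trim() };
  }
  return { ok: false, message: "OpenCPU server error (HTTP " + status + ")" };
}
```

With helpers like these, the editor could show unwrapScript(storedText) to the user, send wrapScript(editedText) on resubmit, and display describeOpenCpuResult(status, body).message when a run does not succeed.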

Take a look and let me know what you think!

eplebel commented 9 years ago

Wow this is awesome!!! I can successfully run the "lebelvessstudy1.R" R file I provided; however, I haven't been able to get any other simple R files to work. For example:

test2.R (https://osf.io/jg9yt/?action=download&version=1):
dd=rnorm(100,100,10)
mean(dd)

test3.R (https://osf.io/zs436/?action=download&version=2):
x1 <- rnorm(20,4,2)
x2 <- rnorm(20,6,2)
x3 <- rnorm(20,8,2)
vioplot(x1, x2, x3, names=c("4 cyl", "6 cyl", "8 cyl"), col="gold")
title("Violin Plots of Miles Per Gallon")

I added these to the first study in https://curatescience-staging.herokuapp.com/beta/#/articles/335205 (Bargh & Shalev, 2012, Study 1a).

UPDATE: I just discovered that the same file "lebelvessstudy1.R" also doesn't execute when linked from the OSF website (https://osf.io/zc82h/?action=download&version=1) rather than Amazon S3 (https://s3.amazonaws.com/www.curatescience.org/lebelvessstudy1.R), so the problem is probably related to where the file is hosted. See "lebelvessstudy1_fromOSF.R" in the demo article you created (https://curatescience-staging.herokuapp.com/beta/#/articles/335261).

eplebel commented 9 years ago

UPDATE 2: I am able to run those simple R files if I host them in the Amazon S3 bucket (i.e., https://s3.amazonaws.com/www.curatescience.org/test2.R and https://s3.amazonaws.com/www.curatescience.org/test3.R)!!

ScienceCommons / api

In-browser execution of analyses for studies with available data #138