eplebel opened this issue 9 years ago
hi @rubenarslan, just to fill you in a bit, this was my earlier suggestion:
although it would certainly be nice to perform the analysis in the browser, it would seem to me to be a lot less work, and easier for reviewers, to let them download the data and use whatever tool they feel comfortable with. reviewers could then upload the R code, the SPSS syntax, or the JASP analysis file - then people could review the reviewer.
having said that, i certainly am sympathetic to @eplebel's desire to reduce the barriers to participation
Reducing the barriers to participation is precisely the goal here. Users currently do have the ability to download the data for local analyses, and we imagine that for many users this will be good enough and/or their preferred method. However, for some users (who may not have as much time and/or incentive to independently verify someone else's data), it is very valuable to be able to quickly and easily do these re-analyses directly in the browser, without having to download the data, load it into statistical software, etc.
So although you can't use JASP within the web browser (a deliberate decision was made for it to not be a web application), one option would be to use the JASP statistics engine. JASP is made up of two parts, the user interface and the statistics engine. The UI part sends requests to the engine, asking it to perform the analyses. The engine runs the analysis, and passes the results back to the UI.
The analyses that the engine provides are nice, because:
a) they are neatly contained; (as opposed to most of the time in R, where doing a complete ANOVA [robustness tests, contrasts, test of assumptions, post hoc, plots, etc.] requires working with ~7 packages)
b) they are easy to use; (i don't want to do an RM ANOVA in R)
c) the results are generated as JSON, which maps neatly to APA-formatted HTML tables (the results panel on the right in JASP is all HTML and JS) that you can display in a web page
So the JASP statistics engine would be a component worth considering. It wouldn't be ready for your application as is; it would need some work to run outside of JASP. I would need to think about it.
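To illustrate point (c), here's a minimal sketch in plain JavaScript of turning a JSON results table into an HTML table for display in the app. The schema used here (`title`, `columns`, `rows`) is an illustrative assumption, not JASP's actual output format:

```javascript
// Render a JSON results table as an HTML table string.
// NOTE: the {title, columns, rows} shape is an assumption made for
// illustration; the JASP engine's real JSON schema may differ.
function renderResultsTable(results) {
  const header = results.columns
    .map((c) => `<th>${c}</th>`)
    .join("");
  const body = results.rows
    .map((row) => `<tr>${row.map((v) => `<td>${v}</td>`).join("")}</tr>`)
    .join("");
  return (
    `<table class="apa-table">` +
    `<caption>${results.title}</caption>` +
    `<thead><tr>${header}</tr></thead>` +
    `<tbody>${body}</tbody>` +
    `</table>`
  );
}
```

The point is simply that once the engine hands back structured JSON, the rendering side is a small, self-contained piece of front-end code.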
One concern would be security: some of the analyses in the JASP engine are written in R, and I think R is almost impossible to sandbox (compared to other languages). This is why I would be very shy about any solution that allows for the execution of arbitrary R code. Perhaps opencpu.org have solved this, but I have my doubts.
"R was primarily designed with the local user in mind, security restrictions and unpredictable behavior have not been considered a major concern in the design of the software."
"Sandboxing in this context is a somewhat informal term for creating an execution environment which limits capabilities of harmful and undesired behavior. As it turns out, R itself is not very suitable for implementing such access control policies."
from here: http://www.jstatsoft.org/v55/i07/paper
Of course, this leaves the user interface - which would need to be written from scratch in HTML, but you could copy the layout and design of JASP (design decisions take quite some time, so this is a big time saver!)
so that would be one approach.
Here's the ticket: https://github.com/jeroenooms/opencpu/wiki/Script-or-function-execution
You can post any R script to the OpenCPU public server, it will run it and give you back links to temporary files showing resulting console output, images, etc. I was able to get it to run your example script and give me a (temporary) link to this plot. I should be able to use that to get it to display this inside the CurateScience app:
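For context, here's a sketch of how the CurateScience front end might pull those temporary links out of OpenCPU's response. A successful POST to the OpenCPU API returns a newline-separated list of session paths (the session key in the test below is made up; the path shapes follow OpenCPU's documented API, and the helper name is mine):

```javascript
// Base URL of the OpenCPU public server.
const OCPU_BASE = "https://cloud.opencpu.org";

// Extract useful output URLs from an OpenCPU session response body,
// which lists paths like /ocpu/tmp/<key>/console and
// /ocpu/tmp/<key>/graphics/1, one per line.
function parseSessionOutputs(responseText) {
  const paths = responseText.split("\n").filter((p) => p.trim() !== "");
  const consolePath = paths.find((p) => p.endsWith("/console"));
  const plotPaths = paths.filter((p) => /\/graphics\/\d+$/.test(p));
  return {
    consoleUrl: consolePath ? OCPU_BASE + consolePath : null,
    // Appending /png to a graphics path asks OpenCPU to render the
    // plot as a PNG image, suitable for an <img> tag in the app.
    plotUrls: plotPaths.map((p) => OCPU_BASE + p + "/png"),
  };
}
```

With those URLs in hand, displaying the plot inside the app is just a matter of pointing an image element at the first entry of `plotUrls`.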
Do you want to actually store the R scripts as text in CurateScience database, or do you want the user to host them somewhere and provide a URL link?
great, that's so awesome!!!
for now, just URL links to R scripts, but if we have time it'd be cool to be able to store R scripts as text in the CurateScience database!
Here's an update on where I'm at right now:
I've spent 4 hours on this so far. You can see the code I'm working on here: https://github.com/ScienceCommons/www/compare/issue%23138
What I have left to do is to add a modal window to the page that pops up and displays the script's console output and plot image. Then I think you'll have a good lightweight, working demo. Once that's ready I'll deploy to staging to show you. I'll be surprised if the whole thing takes me more than 10 hours (depending how fancy you want it to look).
If you like where this is going, later I could change it to store the R script in the CurateScience database and allow users to edit the script and resubmit it to OpenCPU, to make it more interactive.
Screenshot:
Wow, this is looking very cool! I'm quite excited!
I put another 3 hours into this today, and it's pretty close to being ready. I implemented a modal window that pops up when you click the "Run R Script" button and shows a spinner while waiting for the OpenCPU API call to come back:
Then when the API call returns it shows the scatterplot in the modal window:
Obviously there's a little more work to do: I need to do some CSS work to resize the modal window appropriately for the image, and add the R console output text to the modal as well. I believe I can have this ready for you to demo at your 9/14 talk.
Very very cool! Is it possible at this point to push those changes to staging so I can play around with it a little bit?
Ok, I worked on it a little more and pushed to staging so you can check it out.
Here's an example: https://curatescience-staging.herokuapp.com/beta/#/articles/335261
A few notes: under the hood, the script is currently passed through OpenCPU's identity function; I will add code to hide this detail from the user.
Take a look and let me know what you think!
Wow this is awesome!!! I can successfully run the "lebelvessstudy1.R" R file I provided; however, I haven't been able to get any other simple R files to work. For example:
test2.R (https://osf.io/jg9yt/?action=download&version=1):

    dd = rnorm(100, 100, 10)
    mean(dd)

test3.R (https://osf.io/zs436/?action=download&version=2):

    library(vioplot)  # vioplot() is not in base R, so the package must be loaded
    x1 <- rnorm(20, 4, 2)
    x2 <- rnorm(20, 6, 2)
    x3 <- rnorm(20, 8, 2)
    vioplot(x1, x2, x3, names = c("4 cyl", "6 cyl", "8 cyl"), col = "gold")
    title("Violin Plots of Miles Per Gallon")
I added these to the first study in https://curatescience-staging.herokuapp.com/beta/#/articles/335205 (Bargh & Shalev, 2012, Study 1a)
UPDATE: I just discovered that the same file "lebelvessstudy1.R", when linked from the OSF website (https://osf.io/zc82h/?action=download&version=1) rather than Amazon S3 (https://s3.amazonaws.com/www.curatescience.org/lebelvessstudy1.R), also doesn't execute, so the problem is probably related to this. See "lebelvessstudy1_fromOSF.R" in the demo article you created (https://curatescience-staging.herokuapp.com/beta/#/articles/335261)
UPDATE 2: I am able to run those simple R files if I host them on the Amazon S3 bucket (i.e., https://s3.amazonaws.com/www.curatescience.org/test2.R , https://s3.amazonaws.com/www.curatescience.org/test3.R)!!
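One guess (unverified) about why the S3 copies work while the OSF links don't: OSF's `?action=download` URLs respond with an HTTP redirect rather than serving the raw script directly, and the server-side fetch may not follow it. A small client-side heuristic could warn users before they submit a link; the patterns below are assumptions based only on the two cases observed in this thread:

```javascript
// Heuristic check: does this URL look like it serves the raw R script
// directly? Based only on the cases in this thread: direct S3 object
// URLs worked, OSF ?action=download links did not (they appear to
// redirect instead of serving the file). This is an assumption, not
// a verified rule about either service.
function looksDirectlyFetchable(scriptUrl) {
  const url = new URL(scriptUrl);
  // OSF-style download links redirect to the actual file location.
  if (url.searchParams.get("action") === "download") return false;
  // Plain links whose path ends in .R are assumed to serve the
  // script contents directly.
  return url.pathname.endsWith(".R");
}
```

A check like this wouldn't fix the underlying issue, but it could surface a helpful message ("please link to a direct .R file") instead of a silent execution failure.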
Given Curate Science's mission to facilitate and incentivize the independent evaluation and verification of published scientific findings, a key feature to implement is allowing users to verify the reproducibility of other researchers' study results within their browser. We want to make it as easy as possible for researchers to check the reproducibility of each other's results (i.e., reproduce the results reported in an article by executing the same analyses on the publicly available data), and then reward them for doing so. Hence, for studies that have available data, users will be able to analyze the data within the web application and then leave a comment stating that they endorse the results as in fact reproducible (which will activate a logo that visually indicates this).
Initial specs for this feature can be found here: https://dl.dropboxusercontent.com/u/227724/In-browser%20analyses%20SPECS.docx
For now, the two main features will allow logged-in users to:
Here's what the UI will look like, using an RStudio Server implementation:
This feature is very important given that it adds a lot of utility for users interested in verifying others' results, a growing interest (it's much easier to re-analyze someone else's data directly in-browser than to download the data, manually load it, etc.). This is consistent with the meta-science literature, which has argued that open science initiatives need to offer sufficiently high expected utility to overcome the initial buy-in costs of engaging with the platform (Buttliere, 2014, http://journal.frontiersin.org/Journal/10.3389/fncom.2014.00082/full)