everpub / openscienceprize

:telescope: Everpub - Making reusability a first class citizen in the scientific workflow.

Contribute to and promote existing projects #2

Open betatim opened 8 years ago

betatim commented 8 years ago

One big problem is fragmentation: everyone invents their own thing. This project should try to work against that trend by contributing to existing solutions/open source projects.

One reason for fragmentation could be that people simply do not know about existing tools. This project should create some best practices/documentation/advice. This would spread the word about projects. More importantly, lots of people using an 80% perfect tool will give that project momentum and let it get closer to 100% perfection, as opposed to people going away and building what they are missing somewhere else.

Daniel-Mietchen commented 8 years ago

How does your idea here differ from the Open Science Framework?

betatim commented 8 years ago

I once looked at OSF but never used it for serious work and have since forgotten about it. Not quite sure why I never used it. Gut reaction is that it feels like it would tie you to their platform.

Do you have some experience with using OSF? It would be good to hear what people think of it.

For me the technical challenges of a state-of-the-art reusable/open science workflow are "solved" (github, jupyter, snakemake, docker, zenodo, figshare, travis (read each of these as metasyntactic variables)). The hurdle now is to make the combination of these solutions usable by non-geeks. IMHO this is mainly education, training materials, and creating examples/blueprints for others to copy. Then there is a small bit where you need to write some code to add syntactic sugar.

If you follow the advice/guidelines this proposal creates (docker + executable README) it is easy to create a publication that is much more than a static PDF. This bit doesn't exist in a "production ready" state yet, I think, and neither do tools for post-pub review. So for that we'd need some code.

khinsen commented 8 years ago

For me the main unsolved technical challenge is high-performance computing. HPC environments are very restrictive. I haven't seen one yet that would let me run a Docker container. Those I use myself don't even allow outgoing network connections. HPC was a major design criterion for my own ActivePapers project, which is still the only existing workflow-style tool that I can actually use for my own work (studying protein dynamics by Molecular Dynamics simulation).

Most people in reproducible research tend to consider HPC an anomaly that will go away, and therefore not something to worry about. Whether or not this is true, I think restricted environments are going to remain important. Mobile devices, embedded systems, sandboxes inside bigger systems, etc. - there are lots of reasons for restrictions. Technological minimalism is thus not a bad principle for being future-proof - and it's also a big step towards suitability for non-geeks.

lukasheinrich commented 8 years ago

Hi @khinsen,

regarding using docker containers in an HPC context, let me point you to the Shifter project

http://www.nersc.gov/research-and-development/user-defined-images/

they seem to be accepting docker images (from which they derive their own custom image format) for use with HPC.

cranmer commented 8 years ago

I'll meet someone from COS tomorrow. There is a lot of nice stuff going on with OSF. You can fork an entire project, you can add integrations, the project is non-profit, and the code for OSF itself is on GitHub.

We use it for crayfis... but mainly as a collaborative tool (to collect talks, archive different versions of studies, it has a wiki, ...) and it has an interface that also works with less technical people (e.g. drag and drop stuff, dropbox integrations, etc.)

They also have integrations with major data repositories (figshare, dataverse, etc.).

Having said all of that, my impression is that it focuses more on "reproducibility" in the sense of someone else following your protocol to reproduce the experiment... and openness in that you can record your protocol first, then the data, then the analysis of the data, and you can package it all up and make it all open at once. I haven't seen much in terms of "reproducibility" (replicability) in the more computational sense.

However, they may be interested and I think it's reasonable to think how our tools could later be integrated into OSF so that they could provide these web services for OSF projects.

khinsen commented 8 years ago

@lukasheinrich Thanks for the pointer! I know there are initiatives for changing HPC, but it's a very conservative community in terms of everything except performance. The popularity of CentOS is a good indicator.

ctb commented 8 years ago

Note that we probably cannot explicitly partner with OSF because one of the judges, Brian Nosek, is the director. COI.

cranmer commented 8 years ago

I have to go teach, but I suggest we make reference to https://vida-nyu.github.io/reprozip/ in related projects.

Also, we should adjust the language a bit in the related projects section, since I'm sure people working on those projects might object to some of those statements and it comes off like criticism with a broad brush.