expfactory / expfactory-python

A Python module for managing Experiment Factory JavaScript experiment files, batteries to deploy them to (e.g., psiTurk), and virtual machines to host the compilation of those things.
http://expfactory.readthedocs.org/
MIT License

proposed changes to client #133

Closed vsoch closed 7 years ago

vsoch commented 7 years ago

EDIT

The following was written anticipating a collaboration, but that doesn't seem to have worked out, so I'm rolling this out on my own, using more reproducible methods.


@teonbrooks this is mostly for you! I'm going to put some notes here for you to see when you get back, and for discussion.

Right now expfactory has two use cases. The first is the Docker (online) client, which is pretty much useless for anyone but Poldracklab, because we can't safely store other people's data, nor can we store their credentials. The other use case (which likely does have users) is for local labs to run: generating an experiment on the fly, or generating a static battery to serve. With these goals in mind, I first want to propose the following functionality for expfactory-python:

Experiments

What is an experiment?

An experiment is a GitHub repo. It is static, meaning that it can be run by starting a web browser in the present working directory, and its exp_id corresponds to its organization/folder name. We can think of the organization/repo name as akin to a registry in Docker. If the user doesn't specify one, the default is assumed to be `expfactory-experiments`, so this:

expfactory --run --experiment stroop

is really the same as

expfactory --run --experiment expfactory-experiments/stroop

Thus, to ask for @teonbrooks' fork of that, I would do:

expfactory --run --experiment teonbrooks/stroop
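
As a rough sketch of that default-organization behavior (not actual client code; the function name and return values here are just assumptions):

```python
# Hypothetical sketch: resolve an experiment spec to a GitHub org, exp_id, and URL.
# The default organization "expfactory-experiments" follows the proposal above.

def resolve_experiment(spec, default_org="expfactory-experiments"):
    """Turn 'stroop' or 'teonbrooks/stroop' into (org, exp_id, clone_url)."""
    if "/" in spec:
        org, exp_id = spec.split("/", 1)
    else:
        org, exp_id = default_org, spec
    return org, exp_id, "https://github.com/%s/%s" % (org, exp_id)

# resolve_experiment("stroop")            -> ('expfactory-experiments', 'stroop', ...)
# resolve_experiment("teonbrooks/stroop") -> ('teonbrooks', 'stroop', ...)
```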

How do we summarize experiments?

Given that each folder is an experiment, we simply have one base folder under the expfactory organization (likely [expfactory/experiments]()) that is cloned and contains all metadata for official experiments. We will want this to be automated: each experiment has its own repo, and once it passes testing, a PR is automatically sent to the metadata folder to update the current version, add a branch, etc. The metadata folder would be a lot like the Docker standard library: a central place to look up where the actual experiments live, their versions, etc.
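
For concreteness, one entry in that metadata folder might look roughly like the dict below; the field names are illustrative assumptions, not a proposed schema:

```python
# Illustrative only: a possible shape for one entry in the metadata (registry) repo.
registry = {
    "stroop": {
        "maintainer": "expfactory-experiments",
        "repo": "https://github.com/expfactory-experiments/stroop",
        "version": "1.0.0",   # bumped by the automated PR once tests pass
        "forks": ["teonbrooks/stroop"],
    }
}

def lookup(exp_id):
    """Return where the official copy of an experiment lives."""
    return registry[exp_id]["repo"]
```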

How does a user interact with experiments?

The workflow will be similar to the current one.

However, instead of generating a temporary folder on the fly, we would use caching. A user will have an expfactory folder in their $HOME where experiments (essentially GitHub repos) can be cached. For example, the stroop experiment would be stored like this:

/expfactory
    /github.com/
        /expfactory
              /stroop
        /teonbrooks
              /stroop

This is an organizational approach similar to Go's, so that if I wanted to work on the code for teonbrooks/stroop, I would do it from this base. It means that, for running experiments, the exp_id now includes both the user/organization name AND the experiment name. It also means we have to ask a little more of the user at run time to generate / preview an experiment, if they don't want the default (expfactory) version. We would then have commands in the expfactory client that make it act just like a package manager, but for GitHub and experiments. E.g.:

expfactory install teonbrooks/stroop
expfactory run teonbrooks/stroop
expfactory update teonbrooks/stroop
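
A minimal sketch of what install could do under this layout, assuming the cache lives at $HOME/expfactory/github.com and that git is on the path (again, not the actual client implementation):

```python
# Hypothetical sketch: install or update an experiment in the $HOME cache,
# mirroring the github.com/<org>/<exp_id> layout described above.
import os
import subprocess

CACHE = os.path.join(os.path.expanduser("~"), "expfactory", "github.com")

def install(spec, default_org="expfactory-experiments"):
    """Clone (or pull) <org>/<exp_id> into the local cache and return its path."""
    org, exp_id = spec.split("/", 1) if "/" in spec else (default_org, spec)
    dest = os.path.join(CACHE, org, exp_id)
    if os.path.exists(dest):
        subprocess.check_call(["git", "-C", dest, "pull"])   # update an existing clone
    else:
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        subprocess.check_call(["git", "clone",
                               "https://github.com/%s/%s" % (org, exp_id), dest])
    return dest

# install("teonbrooks/stroop") -> ~/expfactory/github.com/teonbrooks/stroop
```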

So this is the first part of the proposal.

I will put my other suggested modifications in different issues, to be specific.

vsoch commented 7 years ago

I'm just going to do this myself then.

earcanal commented 7 years ago

I just wanted to outline my use cases as you're about to touch this, and I'm a real user :)

Local sequence

This is a variant of the local battery, as I don't want to randomise the order of my tasks (#134). Using the shell script in #132 to work around this issue, my studies run nicely and smoothly, with the slight problem that I have to ^C the script to move to the next experiment/survey. This is not an issue in the lab.

Online sequence

I'm about to start developing this use case. The limitations of the local sequence will become an issue here for obvious reasons. I evaluated Docker, and it's a little over-specified for my requirements. It looks fairly simple to push JSON results to a web app, and this is the approach I'm planning to investigate when porting my local sequence studies to run unattended online.

There's also some glue code to write to neatly organise task data by participant. I know that writing the JSON to a database would be neatest, but I expect I will start by writing CSV files (experiments) and JSON files (surveys) to a participant directory. This will allow me to use the same analysis code (R) I use to analyse locally collected data.
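
That glue code could be as simple as the sketch below; the directory layout, function name, and field handling are just assumptions for illustration:

```python
# Illustrative only: write one task's results into a per-participant directory,
# using CSV for experiments and JSON for surveys as described above.
import csv
import json
import os

def save_result(base_dir, participant_id, task_name, rows, kind="experiment"):
    """rows: a list of dicts (one per trial for experiments, one per question for surveys)."""
    pdir = os.path.join(base_dir, "participant_%03d" % participant_id)
    os.makedirs(pdir, exist_ok=True)
    if kind == "survey":
        with open(os.path.join(pdir, task_name + ".json"), "w") as fp:
            json.dump(rows, fp, indent=2)
    else:
        with open(os.path.join(pdir, task_name + ".csv"), "w", newline="") as fp:
            writer = csv.DictWriter(fp, fieldnames=sorted(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
```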

I'd be very grateful if you're able to incorporate the online sequence into your work, and I'm happy to help. I will be working on a solution regardless, with the aim of having something working by December 2017 - January 2018.

vsoch commented 7 years ago

hey @earcanal ! Docker (or Singularity) would mean that you wouldn't need to think about these installation / dependency details - you would just run one command and the software / databases would be "frozen" in a sense for you to just get up and running. I work on quite a few open source projects so I can't make promises with regard to time, but I've definitely started on this and will put together some examples when they are ready!

vsoch commented 7 years ago

hey @earcanal - how / when do you input the participant ID for both online and local use cases? If you had to serve your own MySQL and just configure it with the local / online application, would that work for you?

earcanal commented 7 years ago

Awesome question! I don't, so I have to manually convert the auto-generated UIDs to participant numbers as part of my manual (horrible, proprietary) file renaming process! Almost anything would be better than this.

As an experimenter, I would like each study to automatically generate a participant_id (without me having to do anything), starting from 1 and auto-incrementing for each new participant. I would like to specify a study_id (string) when configuring the study. I would like participant_id and study_id columns in the data tables (database or files) for all study objects (experiments and tasks).
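
In database terms that maps onto something like the sketch below; it uses sqlite3 purely for illustration (the MySQL idea discussed above would use AUTO_INCREMENT instead), and the table and column layout is an assumption:

```python
# Illustrative sketch of the participant_id / study_id scheme.
import sqlite3

conn = sqlite3.connect("study.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS participants (
    participant_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- starts at 1, increments per participant
    study_id       TEXT NOT NULL                       -- set when configuring the study
);
CREATE TABLE IF NOT EXISTS results (
    participant_id INTEGER REFERENCES participants(participant_id),
    study_id       TEXT NOT NULL,
    exp_id         TEXT NOT NULL,   -- e.g. 'stroop'
    data           TEXT             -- raw JSON from the experiment or survey
);
""")

def new_participant(study_id):
    """Register a participant and return their auto-generated numeric id."""
    cur = conn.execute("INSERT INTO participants (study_id) VALUES (?)", (study_id,))
    conn.commit()
    return cur.lastrowid

# new_participant("my-study") -> 1, then 2, 3, ...
```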

vsoch commented 7 years ago

Thanks, this is great! And yes, that's exactly what we used to do back in the days when I was an RA in a psychology lab. I'll make some default study_id that you can edit when you build your battery (container).

earcanal commented 7 years ago

User Story Mapping and Writing Effective Use Cases are both great books :)