Cognitive-AI Benchmarking (CAB)

Project Template for Implementing Human Behavioral Experiments

Overview
Implementing your experiment
Installation
Database organization
Integration with experiment platforms
Contributors

Overview

The purpose of this repo is to provide a starting point for researchers planning to conduct a Cognitive-AI Benchmarking (CAB) project. A CAB project will typically combine three elements: (1) stimulus generation; (2) human behavioral experiments; (3) analysis of behavioral data and comparison to model outputs.

This repository provides example code to set up and run the Object Contact Prediction (OCP) task on the dominoes scenario of the Physion dataset.

The central concepts in this repo

When working on a project, you will oftentimes run many different experiments that are related to each other. We propose a way of thinking about these related experiments that makes keeping track of them easy.

At the top of the hierarchy is the project—for example Physion. This corresponds to a repository.

There are datasets, for example one particular scenario from the Physion dataset, such as dominoes.

The questions that we ask of these datasets might change: we might ask whether people can predict the outcome of a physical interaction (Object Contact Prediction task, OCP) or whether they find the same video interesting. This is what we call a task. Each task will usually have a different client front end in the experiments/[task] directory.

A particular experiment is a combination of a dataset and a task. For example, in this repository we show the dominoes_OCP experiment. The convention for naming experiments is to use the dataset name, followed by an underscore (_), followed by the task name. Which stimuli are passed to a task is usually determined by a URL parameter when the experiment is loaded in the user's browser.

For each experiment, there are small changes that the researcher might make, for example showing the videos for longer. These different versions of an experiment are called iterations.

| Concept | Example | Correspondence |
| --- | --- | --- |
| project | Physion | Repository, name of database (`[proj]_input`, `[proj]_output`) |
| dataset | dominoes | lives somewhere else |
| task | OCP | subfolders of `experiments/` |
| experiment | dominoes_OCP | collection in `[proj]_input` and `[proj]_output` database |
| iteration | iteration_1 | field of record in database |
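
The naming conventions above can be sketched in a few lines of Python. This is only an illustration: the class `ExperimentNames` and its attribute names are hypothetical and do not appear in this repository. Only the `dataset_task` convention and the `[proj]_input` / `[proj]_output` database names come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentNames:
    """Hypothetical helper that derives names from the concept hierarchy."""

    project: str
    dataset: str
    task: str

    @property
    def experiment(self) -> str:
        # Convention from the text: dataset name, underscore, task name.
        return f"{self.dataset}_{self.task}"

    @property
    def input_db(self) -> str:
        # Database holding what is shown to participants.
        return f"{self.project}_input"

    @property
    def output_db(self) -> str:
        # Database holding the collected responses.
        return f"{self.project}_output"


names = ExperimentNames(project="Physion", dataset="dominoes", task="OCP")
print(names.experiment)  # dominoes_OCP
print(names.input_db)    # Physion_input
print(names.output_db)   # Physion_output
```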

Implementing your experiment

To implement your own experiment, we suggest that you fork this repository and then adapt the example code provided to your purposes.

Preparing and running an iteration of your experiment involves the following steps:

1. Prepare the videos or images—`stimuli/`

This repo assumes that you have already generated, elsewhere, the images or videos that will be shown to participants. Use stimuli/upload_to_s3.py to upload your stimuli to S3 (for a usage example, see stimuli/stimulus_setup_example.ipynb).

2. Design your task user interface—`experiments/`

experiments/ contains the front end code for your experiment. Each folder corresponds to a particular task (e.g., the Object Contact Prediction task). Adapt the front end code in setup.js as well as the jsPsych plugins to your particular task. Check out this README.

If you want to see a demo of the front end code, open experiments/OCP_local/index.html in a web browser on your local machine.

3. Create session templates—`stimuli/`

Session templates are entries in the database that determine the precise order in which a participant will be shown the stimuli. These are created using stimuli/stimulus_setup_example.ipynb, which also points to code to upload them to the MongoDB database that the experiment is served from.
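
To make the idea of a session template concrete, here is a minimal sketch of how one might be built in Python. It is not the code from the notebook: the function `make_session_template`, the field names (`experiment`, `iteration`, `seed`, `trials`, `trial_index`, `stimulus_url`) and the example URLs are all assumptions chosen for illustration.

```python
import random


def make_session_template(stimulus_urls, experiment, iteration, seed):
    """Return one session template: a fixed, shuffled stimulus order.

    All field names are hypothetical, not this repository's schema.
    """
    rng = random.Random(seed)  # seeded, so the same seed gives the same order
    order = list(stimulus_urls)
    rng.shuffle(order)
    return {
        "experiment": experiment,
        "iteration": iteration,
        "seed": seed,
        "trials": [
            {"trial_index": i, "stimulus_url": url}
            for i, url in enumerate(order)
        ],
    }


# Placeholder URLs; real stimuli would point at the uploaded S3 objects.
urls = [f"https://example.com/dominoes/video_{i:02d}.mp4" for i in range(4)]
templates = [
    make_session_template(urls, "dominoes_OCP", "iteration_1", seed=s)
    for s in range(3)
]
print(len(templates))  # 3
```

Storing the seed alongside the trial list is a design choice that makes each template reproducible; whether the actual notebook does this is not stated here.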

4. Launch your experiment on a web server—`app.js`

app.js is the main entry point for your experiment. It is responsible for serving the experiment and handling the communication between the experiment and the participants.

To serve the experiment, run node experiments/app.js.

For development purposes, you can run node experiments/app.js --local_storage to run the experiment without access to a database. For more information, see this README.

5. Test your experiment

Validate data input

Once you launch the experiment, test it out and verify that your stimuli are being read in properly. Do this by checking the experiment in the browser.

Validate data output

Next, you will want to verify that all trial metadata and response variables are being saved properly. Use the analysis tools outlined in step 7 to make sure that your data is being saved properly.
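
One simple way to check saved data is to scan the downloaded records for missing fields. The sketch below assumes the records have already been fetched as a list of Python dictionaries. The required field names are hypothetical placeholders, so substitute the trial metadata and response variables your own task saves.

```python
# Hypothetical field names; replace with the fields your task actually saves.
REQUIRED_FIELDS = ("experiment", "iteration", "stimulus_url", "response")


def find_incomplete_records(records, required=REQUIRED_FIELDS):
    """Return (index, missing_fields) for every record lacking a required field.

    A field counts as missing if the key is absent or its value is None.
    """
    problems = []
    for index, record in enumerate(records):
        missing = [field for field in required if record.get(field) is None]
        if missing:
            problems.append((index, missing))
    return problems


records = [
    {"experiment": "dominoes_OCP", "iteration": "iteration_1",
     "stimulus_url": "https://example.com/a.mp4", "response": "yes"},
    {"experiment": "dominoes_OCP", "iteration": "iteration_1",
     "stimulus_url": "https://example.com/b.mp4"},  # response was not saved
]
print(find_incomplete_records(records))  # [(1, ['response'])]
```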

6. Post your experiment to a recruiting platform (e.g., Prolific)

Publish your experiment and watch the data roll in!

7. Fetch and analyze the behavioral data—`analysis/`

analysis/ contains the code for downloading the behavioral data from the database.
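
As a rough illustration of what fetching one iteration's data could look like, the sketch below filters a collection by an iteration field. It is not the code in analysis/. The field name `iterationName` is an assumption, and `InMemoryCollection` is a hypothetical stand-in so the example runs without a database. With pymongo, a real `Collection` object offers the same `find(filter)` call.

```python
class InMemoryCollection:
    """Offline stand-in for a pymongo Collection (equality filters only)."""

    def __init__(self, documents):
        self._documents = list(documents)

    def find(self, query=None):
        query = query or {}
        return [
            doc for doc in self._documents
            if all(doc.get(key) == value for key, value in query.items())
        ]


def fetch_iteration(collection, iteration, field="iterationName"):
    """Return every document whose iteration field equals `iteration`."""
    return list(collection.find({field: iteration}))


demo = InMemoryCollection([
    {"iterationName": "iteration_1", "response": "yes"},
    {"iterationName": "iteration_1", "response": "no"},
    {"iterationName": "iteration_2", "response": "yes"},
])
rows = fetch_iteration(demo, "iteration_1")
print(len(rows))  # 2
```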

Installation

Configuration

To configure your environment for using CAB, you will need to create a config file called .cabconfig. The purpose of this file is to define variables that apply to all of your CAB projects (e.g., username and password to access the mongo database). By default, this config file should be saved as a hidden file in your home directory, with file path $HOME/.cabconfig. If you want to store this file in a different location, you can specify the path by setting the environment variable CAB_CONFIGFILE to the desired path.

Here is an example of a .cabconfig file, which follows the INI file format.

[DB]
password=mypassword #required
username=myusername #optional, default if unspecified is "cabUser"
host=myhost #optional, default if unspecified is 127.0.0.1
port=myport #optional, default if unspecified is 27017

Client-side tools

jsPsych

Server-side tools

Database organization

A MongoDB instance is run by an organization (e.g., your lab). For each project, there are two databases: [proj]_input and [proj]_output. In the [proj]_input database (for stimuli, what is shown to the user), each collection determines a set of stimuli in a certain order that can be shown to a user ("session template"). While running an experiment, this database will only be read from.

The data that is collected during an experiment goes into the [proj]_output database (for responses, which we get from the user). There, each document corresponds to a single event that we care about, such as the user giving a single rating to a single video. Each document contains fields that allow us to group it into experiments and iterations, etc. While running an experiment, this database will only be written to.
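
When connecting to the MongoDB instance from Python, the credentials from the config file are typically combined into a connection string. The sketch below shows one way to build such a string with the standard library. The function name `mongo_uri` is hypothetical, and whether your deployment needs extra options (for example an authentication database) is not covered here.

```python
from urllib.parse import quote_plus


def mongo_uri(username, password, host="127.0.0.1", port=27017):
    """Build a mongodb:// connection string with percent-escaped credentials."""
    # Characters such as '@' or '/' in credentials would otherwise break the URI.
    return f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}:{port}"


uri = mongo_uri("cabUser", "p@ss/word")
print(uri)  # mongodb://cabUser:p%40ss%2Fword@127.0.0.1:27017
```

The resulting string can be passed to a MongoDB client, and the `[proj]_input` and `[proj]_output` databases are then selected by name on that client.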

Integration with experiment platforms

Prolific

To post your experiment to Prolific, go to https://www.prolific.co and sign in using your lab's or organization's account.
Click the New study tab to create a new study for your experiment. Here are the steps:
Give your study a name (the title field on the first line). This name is visible to your participants, so make it easy to understand (avoid technical terms) and attractive, which helps recruit participants more efficiently.
Give the study an internal name (second line) that includes an identifier of the experiment, e.g. BACH_dominoes_pilot1. Avoid very generic names like pilot1: the messaging system displays only the internal name, so a generic name makes it hard to know whom to contact about messages without opening the study details.
Enter the URL of your study, including its URL parameters (e.g. https://cogtoolslab.org:8881/dominoes/index.html?projName=BACH&expName=dominoes_OCP&iterName=it1), and choose "I'll use URL parameters" for "How do you want to record Prolific IDs". This adds further URL parameters that tell us which participant is doing the study. Run app.js and make sure that your study is accessible from the web.
Select "I'll redirect them using a URL". Prolific will suggest a completion code, which can be added to setup.js so that participants who have finished the study are automatically accepted in Prolific.
For the study cost, pay attention to the minimum wage in your state.
Then open the Prolific study and watch the responses roll in!
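
The study URL used in the steps above can be assembled and checked programmatically. The following sketch uses only Python's standard library. The parameter names `projName`, `expName` and `iterName` are taken from the example URL, while the helper names `study_url` and `read_params` are hypothetical. The participant parameter `PROLIFIC_PID` is the name Prolific uses to our knowledge, so confirm it in the Prolific study setup page.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def study_url(base, proj_name, exp_name, iter_name):
    """Append the project, experiment and iteration parameters to a base URL."""
    query = urlencode(
        {"projName": proj_name, "expName": exp_name, "iterName": iter_name}
    )
    return f"{base}?{query}"


def read_params(url):
    """Return the query parameters of a URL as a flat dict (first value wins)."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


url = study_url(
    "https://cogtoolslab.org:8881/dominoes/index.html",
    "BACH", "dominoes_OCP", "it1",
)
print(url)

# What a participant's link might look like after Prolific adds its parameter;
# the ID value here is a made-up placeholder.
participant_url = url + "&PROLIFIC_PID=EXAMPLE_ID"
print(read_params(participant_url)["expName"])  # dominoes_OCP
```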

Be aware that the Prolific_ID uniquely identifies a single person and is therefore personally identifiable data and needs to be treated confidentially. It must not be made publicly available.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

Felix Binder 🚧 🧑‍🏫	Yoni Friedman 💻	Dan Yamins 📋	Thomas O'Connell 💻	Haoliang Wang 💻	Justin Yang 💻	Robert Hawkins 🔧
Judy Fan 💡

This project follows the all-contributors specification. Contributions of any kind welcome!
