cogtoolslab / cognitive-ai-benchmarking

MIT License

Cognitive-AI Benchmarking (CAB)

Project Template for Implementing Human Behavioral Experiments


  1. Overview
  2. Implementing your experiment
  3. Installation
  4. Database organization
  5. Integration with experiment platforms
  6. Contributors

Overview

The purpose of this repo is to provide a starting point for researchers planning to conduct a Cognitive-AI Benchmarking (CAB) project. A CAB project will typically combine three elements: (1) stimulus generation; (2) human behavioral experiments; (3) analysis of behavioral data and comparison to model outputs.

This repository provides example code to set up and run the Object Contact Prediction (OCP) task on the dominoes scenario of the Physion dataset.

The central concepts in this repo

When working on a project, you will oftentimes run many different experiments that are related to each other. We propose a way of thinking about these related experiments that makes keeping track of them easy.

At the top of the hierarchy is the project—for example Physion. This corresponds to a repository.

Within a project there are datasets, for example one particular scenario from the Physion dataset (e.g., dominoes).

The questions that we ask of these datasets might change: we might ask whether people can predict the outcome of a physical interaction (Object Contact Prediction task, OCP) or whether they find the same video interesting. This is what we call a task. Each task will usually have a different client front end in the experiments/[task] directory.

A particular experiment is a combination of a dataset and a task. For example, in this repository we show dominoes_OCP. The convention for naming experiments is to use the dataset name, followed by an underscore (_), followed by the task name. Which stimuli are passed to a task is usually determined by a URL parameter when the experiment is loaded in the user's browser.
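As a minimal sketch of the naming convention and the URL-parameter idea described above (not code from this repository), the snippet below builds an experiment name and reads a parameter from an experiment URL. The parameter names `experimentName` and `iterationName` and the example URL are assumptions made for illustration; check the front end code for the names your task actually uses.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def experiment_name(dataset: str, task: str) -> str:
    """Join dataset and task with an underscore, e.g. dominoes + OCP."""
    return f"{dataset}_{task}"


def read_url_param(url: str, name: str) -> Optional[str]:
    """Return the first value of a query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else None


# Hypothetical URL; the host and parameter names are placeholders.
url = (
    "https://example.org/index.html"
    f"?experimentName={experiment_name('dominoes', 'OCP')}"
    "&iterationName=iteration_1"
)
print(read_url_param(url, "experimentName"))
print(read_url_param(url, "iterationName"))
```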

For each experiment, there are small changes that the researcher might make, for example showing the videos for longer. These different versions of an experiment are called iterations.

| Concept | Example | Correspondence |
|---|---|---|
| project | Physion | Repository, name of database ([proj]_input, [proj]_output) |
| dataset | dominoes | lives somewhere else |
| task | OCP | subfolders of experiments/ |
| experiment | dominoes_OCP | collection in [proj]_input and [proj]_output database |
| iteration | iteration_1 | field of record in database |

Implementing your experiment

To implement your own experiment, we suggest that you fork this repository and then adapt the example code provided to your purposes.

Preparing and running an iteration of your experiment involves the following steps:

1. Prepare the videos or images—stimuli/

This repo assumes that you have already generated, elsewhere, the images or videos that will be shown to participants. Use stimuli/upload_to_s3.py to upload your stimuli to S3 (for a usage example, see stimuli/stimulus_setup_example.ipynb).
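To illustrate what an upload step can look like, here is a hedged sketch that maps local stimulus files to S3 object keys and URLs. It is not the contents of stimuli/upload_to_s3.py; the function names, the bucket name, the key prefix, and the `.mp4` pattern are all assumptions. The upload itself relies on the third-party `boto3` package and on AWS credentials being configured, so it is kept in a separate function.

```python
from pathlib import Path


def s3_key(local_path: str, root: str, prefix: str = "") -> str:
    """Build an object key that mirrors the file's path relative to root."""
    parts = Path(local_path).relative_to(root).parts
    return "/".join(p for p in (prefix.strip("/"), *parts) if p)


def s3_url(bucket: str, key: str) -> str:
    """Virtual-hosted-style URL; readable only if the bucket policy allows it."""
    return f"https://{bucket}.s3.amazonaws.com/{key}"


def upload_all(root: str, bucket: str, prefix: str = "") -> list:
    """Upload every .mp4 under root and return the resulting URLs."""
    import boto3  # third-party; imported here so the helpers above work without it

    client = boto3.client("s3")
    urls = []
    for path in sorted(Path(root).rglob("*.mp4")):
        key = s3_key(str(path), root, prefix)
        client.upload_file(str(path), bucket, key)
        urls.append(s3_url(bucket, key))
    return urls


# Hypothetical names, shown without touching the network.
key = s3_key("stimuli/dominoes/trial_0.mp4", "stimuli", prefix="physion")
print(s3_url("my-bucket", key))
```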

2. Design your task user interface—experiments/

experiments/ contains the front end code for your experiment. Each folder corresponds to a particular task (e.g., the Object Contact Prediction task). Adapt the front end code in setup.js as well as the jsPsych plugins to your particular task. Check out this README.

If you want to see a demo of the front end code, open experiments/OCP_local/index.html in a web browser on your local machine.

3. Create session templates—stimuli/

Session templates are entries in the database that determine the precise order in which a participant will be shown the stimuli. These are created using stimuli/stimulus_setup_example.ipynb, which also points to code to upload them to the MongoDB database that the experiment is served from.
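The idea of a session template can be sketched as follows: a record that fixes the stimulus order for one participant. This is an illustrative assumption about the structure, not the schema used by the notebook; the field names (`experimentName`, `iterationName`, `trials`, `stimulusURL`) and the example URLs are hypothetical.

```python
import random


def make_session_template(experiment, iteration, stimulus_urls, seed):
    """Return a dict that pins down one shuffled presentation order."""
    rng = random.Random(seed)  # seeded so the same order can be regenerated
    order = list(stimulus_urls)
    rng.shuffle(order)
    return {
        "experimentName": experiment,
        "iterationName": iteration,
        "trials": [
            {"trialIndex": i, "stimulusURL": url} for i, url in enumerate(order)
        ],
    }


# Placeholder URLs; in practice these would point at the uploaded stimuli.
urls = [f"https://example.org/dominoes/trial_{i}.mp4" for i in range(4)]
template = make_session_template("dominoes_OCP", "iteration_1", urls, seed=0)
for trial in template["trials"]:
    print(trial["trialIndex"], trial["stimulusURL"])
```

Generating one template per seed gives each participant a different but reproducible order.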

4. Launch your experiment on a web server—app.js

app.js is the main entry point for your experiment. It is responsible for serving the experiment and handling the communication between the experiment and the participants.

To serve the experiment, run node experiments/app.js.

For development purposes, you can run node experiments/app.js --local_storage to run the experiment without access to a database. For more information, see this README.

5. Test your experiment

Validate data input

Once you launch the experiment, test it out and verify that your stimuli are being read in properly. Do this by checking the experiment in the browser.

Validate data output

Next, verify that all trial metadata and response variables are being saved properly, using the analysis tools outlined in step 7.
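One way to check saved output is to confirm that every record carries the fields you expect. The sketch below is an assumption about how such a check could look, not part of the analysis tools in this repo; the field names in `REQUIRED_FIELDS` are hypothetical and should be replaced with the variables your task actually records.

```python
# Hypothetical field names; adapt to your own trial metadata and responses.
REQUIRED_FIELDS = ("experimentName", "iterationName", "stimulusURL", "response")


def missing_fields(record, required=REQUIRED_FIELDS):
    """List required fields that are absent or set to None in a record."""
    return [name for name in required if record.get(name) is None]


def find_incomplete(records, required=REQUIRED_FIELDS):
    """Map the index of each incomplete record to its missing fields."""
    problems = {}
    for index, record in enumerate(records):
        missing = missing_fields(record, required)
        if missing:
            problems[index] = missing
    return problems


records = [
    {"experimentName": "dominoes_OCP", "iterationName": "iteration_1",
     "stimulusURL": "https://example.org/a.mp4", "response": "yes"},
    {"experimentName": "dominoes_OCP", "iterationName": "iteration_1",
     "stimulusURL": "https://example.org/b.mp4"},
]
print(find_incomplete(records))
```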

6. Post your experiment to a recruiting platform (e.g., Prolific)

Publish your experiment and watch the data roll in!

7. Fetch and analyze the behavioral data—analysis/

analysis/ contains the code for downloading the behavioral data from the database.

Installation

Configuration

To configure your environment for using CAB, you will need to create a config file called .cabconfig. The purpose of this file is to define variables that apply to all of your CAB projects (e.g., username and password to access the mongo database). By default, this config file should be saved as a hidden file in your home directory, with file path $HOME/.cabconfig. If you want to store this file in a different location, you can specify the path by setting the environment variable CAB_CONFIGFILE to the desired path.

Here is an example of a .cabconfig file, which follows the INI file format.

[DB]
password=mypassword #required
username=myusername #optional, default if unspecified is "cabUser"
host=myhost #optional, default if unspecified is 127.0.0.1
port=myport #optional, default if unspecified is 27017
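For illustration, a file in this format can be read with Python's standard `configparser`. This is a sketch under stated assumptions, not the loader that CAB itself uses: the function names are hypothetical, the defaults are copied from the comments in the example above, and inline `#` comments are stripped explicitly because `configparser` does not do that by default.

```python
import configparser
import os
from pathlib import Path
from urllib.parse import quote_plus

# Defaults taken from the comments in the example .cabconfig above.
DEFAULTS = {"username": "cabUser", "host": "127.0.0.1", "port": "27017"}


def load_db_settings(path=None):
    """Read the [DB] section, falling back to CAB_CONFIGFILE or ~/.cabconfig."""
    if path is None:
        path = os.environ.get("CAB_CONFIGFILE", str(Path.home() / ".cabconfig"))
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    with open(path) as handle:
        parser.read_file(handle)
    settings = dict(DEFAULTS)
    settings.update(parser.items("DB"))
    if "password" not in settings:
        raise KeyError("the [DB] section must define a password")
    return settings


def mongo_uri(settings):
    """Build a MongoDB connection string with percent-escaped credentials."""
    user = quote_plus(settings["username"])
    password = quote_plus(settings["password"])
    return f"mongodb://{user}:{password}@{settings['host']}:{settings['port']}"
```

Note that with inline comments enabled, a password containing a space followed by `#` would be cut off at that point.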

Client-side tools

Server-side tools

Database organization

A MongoDB instance is run by an organization (e.g., your lab). For each project, there are two databases: [proj]_input and [proj]_output. In the [proj]_input database (for stimuli, what is shown to the user), each collection determines a set of stimuli in a certain order that can be shown to a user ("session template"). While running an experiment, this database will only be read from.

The data that is collected during an experiment goes into the [proj]_output database (for responses, which we get from the user). There, each document corresponds to a single event that we care about, such as the user giving a single rating to a single video. Each document contains fields that allow us to group it into experiments and iterations, etc. While running an experiment, this database will only be written to.
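The layout described above can be sketched in a few lines. The helpers below derive the two database names from a project name and select the documents belonging to one iteration. The field name `iterationName` and the function names are assumptions for illustration; `fetch_responses` additionally requires the third-party `pymongo` package and a reachable database, so the usage example filters in-memory documents instead.

```python
def database_names(project):
    """Return the (input, output) database names for a project."""
    return f"{project}_input", f"{project}_output"


def iteration_filter(iteration, field="iterationName"):
    """Query document selecting one iteration; the field name is hypothetical."""
    return {field: iteration}


def matches(document, query):
    """Equality-only stand-in for a MongoDB filter, for offline checks."""
    return all(document.get(key) == value for key, value in query.items())


def fetch_responses(uri, project, experiment, iteration):
    """Read one iteration of an experiment from the output database."""
    from pymongo import MongoClient  # third-party; needs a running MongoDB

    _, output_db = database_names(project)
    collection = MongoClient(uri)[output_db][experiment]
    return list(collection.find(iteration_filter(iteration)))


documents = [
    {"iterationName": "iteration_1", "response": "yes"},
    {"iterationName": "iteration_2", "response": "no"},
]
query = iteration_filter("iteration_1")
print(database_names("physion"))
print([doc for doc in documents if matches(doc, query)])
```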

Integration with experiment platforms

Prolific

Be aware that the Prolific_ID uniquely identifies a single person and is therefore personally identifiable data and needs to be treated confidentially. It must not be made publicly available.
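One possible way to keep the Prolific_ID out of shared data is to replace it with a random pseudonym before export. The sketch below is a suggestion under stated assumptions, not a procedure prescribed by this repo: the function name and the `participant` output field are hypothetical, and the returned mapping is itself confidential and must be stored privately or discarded.

```python
import uuid


def pseudonymize(records, id_field="Prolific_ID", out_field="participant"):
    """Return copies of records with the ID replaced by a random pseudonym."""
    mapping = {}  # original ID -> pseudonym; treat as confidential
    cleaned = []
    for record in records:
        copy = dict(record)
        if id_field in copy:
            original = copy.pop(id_field)
            if original not in mapping:
                mapping[original] = uuid.uuid4().hex
            copy[out_field] = mapping[original]
        cleaned.append(copy)
    return cleaned, mapping


records = [
    {"Prolific_ID": "abc123", "response": "yes"},
    {"Prolific_ID": "abc123", "response": "no"},
    {"Prolific_ID": "def456", "response": "yes"},
]
cleaned, mapping = pseudonymize(records)
print(all("Prolific_ID" not in record for record in cleaned))
```

Random pseudonyms, unlike hashes of the ID, cannot be recomputed by someone who knows the original identifiers.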

Contributors ✨

Thanks goes to these wonderful people (emoji key):


Felix Binder

🚧 🧑‍🏫

Yoni Friedman

💻

Dan Yamins

📋

Thomas O'Connell

💻

Haoliang Wang

💻

Justin Yang

💻

Robert Hawkins

🔧

Judy Fan

💡

This project follows the all-contributors specification. Contributions of any kind welcome!