Syllabus

Interacting With Data

Brown University, Fall 2015
BIOL2430-S04 (CRN:14763)
Topics in Ecology and Evolutionary Biology

Fridays 1-2:50p
Khoo Multimedia Lab (Room N320), Granoff Center

Instructor: Casey Dunn
Office hours: Monday 1:00-2:30PM, Room 301, Walter Hall (80 Waterman St.)
Prepend the subject line of all course related emails with "data: "

Science is becoming more data intensive, and at the same time new tools are allowing scientists to interact with data in new ways. This seminar will explore the potential impacts of these new ways of interacting with data on the practice of science, and how these approaches can be used most effectively (with an emphasis on design and human perception). It will also introduce students to some tools that embody these changes, including version control (https://git-scm.com/), executable manuscripts (http://yihui.name/knitr/), and interactive visualizations (http://d3js.org/).

Please complete the survey if you register for or intend to sit in on the seminar.

This course is organized with github education tools.

Class format

Classes will consist of discussions, labs that examine particular tools, and student presentations. The schedule includes topics for each class, and conversation points to seed class discussions. There will be a strong focus on human perception as it relates to insight and on principles of design.

After the first meeting, and prior to the project presentations, the discussion for each class will be led by one or more students. These students will meet with me during office hours on the Monday preceding the class to map out the plan for the discussion.

Course site

All materials for the course, including the syllabus, are available at the course site. The syllabus will be updated as the course progresses, so please check it weekly. Please submit suggestions and corrections for the class via the issue tracker.

Projects

Each student will create an interactive analysis/visualization based on their own work, publicly available data, or a published scientific paper. This project will be presented in class at the end of the course.

Final projects will be developed and submitted in a git repository. Please fork the boilerplate repository for the assignment, and follow section 3 of these instructions. After you fork the repository, please enable the issue tracker in the repository settings so that others in the class (including the professor) can provide feedback.

The preferred approach is to work on your final project in a public repository to make it easy for everyone to see it. If you have unpublished data that you don't want to put in a public repository, please talk with me and we'll come up with a solution.

Reading

Reading includes book chapters, online resources, and videos to be watched ahead of class. The dates the readings will be discussed in class are listed in the schedule, but some will be useful to you much earlier as you work on your projects. In addition, the reading load is very uneven. On light weeks, it is good to get a jump on reading for future weeks.

Tufte, ER (2001). The Visual Display of Quantitative Information, 2nd edition. amazon

Murray, S (2013). Interactive Data Visualization for the Web. online

Haddock, SHD and CW Dunn (2011). Practical Computing for Biologists. amazon

Schedule

September 11 - The practice of working with data and the landscape of scientific publishing

Reading: Murray - chapters 1, 2

Assignment: In the next couple of days, use the issue tracker to submit a visualization or two that you particularly like.

Intro to class, description of final projects

Investigators interact with data in several ways:

Designing analyses, implementing analyses, and running analyses are often treated as different tasks. In many studies, the investigator moves data through each stage of analysis by hand, which takes a long time and is error-prone. Automated and interactive analyses separate the design/implementation of analyses from running analyses, and make running analyses very easy and reliable. This means that you can repeatedly run analyses before you collect your data (using simulations or other datasets), as you collect your data (to check data quality and assess how many more data are needed), and after you collect your data (to refine and extend analyses). This doesn't just speed up analyses; it fundamentally changes the way they are approached.
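
A minimal sketch of this separation, in Python: the analysis is written once as a function, then run unchanged on simulated data before any real data exist and again as measurements come in. The function names `summarize` and `simulate`, the summary statistics, and all of the numbers are illustrative assumptions, not part of the course materials.

```python
import random
import statistics


def summarize(values):
    """Design and implementation live here; running the analysis is one call."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


def simulate(n, mean=10.0, sd=2.0, seed=1):
    """Hypothetical stand-in for data that have not been collected yet."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]


# Before collecting data: run the full analysis on simulated values.
pilot = summarize(simulate(20))

# As data arrive: rerun the same function on whatever has been collected so far.
partial = summarize([9.1, 10.4, 11.2])  # made-up example measurements
```

Because the analysis is a single function, rerunning it after every new batch of data costs nothing, which is what makes checking data quality during collection practical.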

There used to be a small number of models of scientific publication; now there are many.

These models vary in a few key dimensions:

Other recent developments:

What are missing publication models?

September 18 - Version control with git; Survey of visualizations

Reading - Haddock and Dunn, chapter 4 and new chapter

Walk through git example

Intro to markdown

Discuss participant-submitted visualizations

September 25 - Data wrangling; web fundamentals

Reading - Tidy Data (http://vita.had.co.nz/papers/tidy-data.html); Haddock and Dunn, chapters 1-3, pages 255-260; Murray, chapter 3

Before class, install:

Tidy data

See Haddock & Dunn Figure 15.1 for examples of messy and tidy data.
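
As a small illustration of the idea (not taken from the book or the paper), the Python sketch below reshapes a messy "wide" table, where column headers hold values, into tidy "long" form with one observation per row. The function name `tidy`, the species labels, the column names, and the counts are all made up.

```python
def tidy(rows, id_column, variable_name, value_name):
    """Turn one row per id with many value columns into one row per observation."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            tidy_rows.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tidy_rows


# Messy: the years are column headers, but they are really values of a variable.
messy = [
    {"species": "A", "2013": 4, "2014": 7},
    {"species": "B", "2013": 1, "2014": 3},
]

long_form = tidy(messy, "species", "year", "count")
for record in long_form:
    print(record)
```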

Key points from Wickham's Tidy data paper:

Regular expressions

See regex folder.
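
For orientation, here is a short Python sketch of the kind of regular-expression wrangling this topic covers. It is not taken from the regex folder; the line format (a specimen label followed by a latitude and a longitude) and the name `parse_line` are assumptions made for illustration.

```python
import re

# One label, then two signed decimal numbers separated by whitespace or commas.
PATTERN = re.compile(r"^(\w+)[\s,]+(-?\d+\.\d+)[\s,]+(-?\d+\.\d+)$")


def parse_line(line):
    """Return (label, latitude, longitude), or None if the line does not match."""
    match = PATTERN.match(line.strip())
    if match is None:
        return None
    label, lat, lon = match.groups()
    return label, float(lat), float(lon)


print(parse_line("specimen12, 41.8268 -71.4025"))
print(parse_line("not a data line"))
```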

Web fundamentals

To view web sites locally, it is best to serve them through a web server rather than just double-clicking the html file. This makes sure that javascript etc. renders correctly. The simplest option is Python's simple server:

cd website_dir/
python -m SimpleHTTPServer   # Python 2; with Python 3, use: python3 -m http.server

Where website_dir is the directory with your site files. Once it is running, enter the url (eg http://localhost:8000/) into your browser to see the rendered page.
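
The same thing can be done from a script with Python 3's standard library. This is a sketch under assumptions: `serve_directory` is a hypothetical helper name, and the `directory` argument to `SimpleHTTPRequestHandler` requires Python 3.7 or later.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve the files in `directory` on localhost in a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("localhost", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server  # call server.shutdown() to stop serving
```

For example, `serve_directory("website_dir")` would serve that folder at http://localhost:8000/ until `shutdown()` is called on the returned server or the script exits.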

Download the example code for the Murray book, expand it, then cd to the folder and launch the simple server. Explore the examples in your browser.

October 2 - Executable manuscripts

Reading: Murray chapters 2,4,5,6

Before class, install:

The two topics today have a lot in common. We embed R code in markdown documents that changes the document according to data when executed by knitr. We embed d3 javascript in html documents that changes the document according to data when executed by the browser.
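
The shared idea, a document with embedded code that is filled in from data at render time, can be sketched in a few lines of Python. This is not how knitr or d3 are implemented; the `{{ ... }}` placeholder syntax, the `render` function, and the measurements are all hypothetical.

```python
import re
import statistics

PLACEHOLDER = re.compile(r"\{\{(.+?)\}\}")


def render(template, data):
    """Replace each {{ expression }} with its value computed from `data`."""
    namespace = {"data": data, "len": len, "mean": statistics.mean}

    def evaluate(match):
        # eval is acceptable here only because the template is trusted text.
        expression = match.group(1).strip()
        return str(eval(expression, {"__builtins__": {}}, namespace))

    return PLACEHOLDER.sub(evaluate, template)


manuscript = "We measured {{ len(data) }} specimens (mean length {{ mean(data) }} mm)."
print(render(manuscript, [12, 15, 18]))
```

Changing the list of measurements changes the rendered sentence without touching the manuscript text, which is the property that executable manuscripts and data-driven web pages have in common.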

Executable manuscripts

Markdown

Executable manuscripts with knitr

D3 continued

There are a variety of great online courses for learning javascript. If you don't have experience with javascript, check them out. See, for example, the courses at Codecademy and Code School.

The basics of drawing with data.

October 9 - Principles of design and static data visualization

Reading: Tufte (the whole book); Haddock and Dunn chapters 17-19

Principles of design

Visualization is the act of mapping data to aesthetic properties. The principles of design clarify which aesthetic properties we have to work with.
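
To make "mapping data to aesthetic properties" concrete, the Python sketch below maps three fields of each record to three aesthetic properties of an SVG circle: horizontal position, vertical position, and radius. The records, field names, and scaling constants are invented for illustration.

```python
def to_circle(record):
    """Map data fields to aesthetics: length -> x, width -> y, count -> radius."""
    x = 20 * record["length"]        # position along the horizontal axis
    y = 200 - 20 * record["width"]   # SVG y grows downward, so flip it
    r = 2 * record["count"] ** 0.5   # make area, not radius, proportional to count
    return '<circle cx="{:.1f}" cy="{:.1f}" r="{:.1f}" />'.format(x, y, r)


records = [
    {"length": 5.1, "width": 3.5, "count": 4},
    {"length": 7.0, "width": 3.2, "count": 9},
]

svg = '<svg width="200" height="200">\n  {}\n</svg>'.format(
    "\n  ".join(to_circle(rec) for rec in records)
)
print(svg)
```

Each choice in `to_circle` is a design decision: a different mapping (for example, count to color instead of radius) would draw the same data with different aesthetic properties.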

Some examples:

Tufte discussion

Different participants will discuss different chapters:

  1. Robert Lamb, Cat Munro
  2. Chris Arellano, Becca Wang
  3. Tyler Dae Devlin, Jack Diedrich
  4. Xiaojun Meng
  5. Alejandro Damian Serrano, Carlos Silva
  6. Yi (Jamie) Zhang, Bianca Brown
  7. Joaquin Nunez
  8. KC Cushman, Stephen Rong, Daniel Kunin
  9. Denise Yoon, Adam Spierer, Yun-hsuan 'Leslie' Lai

Counterpoint - Jer Thorp: I have millions of pixels

Raster vs. vector

Further discussion

Thoughts:

Best practices for modern media:

October 16 - Tufte continued, d3 continued

Reading: Tufte (all), Murray chapters 7-9.

Go through remaining Tufte chapters

Overview of scales, axes, and transitions in d3. Exercises to manipulate code, starting with iris.
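
As a language-neutral illustration of what a linear scale and a transition do (this is Python, not d3 code), the sketch below maps a data domain onto a pixel range and interpolates between two positions. `linear_scale` and `interpolate` are hypothetical names, and the domain and range values are made up.

```python
def linear_scale(domain, output_range):
    """Return a function mapping values in `domain` linearly onto `output_range`."""
    d0, d1 = domain
    r0, r1 = output_range

    def scale(value):
        fraction = (value - d0) / (d1 - d0)
        return r0 + fraction * (r1 - r0)

    return scale


def interpolate(start, end, t):
    """Value a fraction t (0 to 1) of the way through a transition."""
    return start + (end - start) * t


# Map data values from 0 to 8 onto pixels 0 to 400.
x = linear_scale((0, 8), (0, 400))
print(x(4.0))                            # midpoint of the domain
print(interpolate(x(4.0), x(8.0), 0.5))  # halfway through a transition
```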

October 23 - Interactive data visualization

Reading: Murray (the whole book); Shneiderman 1996 - "Visual Information-Seeking Mantra: overview first, zoom and filter, then details on demand."

Watch in advance of class:

In class:

Go over people's exercises.

Dynamic interactions:

A spectrum of approaches:

Exploration needs to be very unconstrained, whereas exposition requires that the author direct the audience's perspective through constraint. The extreme of constrained dynamic perspective is a video.

Imagine a VR movie without any constraint, where the audience could roam anywhere they like. They would be far from all the key action and have no idea what the movie was "about". Maybe the best exposition is fully constrained. These trade-offs are well illustrated by http://www.fallen.io/ww2/ .

Tropes - zoom and enhance.

October 30 - Interactive data visualization continued

Watch in advance of class:

Too busy to improve

Some other videos:

November 6 - Guest lecture, final project

Guest lecture - Mark Howison.

November 13 - Virtual Reality, Augmented Reality, Open lab to work on projects

(Pick your) reality

An introduction to Google Cardboard, including existing development tools.

VR does a few things:

Augmented reality:

Collaborating on projects with git

You can craft special urls to load html files in git repos as web pages, eg:

https://github.com/antropoteuthis/finalproject/blob/master/ISCPhyloecospace.html # github url
https://rawgit.com/antropoteuthis/finalproject/master/ISCPhylomorphospace.html   # as web page
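
The rewrite from a github url to a web-page url is mechanical: swap the host and drop the `blob/` path segment. A minimal Python sketch, where `to_rawgit` is a hypothetical helper and the repository in the example is invented:

```python
from urllib.parse import urlsplit, urlunsplit


def to_rawgit(github_url):
    """Convert a github.com file URL (.../blob/<branch>/<path>) to a rawgit.com URL."""
    parts = urlsplit(github_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    user, repo, kind, rest = parts.path.lstrip("/").split("/", 3)
    if kind != "blob":
        raise ValueError("expected a /blob/ file URL")
    path = "/{}/{}/{}".format(user, repo, rest)
    return urlunsplit(("https", "rawgit.com", path, "", ""))


print(to_rawgit("https://github.com/someuser/somerepo/blob/master/index.html"))
```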

A couple of ways to propose changes, fixes, or suggestions:

November 20 - Open lab to work on projects

Guest lecture by Max Leiserson about his project MAGI.

Please complete the first draft of your readme, and come to class prepared to talk for five minutes about the goals, status, and current challenges of your project.

December 4

Project presentations

December 11

Visit by Sohini Ramachandran.

Project presentations

Other things (further reading, stuff that doesn't fit cleanly into above topics)

Other tools

Troubleshooting in d3

Useful javascript code to supplement d3