fdac19 / news

Announcements for the Fundamentals of Digital Archeology class
GNU General Public License v3.0

Course Evaluation

Final Project Reports (Due Dec 9)

Similar to progress reports with additional sections:

Class on Dec 2

Class on Nov 25

Class on Nov 22

Class on Nov 20

Class on Nov 18

Class on Nov 15

Class on Nov 13

Class on Nov 11

Class on Nov 8

Class on Nov 6

Class on Nov 4

Class on Nov 1

Class on Oct 30

Class on Oct 21, 23, 25

Class on Oct 16

Class on Oct 14

Class on Oct 11

Class on Oct 9

Class on Oct 7

Class on Oct 2-4

Class on Sep 29

Class on Sep 27

Class on Sep 25 (complete project proposals)

Class on Sep 23 (complete Miniproject2)

Class on Sep 20

Class on Sep 18

Class on Sep 16

Class on Sep 13

Class on Sep 11

Class on Sep 09

Class on Sep 06

Class on Sep 04

Class on Aug 30: Attend only if you need help with Practice0 face to face

Class on Aug 28

Class on Aug 26

Class on Aug 23

Class on Aug 21

Class video recordings

Information for remote participation via Zoom

Syllabus for "Fundamentals of Digital Archeology"

Simple rules:

  1. There are no stupid questions. However, it may be worth going through the following steps before asking:
  2. Think of what the right answer may be.
  3. Search online: Stack Overflow, etc.
  4. Look through the existing issues.
  5. Post the question as an issue.
  6. Ask the instructor: email for 1-on-1 help or to set up a time to meet.

Objectives

The course will combine the theoretical underpinnings of big data with intense practice. In particular, approaches to ethical concerns, reproducibility of results, absence of context, missing data, and incorrect data will be both discussed and practiced by writing programs that discover data in the cloud, retrieve it by scraping the deep web, and structure, store, and sample it in a way suitable for subsequent decision making. At the end of the course, students will be able to discover, collect, and clean digital traces, to use such traces to construct meaningful measures, and to create tools that help with decision making.

Expected Outcomes

Upon completion, students will be able to discover, gather, and analyze digital traces, will learn how to avoid mistakes common in the analysis of low-quality data, and will have produced a working analytics application.

In particular, in addition to practicing critical thinking, students will acquire the following skills:

Course Description

A great volume of complex data is generated as a result of human activities, including both work and play. To exploit that data for decision making it is necessary to create software that discovers, collects, and integrates the data.

Digital archeology relies on traces that are left over in the course of ordinary activities, for example the logs generated by sensors in mobile phones, the commits in version control systems, or the email sent and the documents edited by a knowledge worker. Understanding such traces is more complicated than understanding data collected using traditional measurement approaches.

Traditional approaches rely on a highly controlled and well-designed measurement system. In meteorology, for example, the temperature is taken in specially designed and carefully selected locations to avoid direct sunlight and to be at a fixed distance from the ground. Such measurements can then be trusted to represent these controlled conditions, and the analysis of such data is, consequently, fairly straightforward.

The measurements from geolocation or other sensors in mobile phones are affected by numerous unrecorded factors: was the phone kept in a pocket, was it indoors or outside? The devices are not calibrated or may not work properly, so the corresponding measurements may be inaccurate. Locations without mobile phones may not have any measurements, yet may be of the greatest interest. This lack of context and the inaccurate or missing data necessitate fundamentally new approaches that rely on patterns of behavior to correct the data, to fill in missing observations, and to elucidate unrecorded context factors. These steps are needed to obtain meaningful results from a subsequent analysis.
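As a rough illustration of what correcting data and filling in missing observations can look like in code, here is a minimal sketch. It is not the method taught in the course: it substitutes plain linear interpolation for the behavior-based approaches described above, and the function name `fill_gaps`, the plausibility bounds, and the sample readings are all assumptions made up for this example.

```python
def fill_gaps(readings, low=-50.0, high=60.0):
    """Treat implausible values as missing, then interpolate interior gaps.

    `readings` is a list of evenly spaced temperature values in degrees
    Celsius, with None marking a missing observation. The bounds `low`
    and `high` are assumed plausibility limits, not calibrated values.
    """
    # Step 1: correct the data by discarding out-of-range readings.
    cleaned = [r if r is not None and low <= r <= high else None
               for r in readings]

    # Step 2: fill each gap that lies between two known readings
    # using a straight line between its neighbors.
    known = [i for i, r in enumerate(cleaned) if r is not None]
    filled = list(cleaned)
    for left, right in zip(known, known[1:]):
        step = (cleaned[right] - cleaned[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = cleaned[left] + step * (i - left)

    # Gaps at the very start or end have only one neighbor and stay None.
    return filled


# Hypothetical readings: one missing value, one sensor glitch (999.0),
# and one trailing gap.
sample = [20.0, None, 22.0, 999.0, 26.0, None]
print(fill_gaps(sample))
```

Note that the sketch leaves the trailing gap unfilled instead of guessing: with no context on one side, any value would be an invention, which is exactly the kind of decision the course asks you to make deliberately.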

The course will cover basic principles and effective practices to increase the integrity of the results obtained from voluminous but highly unreliable sources.

Prerequisites

Students are expected to have basic programming skills (for example, from COSC 365). In particular, they should be able to use regular expressions, programming concepts such as variables, functions, and loops, and data structures such as lists and dictionaries.
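As a quick self-check of these prerequisites, the following sketch combines a regular expression, a function, a loop, a list, and a dictionary. The log format and the sample lines are invented for illustration and do not come from any course assignment.

```python
import re
from collections import defaultdict

# Hypothetical log lines, made up for this example.
LOG_LINES = [
    "a1b2c3d 2019-08-21 alice@example.com Fix typo in README",
    "d4e5f6a 2019-08-23 bob@example.com Add scraper skeleton",
    "0718293 2019-08-26 alice@example.com Handle missing fields",
    "not a commit line",
]

# Assumed format: 7-character hex hash, ISO date, author email, message.
COMMIT_RE = re.compile(
    r"^(?P<sha>[0-9a-f]{7})\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<email>\S+@\S+)\s+"
    r"(?P<msg>.+)$"
)


def commits_per_author(lines):
    """Count well-formed commit lines per author email."""
    counts = defaultdict(int)
    for line in lines:
        match = COMMIT_RE.match(line)
        if match is None:
            # Skip lines that do not follow the assumed format.
            continue
        counts[match.group("email")] += 1
    return dict(counts)


print(commits_per_author(LOG_LINES))
# {'alice@example.com': 2, 'bob@example.com': 1}
```

If every line of this example reads naturally to you, you likely meet the programming prerequisite.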

Familiarity with version control systems (e.g., COSC 340), Python (e.g., COSC 370), and introductory-level probability (e.g., ECE 313) and statistics, such as random variables, distributions, and regression, would be beneficial but is not expected. Everyone is expected, however, to be willing and highly motivated to catch up in the areas where they have gaps in the relevant skills.

All the assignments and projects for this class will use GitHub and Python. Knowledge of Python is not a prerequisite for this course, provided you are comfortable learning on your own as needed. While we have striven to make the programming component of this course straightforward, we will not devote much time to teaching programming, Python syntax, or any of the libraries and APIs. You should feel comfortable with:

  1. How to look up Python syntax on Google and StackOverflow.
  2. Basic programming concepts like functions, loops, arrays, dictionaries, strings, and if statements.
  3. How to learn new libraries by reading documentation and reusing examples.
  4. Asking questions on StackOverflow or as a GitHub issue.

Requirements

These apply to real life, as well.

Teaming Tips

Evaluation

Other considerations

As a programmer you will never write anything from scratch, but will reuse code, frameworks, or ideas. You are encouraged to learn from the work of your peers. However, if you don't try to do it yourself, you will not learn. Deliberate practice (activities designed for the sole purpose of effectively improving specific aspects of an individual's performance) is the only way to reach perfection.

Please respect the terms of use and/or license of any code you find, and if you re-implement or duplicate an algorithm or code from elsewhere, credit the original source with an inline comment.
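One possible shape for such an inline credit is sketched below. The angle-bracket fields are placeholders to fill in with the real source, and the function `chunk` is a hypothetical stand-in for whatever code you adapted; neither refers to an actual source.

```python
# Adapted from: <title or URL of the original source>
# Original author: <name>, license: <license of the original code>
# Changes: <briefly describe what you modified, if anything>
def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


print(chunk([1, 2, 3, 4, 5], 2))
# [[1, 2], [3, 4], [5]]
```

Placing the credit directly above the reused code keeps the attribution attached to it even if the file is later split or reorganized.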

Resources

Materials

This class assumes you are confident with this material, but in case you need a brush-up...

Other

Databases
R and data analysis
Tutorials written as IPython notebooks

GitHub