briatte / ala

What got published in A List Apart, 1999-2016 – pings to @alistapart and @zeldman

What got published in A List Apart, 1999-2016

A short exercise in Web scraping.

HOWTO

  1. Run 00-init.r after installing its package dependencies.
  2. Run 01-data.r as many times as needed to collect all raw data and process it.
  3. Run 02-plots.r to generate summary plots.
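The three steps above can also be run from a single R session. The following is only a minimal sketch: `run_scripts` is a hypothetical helper, not part of the repository, and it assumes the working directory is the repository root and that the packages loaded by 00-init.r are already installed.

```r
# Hypothetical helper (not in the repository): source each script in order,
# stopping at the first one that cannot be found.
run_scripts <- function(scripts, dir = ".") {
  for (s in scripts) {
    path <- file.path(dir, s)
    if (!file.exists(path)) stop("script not found: ", path)
    message("Running ", s)
    # local = FALSE evaluates in the global environment, so objects created
    # by one script remain visible to the next one.
    source(path, local = FALSE)
  }
  invisible(scripts)
}

# Usage, from the repository root:
# run_scripts(c("00-init.r", "01-data.r", "02-plots.r"))
```

This runs 01-data.r only once. Since step 2 may need several passes, calling `run_scripts("01-data.r")` again until the data is complete would be one way to handle that.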

DATA

Note: the small issues raised by data collection, listed below, might get fixed.

ala_data.csv

Contains information on all A List Apart articles published between 1999 (inception) and 2016:

A single article is missing (/article/xhtml), and A List Apart blog posts published between 2013 and 2015 are downloaded but excluded from the ala_data.csv dataset.

ala_refs.csv

Contains the edge list of article cross-citations:

Note that seven articles do not show up as sources in the edge list because of HTML parsing errors. The problem is explained in detail in this Stack Overflow post.
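To illustrate what an edge list of cross-citations allows, here is a small sketch that counts how often each article is cited. The rows are made up, and the column names `from` and `to` are assumptions for illustration only; the actual column names in ala_refs.csv may differ.

```r
# Made-up rows standing in for ala_refs.csv; `from` and `to` are
# assumed column names, not taken from the real file.
refs <- read.csv(text = "from,to
/article/a,/article/b
/article/a,/article/c
/article/d,/article/b", stringsAsFactors = FALSE)

# Number of times each article appears as a citation target,
# most cited first.
cited <- sort(table(refs$to), decreasing = TRUE)
cited
```

With the real file, the `text =` argument would be replaced by the path to ala_refs.csv and the column names adjusted accordingly.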

ala_tags.csv

Contains the general and specific topics of the articles:

Since general topics are also used to categorise articles, the parent and tag columns (parent and child) are sometimes identical.
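The rows where both columns are identical can be isolated or dropped with a simple comparison. This is a sketch on made-up rows: only the `parent` and `tag` column names come from the description above, and the `url` column is an assumption.

```r
# Made-up rows standing in for ala_tags.csv; `url` is an assumed column,
# `parent` and `tag` are the columns described above.
tags <- read.csv(text = "url,parent,tag
/article/a,code,css
/article/a,code,code
/article/b,design,typography
/article/c,design,design", stringsAsFactors = FALSE)

# TRUE where the general topic is also recorded as the specific one.
same <- tags$parent == tags$tag

# Rows where parent and tag are identical.
tags[same, ]

# Keep only the rows that carry a distinct specific topic.
specific <- tags[!same, ]
```

Whether to drop the identical rows depends on the analysis: they matter when counting articles per general topic, but would inflate counts of specific topics.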

CREDITS

NOTE

All A List Apart articles are Copyright © 1998–2017 A List Apart & Authors.

Please do not redistribute the raw data for this project.