chihacknight / breakout-groups

Breakout groups that meet at Chi Hack Night every Tuesday in Chicago
https://chihacknight.org/breakouts.html

SoQL Learning Group #132

Closed: vingkan closed this issue 5 years ago

vingkan commented 6 years ago

Overview

The Socrata Query Language (SoQL) is a powerful way to find answers to questions you have about the information in open datasets. In this learning group, we play fun games to learn how to use SoQL and explore open datasets from Chicago and beyond!

Every dataset on the city data portal supports SoQL queries. We created a simple interface to let you make requests. Once you get your feet wet with SoQL and open data, you'll have a valuable skill set for any civic tech project!
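As a rough sketch of what a SoQL request looks like under the hood: Socrata portals accept SoQL clauses as `$`-prefixed query parameters on a dataset's `/resource/<id>.json` endpoint. The dataset ID below is a placeholder and the helper name `soql_url` is made up for illustration; it is not part of the group's interface.

```python
from urllib.parse import urlencode

# Assumptions for illustration: the domain is Chicago's portal and the
# dataset ID is a placeholder, not a real dataset.
DOMAIN = "data.cityofchicago.org"
DATASET_ID = "xxxx-xxxx"


def soql_url(domain, dataset_id, **clauses):
    """Build a SODA request URL from SoQL clauses (select, where, limit, ...)."""
    # SoQL clauses are passed as query parameters prefixed with "$".
    params = {f"${name}": value for name, value in clauses.items()}
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


# Ask for the first five rows of the dataset.
url = soql_url(DOMAIN, DATASET_ID, select="*", limit=5)
print(url)
```

Swapping in a real dataset ID and pasting the printed URL into a browser should return JSON rows.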

Everyone is welcome to join our sessions. We usually use a Mob Programming format, so that everyone who attends can participate in the discussion and write whatever amount of code they feel comfortable with.

Links

Facilitators

vingkan commented 6 years ago

To see the first SoQL learning activity, check out the one-time Ghostbusters breakout group (#130) from the ChiHackNight Halloween party!

vingkan commented 6 years ago

Tuesday, November 7th

Two Truths and a Lie with the COPA Dataset

The new COPA Dataset details complaints received by the Civilian Office of Police Accountability and its predecessor agency. In today's activity, we will pose two truths and one lie about the dataset to the group and use SoQL queries to figure out which is which!

"There are more than 16,000 complaints submitted by males."

"Excessive Force was the most common category of complaints in September 2017."

"COPA has received the same number of complaints from people aged 60-69 as from people aged 0-19."

You can start writing queries here!

Update: One set of solutions is here.
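For anyone who wants a starting point, here is a minimal sketch of how the second statement could be checked with a grouped count. The column names `complaint_date` and `current_category` are assumptions, so check the dataset's column list before running it; this is not the linked solution.

```python
from urllib.parse import urlencode

# Hypothetical column names; check the dataset's column list for the real ones.
DATE_COL = "complaint_date"
CATEGORY_COL = "current_category"

params = {
    # Count complaints per category...
    "$select": f"{CATEGORY_COL}, count(*) AS total",
    # ...restricted to September 2017...
    "$where": (
        f"{DATE_COL} >= '2017-09-01T00:00:00' "
        f"AND {DATE_COL} < '2017-10-01T00:00:00'"
    ),
    "$group": CATEGORY_COL,
    # ...with the most common categories first.
    "$order": "total DESC",
    "$limit": 5,
}
query_string = urlencode(params)
print(query_string)
```

The first row of the result is the most common category for that month, which either confirms or refutes the statement.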

vingkan commented 6 years ago

Tuesday, November 14th

Shuffleboard with the Cook County Sweetened Beverage Distributors Dataset

Skills you learn in this learning group transfer to any data portal that uses Socrata! This week, we'll check out the Cook County data portal with the Sweetened Beverage Distributors Dataset. In today's activity, we'll play a game of shuffleboard... played with Socrata location queries instead of discs. In each round, there is a sweet spot number of distributors and we will work as a group to "capture" those points in one query.

Round 1: Select 23 McDonald's locations, using the Merchandise Mart as a center point.

Round 2: Select 17 distributors located on Cermak Road.

Round 3: Select 5 distributors in Skokie and 3 distributors in Evanston with no distributors from other cities.

You can start writing queries here!
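Round 1 can be approached with SoQL's `within_circle(column, latitude, longitude, radius_in_meters)` function. The sketch below assumes approximate coordinates for the Merchandise Mart and hypothetical column names (`location`, `business_name`); the helper `mcdonalds_within` is made up for illustration.

```python
from urllib.parse import urlencode

# Assumptions: approximate Merchandise Mart coordinates, and hypothetical
# column names for the point location and the business name.
MART_LAT, MART_LON = 41.8885, -87.6355
LOCATION_COL = "location"
NAME_COL = "business_name"


def mcdonalds_within(radius_m):
    """SoQL parameters counting McDonald's rows within radius_m meters of the Mart."""
    where = (
        f"within_circle({LOCATION_COL}, {MART_LAT}, {MART_LON}, {radius_m}) "
        f"AND upper({NAME_COL}) like '%MCDONALD%'"
    )
    return {"$select": "count(*) AS total", "$where": where}


# Try a few radii; the group adjusts the radius until the count hits 23.
for radius in (1000, 2000, 4000):
    print(radius, urlencode(mcdonalds_within(radius)))
```

Growing or shrinking the radius is the "shuffle": too small and you miss the target count, too large and you overshoot.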

stevevance commented 6 years ago

Your SoQL query tester and submitter tool is awesome!

vingkan commented 6 years ago

Tuesday, November 28th

Graph Attack with the Potholes Dataset

This week we'll play with data about everyone's favorite urban obstacle: potholes! You can find the Potholes Dataset on the City of Chicago data portal. In today's activity, we will learn how to write SoQL queries that provide the data for different kinds of graphs. You can do this to create quick visualizations for any dataset on Socrata!

Round 1: Create a pie chart of recent actions taken on pothole requests from October 2017.

[Image: chart 1]

Round 2: Create a line chart of the number of potholes filled by creation date in October 2017.

[Image: chart 2]

You can start writing queries here! Create the charts in this Google Spreadsheet. Update: One set of solutions is here.
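A sketch of the query shapes behind both charts, assuming hypothetical column names (`creation_date`, `most_recent_action`, `number_of_potholes_filled_on_block`). The pie chart needs one row per category; the line chart needs one row per day, which `date_trunc_ymd` provides by dropping the time of day. This is not the linked solution.

```python
from urllib.parse import urlencode

# Hypothetical column names; the real dataset may name these differently.
CREATED_COL = "creation_date"
ACTION_COL = "most_recent_action"
FILLED_COL = "number_of_potholes_filled_on_block"

october_2017 = (
    f"{CREATED_COL} >= '2017-10-01T00:00:00' "
    f"AND {CREATED_COL} < '2017-11-01T00:00:00'"
)

# Pie chart: one slice per action, sized by the number of requests.
pie_chart_params = {
    "$select": f"{ACTION_COL}, count(*) AS requests",
    "$where": october_2017,
    "$group": ACTION_COL,
}

# Line chart: potholes filled per creation day, in date order.
line_chart_params = {
    "$select": (
        f"date_trunc_ymd({CREATED_COL}) AS created_day, "
        f"sum({FILLED_COL}) AS filled"
    ),
    "$where": october_2017,
    "$group": "created_day",
    "$order": "created_day",
}

print(urlencode(pie_chart_params))
print(urlencode(line_chart_params))
```

Each result has a label column and a value column, so it can be pasted straight into a spreadsheet and charted.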

vingkan commented 6 years ago

Tuesday, December 5th

20 Questions with the Crimes Dataset

This week we'll play 20 Questions! I'm thinking of a specific case from the city's dataset of crimes. The records go all the way back to 2001, so you will have to explore the dataset, ask interesting questions, and work together to figure out which one it is. Here are the rules:

  1. Only questions with a yes or no answer are allowed.
  2. Only four questions can be asked about each column of the dataset.
  3. The group must edit and run their query before asking another question.
  4. After each question, the person controlling the keyboard rotates.

You can start writing queries here!
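One way to think about the game in SoQL terms: every answered question becomes a condition, and the conditions are joined with `AND` so the count of remaining candidates shrinks after each round. The helper `count_matching` and the column names and values below are hypothetical, for illustration only.

```python
from urllib.parse import urlencode


def count_matching(answers):
    """SoQL parameters counting rows consistent with every answer so far."""
    # Each answer is one SoQL condition; AND keeps only rows that
    # satisfy all of them.
    params = {"$select": "count(*) AS remaining"}
    if answers:
        params["$where"] = " AND ".join(f"({cond})" for cond in answers)
    return params


# Hypothetical columns and values, for illustration only.
answers = ["year = 2015", "arrest = true"]
print(urlencode(count_matching(answers)))

# A "no" answer can be recorded by negating the condition.
answers.append("NOT (primary_type = 'THEFT')")
print(urlencode(count_matching(answers)))
```

When the count reaches 1, replacing the `$select` with `*` shows the record the group has cornered.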

derekeder commented 6 years ago

this group hasn't met in a while. closing for now.

vingkan commented 6 years ago

Tuesday, May 8th

20 Questions with the Food Inspections Dataset

Aaaaaand we're back for ChiHackNight's 300th episode! This week we'll play 20 Questions! I'm thinking of a specific record from the city's dataset of food inspections. As of today, there are over 160,000 entries in this dataset, so you will have to explore the dataset, ask interesting questions, and work together to figure out which one it is. Here are the rules:

  1. Only questions with a yes or no answer are allowed.
  2. Only four questions can be asked about each column of the dataset.
  3. The group must edit and run their query before asking another question.
  4. After each question, the person controlling the keyboard rotates.

You can start writing queries here!

derekeder commented 5 years ago

Closing for now as it's inactive.

vingkan commented 5 years ago

Tuesday, March 5th

20 Questions with the Traffic Crashes Dataset

After a long hiatus, we are back. This week we'll play 20 Questions! The topic will be the City of Chicago's open dataset of traffic crashes, released this past summer.

I'm thinking of a specific crash from the main traffic crashes dataset. As of today, there are over 272,000 entries in this dataset, so you will have to explore the dataset, ask interesting questions, and work together to figure out which one it is. Here are the rules:

  1. Only questions with a yes or no answer are allowed.
  2. Only four questions can be asked about each column of the dataset.
  3. The group must edit and run their query before asking another question.
  4. After each question, the person controlling the keyboard rotates.

You can start writing queries here!

Congratulations to tonight's group for identifying the correct traffic crash after 19 questions! You can see their solution here.

vingkan commented 5 years ago

Tuesday, March 12th

Two Truths and a Lie with the West Nile Virus (WNV) Mosquito Test Results Dataset

Since this week's ChiHackNight is about public health in Chicago, we will explore the West Nile Virus (WNV) Mosquito Test Results Dataset! The City remains vigilant in the fight against diseases and this dataset provides a look into one under-appreciated initiative in that vein. In today's activity, we will pose two truths and one lie about the dataset to the group and use SoQL queries to figure out which is which. Come see what the buzz is all about!

"The City has not found mosquitoes that tested positive for West Nile Virus since September 2018."

"In 2016, most mosquitoes trapped by the City belonged to the species Culex restuans."

"Mosquitoes testing positive for West Nile Virus are most commonly found in the summer."

You can start writing queries here!

vingkan commented 5 years ago

I will be gone for the next two weeks. If someone else would like to run this group, please send me an email at v@hawk.iit.edu and I can get you up to speed. It requires a very small time commitment to prepare for each meeting.

I will be back on April 2nd!

vingkan commented 5 years ago

Tuesday, April 2nd

Weird Flex with the Chicago Energy Benchmarking Dataset

This week's activity will explore the Chicago Energy Benchmarking Dataset, which includes data about energy usage and efficiency of city buildings larger than 50,000 square feet. Although the buildings in this dataset represent less than 1% of all buildings in Chicago, they add up to roughly 20% of the total energy used by the city's buildings. Maybe your school, workplace, or residence is in the dataset!

Weird flex is a game where the group dissects a (fictional) claim to figure out how truthful it is. This week's weird flex is:

The Monadnock Building is the most energy efficient office building in Chicago!

Out of all Chicago office buildings with 50,000+ square feet, the three most energy efficient buildings achieved an Energy Star rating of 99 in 2017. As shown in the table below, of those three buildings, the Monadnock Building is the largest in square footage, so it claims to be the most efficient of its peers.

| Year | Property Name | Square Feet | Energy Star Rating |
| --- | --- | --- | --- |
| 2017 | 641 W Lake | 112458 | 99 |
| 2017 | 954 W Washington Blvd | 140000 | 99 |
| 2017 | The Monadnock Building | 501318 | 99 |

Some helpful links:

Let's find out how truthful this weird flex is...

You can start writing queries here!

vingkan commented 5 years ago

Not meeting tonight.

If someone else would like to facilitate a session, feel free to send me an email at v@hawk.iit.edu and I can get you up to speed! It requires a very small time commitment to prepare for each meeting.

I will be back on April 16th!

vingkan commented 5 years ago

Tuesday, April 16th

Two Truths and a Lie with the Rideshare Dataset

Last week, the City announced the release of three datasets about ridesharing in Chicago!

As of April 13th, the datasets contain 17.4 million trips and 4.81 million drivers and vehicles.

We will explore the trips dataset by figuring out which of these three statements are truthful:

Most rides start and end in the Near North Side community area.

On average, riders picked up in the Lincoln Square community area tip better than riders picked up in the Loop.

If you order a shared ride, you have a 30% or better chance of having the ride all to yourself.

You can start writing queries here!

Update: One set of solutions (plus a bonus query!) can be found here.
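As a hedged starting point for the tipping statement, the sketch below averages tips per pickup community area. The column names `pickup_community_area` and `tip` are assumptions, and "tip better" is read here as a higher average tip in dollars rather than a higher tip relative to fare. This is not the linked solution.

```python
from urllib.parse import urlencode

# Hypothetical column names; check the trips dataset for the real ones.
AREA_COL = "pickup_community_area"
TIP_COL = "tip"

tip_params = {
    # Average tip and trip count for each pickup community area.
    "$select": f"{AREA_COL}, avg({TIP_COL}) AS avg_tip, count(*) AS trips",
    # Skip trips whose pickup area was not recorded.
    "$where": f"{AREA_COL} IS NOT NULL",
    "$group": AREA_COL,
    "$order": "avg_tip DESC",
}
print(urlencode(tip_params))
```

Comparing the rows for Lincoln Square and the Loop in the result settles the statement under that reading.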

vingkan commented 5 years ago

Tuesday, April 23rd

Two Truths and a Lie with the Cook County Assessor's Modeling Data

Last week, the Cook County Assessor's Office (CCAO) announced the release of the data and code they use to estimate property values. This narrative provides more background about the datasets. The dataset called Modeling Data contains 101,000 records of property sales in Chicago, which the CCAO uses to inform their model's estimates.

We will explore the dataset by figuring out which of these three statements are truthful. There may be more than one lie!

The only properties with 10 or more full bathrooms are apartments.

On average, properties designed from an architect's plan sold for twice as much as properties designed from a stock plan.

Most properties sold for $1 were large lots.

You can start writing queries here!

Update: One set of solutions can be found here.

vingkan commented 5 years ago

I am not able to run this group because I am no longer in Chicago, but this page lists all of the previous activities for future use. Feel free to reach me at vingkan@gmail.com if you have any questions or want to restart the learning group!