cmiller112000 / ud-datavis

udacity data visualization P6

ud-datavis

Udacity - Data Analyst Nanodegree - Project 6 - Data Visualization

Summary

I am examining the average arrival delays, by day of week, for select major airlines at select major US destination cities between 2000 and 2008. I want to see which days of the week experienced the most and the least arrival delay.

I found that 2000 appeared to be the worst year for arrival delays: several airlines averaged delays of over 10 minutes, and United was the worst with delays of over 20 minutes.

From 2000 to 2003 there is a steady improvement in arrival delays, with even the worst offenders, Continental and Delta Airlines, coming in under 6 minutes on average. Unfortunately, 2004 to 2008 saw a continual increase for all airlines except Southwest Airlines, with several airlines averaging between 10 and 18 minutes.

By day of week, all airlines seemed to have their lowest delay times on Tuesdays and Saturdays and their highest delay times on Thursdays and Fridays, which makes sense because these match the typical busy travel days.

Running the Data Visualization

Run Data Visualization!

Design

Dataset - Flight Arrival and associated supplemental data

I took the flight data from http://stat-computing.org/dataexpo/2009/the-data.html/ and its supplemental data (carriers.csv and airports.csv) and combined the three sources. I then filtered the result down to the small subset of carriers and destination cities listed below. The reason for filtering was the size of the dataset: it was over 2GB even after filtering, and performance posts I found while researching dimple.js and d3.js describe even 11MB as excessive. So instead I pre-processed the data with the append_csv.py script, calculating the average arrival delay by carrier, airport, day of week and year. I was originally working with a CSV file but, while researching performance issues, found a suggestion to use a JSON input file, which avoids the parsing phase needed to convert CSV into JSON objects. The selected carriers and cities are:

| Code | Description |
| --- | --- |
| AA | American Airlines Inc. |
| CO | Continental Air Lines Inc. |
| DL | Delta Air Lines Inc. |
| UA | United Air Lines Inc. |
| WN | Southwest Airlines Co. |

| State | Cities |
| --- | --- |
| NY | New York |
| IL | Chicago |
| TX | Dallas, Dallas-FtWorth, Houston, Austin |
| CA | Los Angeles, San Francisco |
| GA | Atlanta |
| FL | Miami, Orlando |
| MA | Boston |
| VA | Arlington, Chantilly |
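The filtering and averaging described above can be sketched roughly as follows. This is a minimal illustration, not the actual append_csv.py: it assumes the raw rows use the Data Expo column names `Year`, `DayOfWeek`, `UniqueCarrier`, `Dest` and `ArrDelay`, that missing delays are recorded as `NA`, and that destinations are filtered by airport code. The codes `ORD` and `HOU` and the sample rows are illustrative only.

```python
import csv
import io
from collections import defaultdict

# Carrier codes from the table above.
CARRIERS = {"AA", "CO", "DL", "UA", "WN"}


def average_arrival_delay(rows, dest_airports):
    """Average ArrDelay per (carrier, destination, day of week, year).

    `rows` is any iterable of dicts keyed by column name (an assumption
    about the raw file layout); `dest_airports` is a set of airport codes.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of delays, flight count]
    for row in rows:
        if row["UniqueCarrier"] not in CARRIERS or row["Dest"] not in dest_airports:
            continue  # not one of the selected carriers / destinations
        delay = row["ArrDelay"]
        if delay in ("", "NA"):
            continue  # no recorded arrival delay (assumed "NA" marker)
        key = (row["UniqueCarrier"], row["Dest"], int(row["DayOfWeek"]), int(row["Year"]))
        totals[key][0] += float(delay)
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Tiny made-up sample, only to show the shape of the input and output.
SAMPLE = """Year,DayOfWeek,UniqueCarrier,Dest,ArrDelay
2000,1,UA,ORD,30
2000,1,UA,ORD,10
2000,1,UA,ORD,NA
2000,2,WN,HOU,-4
2000,2,XX,ORD,99
2000,2,AA,SEA,5
"""

averages = average_arrival_delay(csv.DictReader(io.StringIO(SAMPLE)), {"ORD", "HOU"})
for key, value in sorted(averages.items()):
    print(key, value)
```

Aggregating while streaming row by row keeps memory use small, which matters when the filtered input is still around 2GB.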

I also added longitude/latitude data and merged the airport and carrier name information into a single flightdelays.json file with the following fields:

| Field Name | Field Description |
| --- | --- |
| Dest long | Longitude of destination airport |
| Dest lat | Latitude of destination airport |
| Dest airport | Airport name of destination |
| ArrDelay | Arrival delay in minutes |
| UniqueCarrierName | Air carrier name |
| DayOfWeek | Mon, Tues, Wed, Thu, Fri, Sat, Sun |
| Year | 2000 - 2008 |
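As a rough sketch of how one aggregated row could be turned into a record with these fields: the field names come from the table above, while the `to_record` helper, the shape of the `airport` dict (keys `long`, `lat`, `airport`), the numbering of days with 1 as Monday, and the sample values are all assumptions made for illustration.

```python
import json

# Day labels as listed in the field table; index 0 is assumed to be Monday.
DAY_NAMES = ["Mon", "Tues", "Wed", "Thu", "Fri", "Sat", "Sun"]


def to_record(carrier_name, airport, day_of_week, year, avg_delay):
    """Build one output record (hypothetical helper, not from the project)."""
    return {
        "Dest long": airport["long"],
        "Dest lat": airport["lat"],
        "Dest airport": airport["airport"],
        "ArrDelay": round(avg_delay, 2),
        "UniqueCarrierName": carrier_name,
        "DayOfWeek": DAY_NAMES[day_of_week - 1],
        "Year": year,
    }


# Illustrative values only, not taken from the real dataset.
airport = {"long": -87.90, "lat": 41.98, "airport": "Chicago O'Hare International"}
record = to_record("United Air Lines Inc.", airport, 1, 2000, 20.0)

# A list of such records is what would be written to flightdelays.json.
payload = json.dumps([record], indent=2)
print(payload)
```

Writing a flat list of objects means the browser can consume the file directly, without a CSV-to-object conversion step.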

Data Visualization

I have developed this visualization using primarily dimple.js with some d3.js tweaks based on example visualizations from http://www.dimplejs.org, in particular the bubble chart and interactive legend examples.

I created a dimple storyboard which is animated by year from 2000 to 2008. The animation can be paused by selecting the year you want to pause on, and restarted by selecting that year a second time.

I also used an interactive legend by airline carrier for filtering the data points displayed, and I also created a 'chart' to handle filtering of the aggregated data by major city airports.

Final Release 2 - Fixes Per Udacity Reviewer Feedback - GIT Tag: Final_Release_2

Note: I did not change the d3 and dimple JS library references to use the internet links, as they were not working properly and were timing out.

Final Release - Fixes - GIT Tag: Final_Release

Release 2 - Fixes - GIT Tag: Draft_Release_2

Release 1 - Issues - GIT Tag: Draft_Release_1

Feedback

Questions for Reviewers:

Please answer by creating an issue on my GitHub repository, following the example issue under my name:

https://github.com/cmiller112000/ud-datavis/issues/

Final Release Responses

Udacity Reviewer

https://review.udacity.com/#!/reviews/36166

Reviewer Comments: Awesome job! The JavaScript is well implemented, with good use of semicolons and indentation.

However, there are some issues with the HTML and with how the JavaScript libraries are called. The different issues are reviewed below:

JavaScript libraries import: Instead of working with local d3 files, you can simply call the d3 library from its website (see line below). Please have a look at this link for further information.

https://www.dashingd3js.com/d3js-first-steps

DOCTYPE: this line must be included so that the browser can render the file properly. More info here:

http://www.w3schools.com/tags/tag_doctype.asp

Encoding: for the browser to load the required character set, you need to include the line below. More info here:

http://www.w3schools.com/html/html_charset.asp

HTML content: it must be included in the body. Please have a look at this link for a reference HTML template:

http://www.w3schools.com/html/html5_intro.asp

Once you have edited your file, you can test your HTML using this powerful tool:

https://validator.w3.org/

Reviewer Comments: This is a great visualization. You were able to include a lot of information and still make it look great. By selecting airlines, airports and years, viewers can really do a deep exploration of the data. Your d3 code is coded well, and you got some great feedback. Well done!

When I look at the chart, I understand how average delay times behave along the week. But that's really an exploratory visualization rather than an explanatory one. What I can't tell from this plot is what drives the average delay times along the week. In your summary I can read: "From a day per week standpoint, all airlines seemed to have their lowest delay times on Tuesdays and Saturdays, and their highest delay times on Thursdays and Fridays. Which makes sense based on the typical busy travel days matching this finding.", so it seems delays are related to average flights per day. This is actually the key piece I miss in your visualization. By adding this piece of information your visualization becomes explanatory: users can then understand why delay times behave in such a way.

STEPS TO PASS THIS SECTION: Incorporate the average flight number per day in your chart.
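One possible way to compute the measure the reviewer asks for is sketched below. It is an assumption about how this could be done rather than what the final release does: it counts flights per calendar date, then averages those daily counts for each day of the week. The column names `Year`, `Month`, `DayofMonth` and `DayOfWeek` are assumed from the raw flight data, and the sample rows are made up.

```python
from collections import defaultdict


def average_flights_per_weekday(rows):
    """Average number of flights per calendar day, grouped by day of week."""
    per_date = defaultdict(int)  # (year, month, day, day_of_week) -> flights that day
    for row in rows:
        key = (row["Year"], row["Month"], row["DayofMonth"], row["DayOfWeek"])
        per_date[key] += 1

    totals = defaultdict(lambda: [0, 0])  # day_of_week -> [flights, distinct dates]
    for (_, _, _, day_of_week), count in per_date.items():
        totals[day_of_week][0] += count
        totals[day_of_week][1] += 1
    return {dow: flights / dates for dow, (flights, dates) in totals.items()}


def flight(year, month, day, day_of_week):
    """Build a made-up row with only the columns this sketch needs."""
    return {"Year": year, "Month": month, "DayofMonth": day, "DayOfWeek": day_of_week}


# Three flights on one Monday, one on the next Monday, two on a Tuesday.
rows = (
    [flight("2000", "1", "3", "1")] * 3
    + [flight("2000", "1", "10", "1")]
    + [flight("2000", "1", "4", "2")] * 2
)
print(average_flights_per_weekday(rows))
```

Counting per date first matters: dividing the total number of flights by the number of rows for a weekday would not give flights per day.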

Release 2 Responses

Udacity Discussion Forum Feedback

https://discussions.udacity.com/t/request-project-6-feedback-average-flight-delays-for-select-airlines-2000-2008/27856/7

Individual Feedback Logged to GitHub Project:

https://github.com/cmiller112000/ud-datavis/issues

Charlie

Hi @cheryl_592988902, thanks for posting your latest version. I've taken a look and so I'll post a few thoughts on here to encourage more discussion! I hope you don't mind that I've not posted on GitHub.

andrew_377968164

Hi @cheryl_592988902, I agree with the comments from @Charlie. In addition

Nice chart!!

Great!!!

Yohann

Hi @cheryl_592988902,

Nice chart, the design is very good as well as the different transitions

I hope this helps !

Kind Regards,

Yohann

Release 1 Responses

Shirley McAdams

Response

Regarding the disappearing data: I fixed that and will be providing a new release later today or tomorrow.

Cara Miller

Response

The new version is hopefully much clearer: I changed the bubble chart to a line chart with line markers, which should make the day-to-day relationships easier to see. Regarding the yellow dots remaining in the same spot: if you look closely, the scale changes from year to year, which may be why they appear to stay the same. As for the purpose of the year-over-year view, it makes it possible to see improvement and/or degradation in arrival times over time.

As for the yellow and orange colors being too close, I changed the yellow to a light purple, so hopefully it's easier to distinguish the different lines.

Alan McAdams

Response

Thanks Alan, good feedback! I have a new version I will be uploading later today or tomorrow (waiting for feedback from class peers). This new version fixes the disappearing data issue and makes the airport filter clearer and easier to read. I also changed the bubble graph to a line graph with line markers so that the day-to-day differences are more obvious. I liked the bouncing balls, but they didn't make that relationship very clear.

I haven't figured out how to keep the animation paused when filtering by carrier or airport yet, still working to figure out how to do that.

JoAnna McAdams

Response

Thanks Joey! Good feedback.

I have a new release coming later today or tomorrow that fixes the airport selector and a few other issues (like disappearing data). Hopefully that will make it clearer. While I liked the look of the bubble chart, I changed it to a line chart with line markers. It makes the day to day relationships much clearer.

Re: "How to keep raw data integrity in check - i.e.: one time meaning a plane pulls away from the gate, or actually takes off?"

I'm not clear on what you are asking. The raw data I based this on had multiple 'delay' timings and some (but limited) cause indicators. However, the dataset itself, even filtered down to just these carriers and airports, was still almost 2GB and would never load in the browser using the tools I've been given. So I decided to concentrate on the average arrival delay, thinking that from a consumer standpoint this is what most people would care about. I can definitely see that the airlines or regulators would care much more about drilling down into specific causes. Is that what you were referring to?

Follow-up Response

Yes, that is what I was referring to, and it was more industry related, but given each airline had their own criteria for the definition of on time... Well, what can you do to control that?

I am really impressed with how you tamed THAT MUCH data in one file. Very nice!

Resources