drguthals / Introduction-to-Data-Science

This repository contains an introduction to data science for high school students using Azure Notebooks, Python, and Azure.

Introduction to Data Science


About

This curriculum assumes little or no experience with coding or data science. It takes approximately four weeks to complete, with optional extensions and projects that go beyond the core material. The curriculum is organized as follows:

  1. General Introduction to Azure
    In this module, you will be introduced to the tools you will use throughout the curriculum and to the way data scientists approach problems.

  2. Introduction to Python for Data Science
    In this module, you will explore the basics of Python. Python is the fourth most popular programming language among professional developers (not just data scientists), up from seventh in 2018.

  3. Introduction to NumPy
    NumPy is a crucial library for effectively loading, storing, and manipulating in-memory data in Python.

  4. Introduction to Pandas
    The pandas library makes importing, cleaning, organizing, and otherwise working with data so much easier that it is hard to imagine doing data science in Python without it.

  5. Manipulating and Cleaning Data
    Real-world data is messy. You will likely need to combine several data sources to get the data you actually want, the data from those sources will be incomplete, and it probably won't be formatted exactly the way your analysis requires. For these reasons, most data scientists will tell you that about 80 percent of any project is spent just getting the data into a form that is ready for analysis.

  6. Introduction to Machine Learning Models
    This module focuses on the prediction and classification tasks that data scientists perform once their data is prepared. This can be the fun part, but remember that most of data science is actually getting the data ready to be analyzed.

  7. Cloud-Based Machine Learning
    Although Azure Notebooks run in the cloud, the machine learning models you explored in the previous module were accessible only within the notebook itself. With Azure Machine Learning Studio, you can make your models accessible from anywhere.

  8. Azure Cognitive Services
    Just as you created a web service that consumes data and returns predictions, there are many AI software-as-a-service (SaaS) offerings on the web that return predictions or classifications based on data you supply. These services can enhance your models without requiring you to write most of the code yourself.
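The following minimal sketch gives a flavor of modules 3 through 5. It is not taken from the curriculum's own notebooks. It uses NumPy to compute on an in-memory array, then uses pandas to combine two small data sources and fill in a value that one source is missing. The student names, column names (`student`, `score`, `days_present`), and numbers are all made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: store hypothetical test scores in an in-memory array and compute on it.
scores = np.array([88, 92, 79, 95])
mean_score = scores.mean()

# pandas: two small, made-up data sources that share a "student" column.
grades = pd.DataFrame({
    "student": ["Ana", "Ben", "Cal", "Dee"],
    "score": scores,
})
attendance = pd.DataFrame({
    "student": ["Ana", "Ben", "Dee"],  # Cal is missing from this source
    "days_present": [18, 20, 17],
})

# Combine the sources. A left merge keeps every student in `grades`,
# so Cal's days_present becomes NaN (incomplete data).
combined = grades.merge(attendance, on="student", how="left")

# Clean: fill the missing value with the column mean (one simple choice among many).
combined["days_present"] = combined["days_present"].fillna(
    combined["days_present"].mean()
)

print(f"Mean score: {mean_score}")
print(combined)
```

Filling with the mean is only one option. Depending on the analysis, you might drop the row instead, or fill the gap with a value taken from another source.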

Getting Started

To get started with this curriculum, you will need a few free accounts:

  1. Sign up for GitHub to be able to ask questions.
  2. Sign up for a Microsoft Account using a .edu email address.
  3. Go to the Azure for Students offer and sign up with your .edu email address.
  4. Make sure you can sign in with your Microsoft Account (MSA) on Azure Notebooks.
  5. Make sure you can sign in with your MSA on Azure Machine Learning Studio.