House price prediction is one of the most common examples used to introduce machine learning.
Traditionally, real estate appraisers use many quantifiable details about a home (such as number of rooms, lot size, and year of construction) to help them estimate the value of a house.
You believe this relationship between a home's quantifiable details and its value could be captured by a machine learning model to predict home prices.
Machine learning models to determine house values
Step One: Define the Problem
Can we estimate the price of a house based on lot size or the number of bedrooms? You access the sale prices for recently sold homes or have them appraised. Since you have this data, this is a supervised learning task. You want to predict a continuous numeric value, so this task is also a regression task.
Step Two: Building a Dataset
Data collection: You collect numerous examples of homes sold in your neighborhood within the past year, and pay a real estate appraiser to appraise the homes whose selling price is not known.
Data exploration: You confirm that all of your data is numerical because most machine learning models operate on sequences of numbers. If there is textual data, you need to transform it into numbers. You'll see this in the next example.
Data cleaning: Look for things such as missing information or outliers, such as the 10-room mansion. Several techniques can be used to handle outliers, but you can also just remove them from your dataset.
Data visualization: You can plot home values against each of your input variables to look for trends in your data. In the following chart, you see that when lot size increases, the house value increases.
Regression line of a model
Step Three: Model Training
Prior to actually training your model, you need to split your data. The standard practice is to put 80% of your dataset into a training dataset and 20% into a test dataset.
Linear model selection
As you see in the preceding chart, when lot size increases, home values increase too. This relationship is simple enough that a linear model can be used to represent it.
A linear model across a single input variable can be represented as a line. It becomes a plane for two variables, and then a hyperplane for more than two variables. The intuition of a line with a constant slope doesn't change.

Using a Python library
The Python scikit-learn library has tools that can handle the implementation of the model training algorithm for you.
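As a rough illustration, here is a minimal sketch of what that training step could look like with scikit-learn. The file name and column names (home_sales.csv, lot_size, bedrooms, price) are hypothetical placeholders, not from the lesson.

```python
# A minimal sketch (assumed column names and file) of training a linear model with scikit-learn.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

homes = pd.read_csv("home_sales.csv")      # hypothetical columns: lot_size, bedrooms, price
X = homes[["lot_size", "bedrooms"]]        # input variables
y = homes["price"]                         # label: the value we want to predict

# Standard practice from the lesson: 80% training data, 20% test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)                # the library handles the training algorithm for you
```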
Step Four: Evaluation
One of the most common evaluation metrics in a regression scenario is called root mean square error, or RMSE. The math is beyond the scope of this lesson, but RMSE can be thought of roughly as the "average error" across your test dataset, so you want this value to be low.
The math behind RMSE
In the following chart, you can see where the data points are in relation to the blue line. You want the data points to be as close to the "average" line as possible, which would mean less net error.
You compute the root mean square error between your model's predictions for the data points in your test dataset and the true values from your data. The actual calculation is beyond the scope of this lesson, but it's good to understand the process at a high level.
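A rough sketch of that evaluation step, reusing the hypothetical model and test split from the training sketch above:

```python
# Compute root mean square error (RMSE) on the held-out test data.
import numpy as np
from sklearn.metrics import mean_squared_error

predictions = model.predict(X_test)                      # model and X_test from the sketch above
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"RMSE: ${rmse:,.0f}")                             # lower is better; roughly the 'average error'
```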
Interpreting Results
In general, as your model improves, you see a better (lower) RMSE result. You may still not be confident about whether the specific value you’ve computed is good or bad.
Many machine learning engineers manually count how many predictions were off by more than some threshold (for example, $50,000 in this house pricing problem) to help determine and verify the model's accuracy.
Step Five: Inference: Try out your model
Now you are ready to put your model into action. As you can see in the following image, this means seeing how well it predicts with new data not seen during model training.
Terminology
Continuous: Floating-point values with an infinite range of possible values. The opposite of categorical or discrete values, which take on a limited number of possible values.
Hyperplane: A mathematical term for the generalization of a plane to spaces with more than two dimensions.
Plane: A mathematical term for a flat surface (like a piece of paper) on which two points can be joined by a straight line.
Regression: A common task in supervised machine learning used to predict a continuous numeric value.
Additional reading The Machine Learning Mastery blog is a fantastic resource for learning more about machine learning. The following example blog posts dive deeper into training regression-based machine learning models.
How to Develop Ridge Regression Models in Python offers another approach to solving the problem in the example from this lesson. https://machinelearningmastery.com/ridge-regression-with-python/
Regression is a popular machine learning task, and you can use several different model evaluation metrics with it. https://machinelearningmastery.com/regression-metrics-for-machine-learning/
We view this as a supervised learning task because we have something we're trying to predict (house price) that we can manually find when we build our dataset.
In this example, we used a linear model to solve a simple regression supervised learning task. This model type is a great first choice when exploring a machine learning problem because it's very fast and straightforward to train. It typically works well when you have relationships in your data that are linear (when input changes by X, output changes by some fixed multiple of X).
Can you think of an example of a problem that would not be solvable by a linear model?
Linear models typically fail when there is no helpful linear relationship between the input variables and the label.
For example, imagine predicting the height (label) of a thrown projectile over time (input variable). You know the trajectory is not linear; it's curved. Any straight line you try to use to describe this phenomenon would be invalid for a large range of the projectile's trajectory.
Techniques do exist to modify your data so you can still use linear models in these situations. Such methods are out of scope for this course but are called kernel methods.
https://www.youtube.com/watch?v=XP4-FOvlxVs In this video, you saw how the machine learning process can be applied to an unsupervised machine learning task that uses book description text to identify different micro-genres.
Model used to predict micro-genres
Find clusters of similar books based on the presence of common words in the book descriptions.
You do editorial work for a book recommendation company, and you want to write an article on the largest book trends of the year. You believe that a trend called "micro-genres" exists, and you have confidence that you can use the book description text to identify these micro-genres.
By using an unsupervised machine learning technique called clustering, you can test your hypothesis that the book description text can be used to identify these "hidden" micro-genres.
Earlier in this lesson, you were introduced to the idea of unsupervised learning. This machine learning task is especially useful when your data is not labeled.
Unsupervised learning using clustering
To test the hypothesis, you gather book description text for 800 romance books published in the current year.
Data exploration, cleaning and preprocessing
For this project, you believe capitalization and verb tense will not matter, and therefore you remove capitals and convert all verbs to the same tense using a Python library built for processing human language. You also remove punctuation and words you don’t think have useful meaning, like 'a' and 'the'. The machine learning community refers to these words as stop words.
Before you can train the model, you need to do some data preprocessing, called data vectorization, to convert text into numbers.
You transform this book description text into what is called a bag of words representation, shown in the following image, so that it is understandable by machine learning models.
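As a small illustration, here is a sketch of building a bag of words representation with scikit-learn's CountVectorizer. The two descriptions are made-up examples, not from the actual dataset.

```python
# Hedged sketch: count word occurrences to turn text into numbers.
from sklearn.feature_extraction.text import CountVectorizer

descriptions = [
    "a small town romance with a big secret",
    "long distance romance tested by distance and time",
]

vectorizer = CountVectorizer(stop_words="english")   # also drops common stop words
bag_of_words = vectorizer.fit_transform(descriptions)

print(vectorizer.get_feature_names_out())            # vocabulary learned from the text
print(bag_of_words.toarray())                        # word counts per description
```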
Now you are ready to train your model.
You pick a common cluster-finding model called k-means. In this model, you can change a model parameter, k, to be equal to how many clusters the model will try to find in your dataset.
Your data is unlabeled: you don't know how many micro-genres might exist. So you train your model multiple times using different values for k each time.
What does this even mean? In the following graphs, you can see examples of when k=2 and when k=3.
During the model evaluation phase, you plan on using a metric to find which value for k is most appropriate.
In machine learning, numerous statistical metrics or methods are available to evaluate a model. In this use case, the silhouette coefficient is a good choice. This metric describes how well your data was clustered by the model. To find the optimal number of clusters, you plot the silhouette coefficient as shown in the following image. You find the optimal value is when k=19.
Optimum number (k=19) of clusters
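A hedged sketch of that search over k, assuming bag_of_words is the vectorized book description data from the earlier sketch. The range of k values is an arbitrary illustration.

```python
# Try several values of k and compare silhouette coefficients.
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

scores = {}
for k in range(2, 25):
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = kmeans.fit_predict(bag_of_words)
    scores[k] = silhouette_score(bag_of_words, labels)

best_k = max(scores, key=scores.get)   # in the lesson's example this turned out to be k=19
print(best_k, scores[best_k])
```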
Often, machine learning practitioners do a manual evaluation of the model's findings.
You find one cluster that contains a large collection of books you can categorize as “paranormal teen romance.” This trend is known in your industry, and therefore you feel somewhat confident in your machine learning approach. You don’t know if every cluster is going to be as cohesive as this, but you decide to use this model to see if you can find anything interesting about which to write an article.
As you inspect the different clusters found when k=19, you find a surprisingly large cluster of books. Here's an example from fictionalized cluster #7.
Clustered data
As you inspect the preceding table, you can see that most of these text snippets are indicating that the characters are in some kind of long-distance relationship. You see a few other self-consistent clusters and feel you now have enough useful data to begin writing an article on unexpected modern romance microgenres.
Bag of words: A technique used to extract features from text. It counts how many times a word appears in each document in a collection (corpus), and then transforms that information into a dataset.
Data vectorization: A process that converts non-numeric data into a numerical format so that it can be used by a machine learning model.
Silhouette coefficient: A score from -1 to 1 describing the clusters found during modeling. A score near zero indicates overlapping clusters, and scores less than zero indicate data points assigned to incorrect clusters. A score approaching 1 indicates successful identification of discrete non-overlapping clusters.
Stop words: A list of words removed by natural language processing tools when building your dataset. There is no single universal list of stop words used by all natural language processing tools.
The How to Develop a Deep Learning Bag-of-Words Model for Sentiment Analysis (Text Classification) blog post provides an example using a bag of words–based approach paired with a deep learning model. https://machinelearningmastery.com/deep-learning-bag-of-words-model-sentiment-analysis
In this example, we used unsupervised learning using a clustering algorithm called k-means, which used unlabeled data.
In the k-means model used for this example, what does the value for "k" indicate? The number of clusters the model will try to find during training. Prior to training your model, you can set the value of "k" to equal the number of clusters you want the model to find.
True or false: An unsupervised learning approach is the only approach that can be used to solve problems of the kind described in this lesson (book micro-genres). Answer: True. The number and size of possible clusters are unknown prior to model training, and thus you need to use an unsupervised approach.
https://www.youtube.com/watch?v=VTmiITFTuEo
In the previous two examples, we used classical methods like linear models and k-means to solve machine learning tasks. In this example, we’ll use a more modern model type.
Note: This example uses a neural network. The algorithm for how a neural network works is beyond the scope of this lesson. However, there is still value in seeing how machine learning applies in this case.
Imagine you run a company that offers specialized on-site janitorial services. A client, an industrial chemical plant, requires a fast response for spills and other health hazards. You realize if you could automatically detect spills using the plant's surveillance system, you could mobilize your janitorial team faster.
Machine learning could be a valuable tool to solve this problem.
Detecting spills with machine learning
This task is a supervised classification task. As shown in the following image, your goal is to predict whether each image belongs to the 'Contains spill' class or the 'Does not contain spill' class.
Image classification
Collecting
Using historical data, as well as safely staged spills, you quickly build a collection of images that contain both spills and non-spills in multiple lighting conditions and environments.
Exploring and cleaning
You go through all the photos to ensure the spill is clearly in the shot. There are Python tools and other techniques available to improve image quality, which you can use later if you determine a need to iterate.
Data vectorization (converting to numbers)
Many models require numerical data, so all your image data needs to be transformed into a numerical format. Python tools can help you do this automatically.
In the following image, you can see how each pixel in the image on the left can be represented in the image on the right by a number between 0 and 1, with 0 being completely black and 1 being completely white.
Chemical spill image
Numeric representation of chemical spill image
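As noted above, Python tools can handle this conversion. Here is a minimal sketch using Pillow and NumPy (my choice of libraries, and a hypothetical file name, not specifics from the lesson):

```python
# Convert an image to numbers between 0 and 1 (0 = completely black, 1 = completely white).
import numpy as np
from PIL import Image

image = Image.open("spill.png").convert("L")    # hypothetical file; convert to grayscale
pixels = np.asarray(image) / 255.0              # scale 0-255 pixel values into the 0-1 range
print(pixels.shape, pixels.min(), pixels.max())
```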
Split the data
You split your image data into a training dataset and a test dataset.
Traditionally, solving this problem would require hand-engineering features on top of the underlying pixels (for example, locations of prominent edges and corners in the image), and then training a model on these features.
Today, deep neural networks are the most common tool used for solving this kind of problem. Many deep neural network models are structured to learn the features on top of the underlying pixels, so you don't have to hand-engineer them yourself. You’ll have a chance to take a deeper look at this in the next lesson, so we’ll keep things high-level for now.
CNN (convolutional neural network)
Neural networks are beyond the scope of this lesson, but you can think of them as a collection of very simple models connected together. These simple models are called neurons, and the connections between these models are trainable model parameters called weights.
Convolutional neural networks are a special type of neural network particularly good at processing images.
As you saw in the last example, there are many different statistical metrics you can use to evaluate your model. As you gain more experience in machine learning, you will learn how to research which metrics can help you evaluate your model most effectively.
Here's a list of common metrics:

Accuracy | False positive rate | Precision
---|---|---
Confusion matrix | False negative rate | Recall
F1 Score | Log Loss | ROC curve
Negative predictive value | Specificity |
In cases such as this, accuracy might not be the best evaluation mechanism.
Why not? You realize the model will see the 'Does not contain spill' class almost all the time, so any model that just predicts “no spill” most of the time will seem pretty accurate.
What you really care about is an evaluation tool that rarely misses a real spill.
After doing some internet sleuthing, you realize this is a common problem and that Precision and Recall will be effective. You can think of precision as answering the question, "Of all predictions of a spill, how many were right?" and recall as answering the question, "Of all actual spills, how many did we detect?"
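A sketch of computing both metrics on the test set with scikit-learn, assuming y_test and predictions here are binary labels (1 = contains spill, 0 = does not):

```python
# Precision: of all predicted spills, how many were right?
# Recall: of all actual spills, how many did we detect?
from sklearn.metrics import precision_score, recall_score

precision = precision_score(y_test, predictions)
recall = recall_score(y_test, predictions)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```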
Manual evaluation plays an important role. You are unsure if your staged spills are sufficiently realistic compared to actual spills. To get a better sense of how well your model performs with actual spills, you find additional examples from historical records. This allows you to confirm that your model is performing satisfactorily.
The model can be deployed on a system that enables you to run machine learning workloads, such as AWS Panorama. Thankfully, most of the time, the results will be from the class 'Does not contain spill.'

No spill detected
But, when the class 'Contains spill' is detected, a simple paging system could alert the team to respond.

Spill detected
Convolutional neural networks (CNNs) are a special type of neural network particularly good at processing images. Neural networks: a collection of very simple models connected together. These simple models are called neurons, and the connections between them are trainable model parameters called weights.
In the Protecting people from hazardous areas through virtual boundaries with Computer Vision blog post, you can see a more detailed example of the deep learning process described in this lesson. https://aws.amazon.com/blogs/machine-learning/protecting-people-through-virtual-boundaries-computer-vision/
In the deep learning example, you can use more than one metric to evaluate the performance of your model.
A loss function measures how close the model is to its goal.
The model training algorithm iteratively updates a model’s parameters to minimize some loss function.
Supervised learning uses labeled data while training a model, and unsupervised learning uses unlabeled data while training a model.
Three main components of neural networks
Input Layer: This layer receives data during training and when inference is performed after the model has been trained.
Hidden Layer: This layer finds important features in the input data that have predictive power based on the labels provided during training.
Output Layer: This layer generates the output or prediction of your model.
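To make the three layer types concrete, here is a minimal sketch assuming TensorFlow/Keras (a library choice of mine, not named in the lesson); the layer sizes and input shape are arbitrary illustrations.

```python
# A tiny feed-forward network with an input, hidden, and output layer.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(64,)),               # input layer: receives the data
    keras.layers.Dense(32, activation="relu"),     # hidden layer: learns useful features
    keras.layers.Dense(1, activation="sigmoid"),   # output layer: produces the prediction
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```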
Modern computer vision
Modern-day applications of computer vision use neural networks called convolutional neural networks (CNNs). In these neural networks, the hidden layers are used to extract different information about images. We call this process feature extraction. These models can be trained much faster on millions of images and generate better predictions than earlier models.
It has many real-world applications. In this video, we cover examples of image classification, object detection, semantic segmentation, and activity recognition. Here's a brief summary of what you learn about each topic in the video:
Image classification is the most common application of computer vision in use today. Image classification can be used to answer questions like "What's in this image?" This type of task has applications in text detection or optical character recognition (OCR) and content moderation.
Object detection is closely related to image classification, but it allows users to gather more granular detail about an image. For example, rather than just knowing whether an object is present in an image, a user might want to know if there are multiple instances of the same object present in an image, or if objects from different classes appear in the same image.
Semantic segmentation is another common application of computer vision that takes a pixel-by-pixel approach. Instead of just identifying whether an object is present or not, it tries to identify, down to the pixel level, which parts of the image belong to the object.
Activity recognition is an application of computer vision that is based around videos rather than just images. Video has the added dimension of time and, therefore, models are able to detect changes that occur over time.
New Terms
Input Layer: The first layer in a neural network. This layer receives all data that passes through the neural network.
Hidden Layer: A layer that occurs between the output and input layers. Hidden layers are tailored to a specific task.
Output Layer: The last layer in a neural network. This layer is where the predictions are generated based on the information captured in the hidden layers.
Image classification is best used to classify an entire image. Optical character recognition (OCR) is used to identify text in images. Object detection tells us not only whether objects are present in an image, but also where they are located and how many instances appear. Semantic segmentation is used when we need to identify different parts of the picture at a pixel level. Activity recognition is used when we need to identify changes that occur over time in a video.
https://www.youtube.com/watch?v=UVnjiIYLUsQ https://www.youtube.com/watch?v=EdihNjQVmyE https://www.youtube.com/watch?v=li-lJe3QWds https://www.youtube.com/watch?v=Jz20jSP5vdY https://www.youtube.com/watch?v=J3v2c08IxCE
In reinforcement learning (RL), an agent is trained to achieve a goal based on the feedback it receives as it interacts with an environment. It collects a number as a reward for each action it takes. Actions that help the agent achieve its goal are incentivized with higher numbers. Unhelpful actions result in a low reward or no reward.
With a learning objective of maximizing total cumulative reward, over time, the agent learns, through trial and error, to map gainful actions to situations. The better trained the agent, the more efficiently it chooses actions that accomplish its goal.
Reinforcement learning is used in a variety of fields to solve real-world problems. It’s particularly useful for addressing sequential problems with long-term goals. Let’s take a look at some examples.
RL is great at playing games:
Go (board game) was mastered by the AlphaGo Zero software.
Atari classic video games are commonly used as a learning tool for creating and testing RL software.
StarCraft II, the real-time strategy video game, was mastered by the AlphaStar software.
RL is used in video game level design:
Video game level design determines how complex each stage of a game is and directly affects how boring, frustrating, or fun it is to play that game.
Video game companies create an agent that plays the game over and over again to collect data that can be visualized on graphs.
This visual data gives designers a quick way to assess how easy or difficult it is for a player to make progress, which enables them to find that “just right” balance between boredom and frustration faster.
RL is used in wind energy optimization:
RL models can also be used to power robotics in physical devices.
When multiple turbines work together in a wind farm, the turbines in the front, which receive the wind first, can cause poor wind conditions for the turbines behind them. This is called wake turbulence and it reduces the amount of energy that is captured and converted into electrical power.
Wind energy organizations around the world use reinforcement learning to test solutions. Their models respond to changing wind conditions by changing the angle of the turbine blades. When the upstream turbines slow down it helps the downstream turbines capture more energy.
Other examples of real-world RL include:
Industrial robotics
Fraud detection
Stock trading
Autonomous driving
New Terms
Agent: The piece of software you are training is called an agent. It makes decisions in an environment to reach a goal.
Environment: The environment is the surrounding area with which the agent interacts.
Reward: Feedback is given to an agent for each action it takes in a given state. This feedback is a numerical reward.
Action: For every state, an agent needs to take an action toward achieving its goal.
Basic RL terms: Agent, environment, state, action, reward, and episode
Agent
The piece of software you are training is called an agent.
It makes decisions in an environment to reach a goal.
In AWS DeepRacer, the agent is the AWS DeepRacer car and its goal is to finish laps around the track as fast as it can while, in some cases, avoiding obstacles.
Environment
The environment is the surrounding area within which our agent interacts.
For AWS DeepRacer, this is a track in our simulator or in real life.
State
The state is defined by the current position within the environment that is visible, or known, to an agent.
In AWS DeepRacer’s case, each state is an image captured by its camera.
The car’s initial state is the starting line of the track and its terminal state is when the car finishes a lap, bumps into an obstacle, or drives off the track.
Action
For every state, an agent needs to take an action toward achieving its goal.
An AWS DeepRacer car approaching a turn can choose to accelerate or brake and turn left, right, or go straight.
Reward
Feedback is given to an agent for each action it takes in a given state.
This feedback is a numerical reward.
A reward function is an incentive plan that assigns scores as rewards to different zones on the track.
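As an illustration, here is a minimal sketch of a reward function in the style AWS DeepRacer uses, rewarding the agent for staying near the center line. The parameter names (track_width, distance_from_center) follow the DeepRacer interface as I understand it, and the specific zone thresholds are assumptions, not the lesson's own example.

```python
def reward_function(params):
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Reward zones: the closer the car is to the center line, the higher the score.
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.25 * track_width:
        reward = 0.5
    elif distance_from_center <= 0.5 * track_width:
        reward = 0.1
    else:
        reward = 1e-3   # likely off the track

    return float(reward)
```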
Episode
An episode represents a period of trial and error when an agent makes decisions and gets feedback from its environment.
For AWS DeepRacer, an episode begins at the initial state, when the car leaves the starting position, and ends at the terminal state, when it finishes a lap, bumps into an obstacle, or drives off the track.
In a reinforcement learning model, an agent learns in an interactive real-time environment by trial and error using feedback from its own actions. Feedback is given in the form of rewards.
An algorithm is a set of instructions that tells a computer what to do. ML is special because it enables computers to learn without being explicitly programmed to do so.
The training algorithm defines your model’s learning objective, which is to maximize total cumulative reward. Different algorithms have different strategies for going about this.
A soft actor critic (SAC) embraces exploration and is data-efficient, but can lack stability.
A proximal policy optimization (PPO) is stable but data-hungry.
An action space is the set of all valid actions, or choices, available to an agent as it interacts with an environment.
Discrete action space represents all of an agent's possible actions for each state in a finite set of steering angle and throttle value combinations.
Continuous action space allows the agent to select an action from a range of values that you define for each state.
Hyperparameters are variables that control the performance of your agent during training. There is a variety of different categories with which to experiment. Change the values to increase or decrease the influence of different parts of your model.
For example, the learning rate is a hyperparameter that controls how many new experiences are counted in learning at each step. A higher learning rate results in faster training but may reduce the model’s quality.
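A toy sketch (plain NumPy-style Python, not the DeepRacer training algorithm) showing how the learning rate controls how much each update step changes a model parameter. The loss, parameter, and values are invented for illustration.

```python
def train(learning_rate, steps=50):
    w = 10.0                           # a single model parameter, starting far from the optimum
    for _ in range(steps):
        gradient = 2 * w               # gradient of the toy loss w**2
        w -= learning_rate * gradient  # bigger learning rate -> bigger, faster updates
    return w

print(train(learning_rate=0.01))       # small rate: slow but steady progress toward 0
print(train(learning_rate=0.9))        # large rate: faster, but overshoots the minimum each step
```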
The reward function's purpose is to encourage the agent to reach its goal. Figuring out how to reward which actions is one of your most important jobs.
"Exploration" is when an agent wanders to discover what actions lead to what feedback in the form of digital rewards. "Exploitation" is using experience to decide.
Avoid overfitting
Overfitting or overtraining is a really important concept in machine learning. With AWS DeepRacer, this can become an issue when a model is trained on a specific track for too long. A good model should be able to make decisions based on the features of the road, such as the sidelines and centerlines, and be able to drive on just about any track.
An overtrained model, on the other hand, learns to navigate using landmarks specific to an individual track. For example, the agent turns a certain direction when it sees uniquely shaped grass in the background or a specific angle the corner of the wall makes. The resulting model will run beautifully on that specific track, but perform badly on a different virtual track, or even on the same track in a physical environment due to slight variations in angles, textures, and lighting.
Generative AI is one of the biggest recent advancements in artificial intelligence because of its ability to create new things.
Until recently, the majority of machine learning applications were powered by discriminative models. A discriminative model aims to answer the question, "If I'm looking at some data, how can I best classify this data or predict a value?" For example, we could use discriminative models to detect if a camera was pointed at a cat.
As we train this model over a collection of images (some of which contain cats and others which do not), we expect the model to find patterns in images which help make this prediction.
A generative model aims to answer the question, "Have I seen data like this before?" In our image classification example, we might still use a generative model by framing the problem in terms of whether an image with the label "cat" is more similar to data you’ve seen before than an image with the label "no cat."
However, generative models can be used to support a second use case. The patterns learned in generative models can be used to create brand new examples of data that look similar to the data the model has seen before.
Generative AI Models
In this lesson, you will learn how to create three popular types of generative models: generative adversarial networks (GANs), general autoregressive models, and transformer-based models. Each of these is accessible through AWS DeepComposer to give you hands-on experience with using these techniques to generate new examples of music.

Autoregressive models
Autoregressive convolutional neural networks (AR-CNNs) are used to study systems that evolve over time and assume that the likelihood of some data depends only on what has happened in the past. It’s a useful way of looking at many systems, from weather prediction to stock prediction.

Generative adversarial networks (GANs)
Generative adversarial networks (GANs) are a machine learning model format that involves pitting two networks against each other to generate new content. The training algorithm swaps back and forth between training a generator network (responsible for producing new data) and a discriminator network (responsible for measuring how closely the generator network’s data represents the training dataset).

Transformer-based models
Transformer-based models are most often used to study data with some sequential structure (such as the sequence of words in a sentence). Transformer-based methods are now a common modern tool for modeling natural language.

https://www.youtube.com/watch?v=YziMYb9xA-g
New Terms
Generator: A neural network that learns to create new data resembling the source data on which it was trained.
Discriminator: A neural network trained to differentiate between real and synthetic data.
Generator loss: Measures how far the output data deviates from the real data present in the training dataset.
Discriminator loss: Evaluates how well the discriminator differentiates between real and fake data.
https://www.youtube.com/watch?v=NkxWTTXM9pI Our next popular generative model is the autoregressive convolutional neural network (AR-CNN). Autoregressive convolutional neural networks make iterative changes over time to create new data.
To better understand how the AR-CNN model works, let’s first discuss how music is represented so it is machine-readable.

Image-based representation
Nearly all machine learning algorithms operate on data as numbers or sequences of numbers. In AWS DeepComposer, the input tracks are represented as a piano roll. In each two-dimensional piano roll, time is on the horizontal axis and pitch is on the vertical axis. You might notice this representation looks similar to an image.
The AR-CNN model uses a piano roll image to represent the audio files from the dataset. You can see an example in the following image, where the top is a musical score and the bottom is a piano roll image of that same score.
How the AR-CNN Model Works
When a note is either added or removed from your input track during inference, we call it an edit event. To train the AR-CNN model to predict when notes need to be added or removed from your input track (edit event), the model iteratively updates the input track to sound more like the training dataset. During training, the model is also challenged to detect differences between an original piano roll and a newly modified piano roll.

New Terms
Piano roll: A two-dimensional piano roll matrix that represents input tracks. Time is on the horizontal axis and pitch is on the vertical axis.
Edit event: When a note is either added or removed from your input track during inference.
Introduction to machine learning
What is Machine Learning?
Machine learning (ML) is a modern software development technique and a type of artificial intelligence (AI) that enables computers to solve problems by using examples of real-world data. It allows computers to automatically learn and improve from experience without being explicitly programmed to do so.
Model training algorithm | An iterative process fitting a generic model to specific data
Model inference algorithm | Process to use a trained model to solve a task
Machine learning is part of the broader field of artificial intelligence. This field is concerned with the capability of machines to perform activities using human-like intelligence. Within machine learning there are several different kinds of tasks or techniques:
In supervised learning, every training sample from the dataset has a corresponding label or output value associated with it. As a result, the algorithm learns to predict labels or output values.
In reinforcement learning, the algorithm figures out which actions to take in a situation to maximize a reward (in the form of a number) on the way to reaching a specific goal.
In unsupervised learning, there are no labels for the training data. A machine learning algorithm tries to learn the underlying patterns or distributions that govern the data.
How does machine learning differ from traditional programming-based approaches?
Steps of machine learning
Step One: Define the Problem
How do You Start a Machine Learning Task?
Define a very specific task. Think back to the snow cone sales example. Now imagine that you own a frozen treats store and you sell snow cones along with many other products. You wonder, "How do I increase sales?" It's a valid question, but it's the opposite of a very specific task. The following examples demonstrate how a machine learning practitioner might attempt to answer that question: “Does adding a $1.00 charge for sprinkles on a hot fudge sundae increase the sales of hot fudge sundaes?” “Does adding a $0.50 charge for organic flavors in your snow cone increase the sales of snow cones?”
Identify the machine learning task we might use to solve this problem. This helps you better understand the data you need for a project.
What is a Machine Learning Task? All model training algorithms, and the models themselves, take data as their input. Their outputs can be very different and are classified into a few different groups based on the task they are designed to solve. Often, we use the kind of data required to train a model as part of defining a machine learning task.
In this lesson, we will focus on two common machine learning tasks:
A task is supervised if you are using labeled data. We use the term labeled to refer to data that already contains the solutions, called labels. For example: Predicting the number of snow cones sold based on the temperatures is an example of supervised learning.
A task is considered to be unsupervised if you are using unlabeled data. This means you don't need to provide the model with any kind of label or solution while the model is being trained. For example: An image with various objects.
In supervised learning, there are two main identifiers you will see in machine learning: a categorical label (used in classification tasks) and a continuous label (used in regression tasks).
In unsupervised learning, clustering is just one example. There are many other options, such as deep learning.
Classification tasks involve predicting some unknown categorical attribute about your data. Regression tasks involve predicting some unknown continuous attribute about your data. Clustering tasks involve exploring how your data might be grouped together.
Step Two: Build a Dataset
Working with data is perhaps the most overlooked—yet most important—step of the machine learning process.
Data collection
Data collection can be as straightforward as running the appropriate SQL queries or as complicated as building custom web scraper applications to collect data for your project. You might even have to run a model over your data to generate needed labels. Here is the fundamental question: Does the data you've collected match the machine learning task and problem you have defined?
Data inspection
The quality of your data will ultimately be the largest factor that affects how well you can expect your model to perform. As you inspect your data, look for:
Outliers
Missing or incomplete values
Data that needs to be transformed or preprocessed so it's in the correct format to be used by your model
Summary statistics
Models make assumptions about how your data is structured. Now that you have some data in hand, it is a good best practice to check that your data is in line with the underlying assumptions of your chosen machine learning model. With many statistical tools, you can calculate things like the mean, interquartile range (IQR), and standard deviation. These tools can give you insight into the scope, scale, and shape of the dataset.
Data visualization
You can use data visualization to see outliers and trends in your data and to help stakeholders understand your data. Look at the following two graphs. In the first graph, some data seems to have clustered into different groups. In the second graph, some data points might be outliers.
Impute is a common term referring to different statistical tools which can be used to calculate missing values from your dataset. Outliers are data points that are significantly different from others in the same sample.
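A hedged sketch of imputation using pandas and scikit-learn's SimpleImputer (my choice of tools; the tiny DataFrame is a made-up example):

```python
# Replace missing values with the column mean.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"lot_size": [5000, np.nan, 7200], "bedrooms": [3, 4, np.nan]})

imputer = SimpleImputer(strategy="mean")
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(filled)
```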
Step Three: Model Training
The first step in model training is to randomly split the dataset. This allows you to keep some data hidden during training, so that data can be used to evaluate your model before you put it into production. Specifically, you do this to test against the bias-variance trade-off. Splitting your dataset gives you two sets of data: a training dataset and a test dataset.
Putting it All Together
The end-to-end training process is iterative: feed the training data into the model, compute the loss function on the results, and update the model parameters to reduce the loss.
You continue to cycle through these steps until you reach a predefined stop condition. This might be based on a training time, the number of training cycles, or an even more intelligent or application-aware mechanism.
Advice From the Experts
Remember the following advice when training your model.
Pragmatic problem solving with machine learning is rarely an exact science, and you might have assumptions about your data or problem which turn out to be false. Don’t get discouraged. Instead, foster a habit of trying new things, measuring success, and comparing results across iterations.
Extended Learning
Linear models
One of the most common models covered in introductory coursework, linear models simply describe the relationship between a set of input numbers and a set of output numbers through a linear function (think of y = mx + b or a line on an x vs. y chart).
Classification tasks often use a strongly related logistic model, which adds an additional transformation mapping the output of the linear function to the range [0, 1], interpreted as “probability of being in the target class.” Linear models are fast to train and give you a great baseline against which to compare more complex models. A lot of media buzz is given to more complex models, but for most new problems, consider starting with a simple model.
Tree-based models
Tree-based models are probably the second most common model type covered in introductory coursework. They learn to categorize or regress by building an extremely large structure of nested if/else blocks, splitting the world into different regions at each if/else block. Training determines exactly where these splits happen and what value is assigned at each leaf region.
For example, if you’re trying to determine if a light sensor is in sunlight or shadow, you might train a tree of depth 1 with the final learned configuration being something like if (sensor_value > 0.698), then return 1; else return 0;. The tree-based model XGBoost is commonly used as an off-the-shelf implementation for this kind of model and includes enhancements beyond what is discussed here. Try tree-based models to quickly get a baseline before moving on to more complex models.
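A sketch of that depth-1 tree using scikit-learn's decision tree (a simpler stand-in for XGBoost). The sensor readings and labels are invented, and the learned threshold depends on the data, not necessarily 0.698.

```python
from sklearn.tree import DecisionTreeClassifier

sensor_values = [[0.1], [0.3], [0.5], [0.7], [0.8], [0.9]]
labels = [0, 0, 0, 1, 1, 1]                      # 0 = shadow, 1 = sunlight

tree = DecisionTreeClassifier(max_depth=1)       # a single if/else split
tree.fit(sensor_values, labels)
print(tree.predict([[0.75]]))                    # behaves like: if sensor_value > threshold -> 1
```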
Deep learning models
Extremely popular and powerful, deep learning is a modern approach based around a conceptual model of how the human brain functions. The model (also called a neural network) is composed of collections of neurons (very simple computational units) connected together by weights (mathematical representations of how much information to allow to flow from one neuron to the next). The process of training involves finding values for each weight.
Various neural network structures have been determined for modeling different kinds of problems or processing different kinds of data.
A short (but not complete!) list of noteworthy examples includes:
FFNN: The most straightforward way of structuring a neural network, the Feed Forward Neural Network (FFNN) structures neurons in a series of layers, with each neuron in a layer containing weights to all neurons in the previous layer.
CNN: Convolutional Neural Networks (CNN) represent nested filters over grid-organized data. They are by far the most commonly used type of model when processing images.
RNN/LSTM: Recurrent Neural Networks (RNN) and the related Long Short-Term Memory (LSTM) model types are structured to effectively represent for loops in traditional computing, collecting state while iterating over some object. They can be used for processing sequences of data.
Transformer: A more modern replacement for RNN/LSTMs, the transformer architecture enables training over larger datasets involving sequences of data.
Machine Learning Using Python Libraries
Terminology
Hyperparameters are settings on the model that are not changed during training but can affect how quickly or how reliably the model trains, such as the number of clusters the model should identify.
A loss function is used to codify the model’s distance from its goal.
Training dataset: The data on which the model will be trained. Most of your data will be here.
Test dataset: The data withheld from the model during training, which is used to test how well your model will generalize to new data.
Model parameters are settings or configurations the training algorithm can update to change how the model behaves.
Additional reading The Wikipedia entry on the bias-variance trade-off can help you understand more about this common machine learning concept.
Step Four: Model Evaluation
After you have collected your data and trained a model, you can start to evaluate how well your model is performing. The metrics used for evaluation are likely to be very specific to the problem you have defined. As you grow in your understanding of machine learning, you will be able to explore a wide variety of metrics that can enable you to evaluate effectively.
Using Model Accuracy
Model accuracy is a fairly common evaluation metric. Accuracy is the fraction of predictions a model gets right. Imagine that you built a model to identify a flower as one of two common species based on measurable details like petal length. You want to know how often your model predicts the correct species. This would require you to look at your model's accuracy.
Using Log Loss
Log loss seeks to calculate how uncertain your model is about the predictions it is generating. In this context, uncertainty refers to how likely a model thinks the predictions being generated are to be correct.
For example, let's say you're trying to predict how likely a customer is to buy either a jacket or t-shirt. Log loss could be used to understand your model's uncertainty about a given prediction. In a single instance, your model could predict with 5% certainty that a customer is going to buy a t-shirt. In another instance, your model could predict with 80% certainty that a customer is going to buy a t-shirt. Log loss enables you to measure how strongly the model believes that its prediction is accurate. In both cases, the model predicts that a customer will buy a t-shirt, but the model's certainty about that prediction can change.
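An illustrative sketch of both metrics with scikit-learn; the labels and predicted probabilities below are made-up values, not the lesson's data.

```python
from sklearn.metrics import accuracy_score, log_loss

y_true = [1, 0, 1, 1, 0]                    # 1 = bought a t-shirt, 0 = bought a jacket
y_prob = [0.80, 0.30, 0.05, 0.90, 0.20]     # model's predicted probability of 't-shirt'
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print(accuracy_score(y_true, y_pred))       # fraction of predictions the model got right
print(log_loss(y_true, y_prob))             # penalizes confident wrong predictions more heavily
```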
Every step we have gone through is highly iterative and can be changed or re-scoped during the course of a project. At each step, you might find that you need to go back and reevaluate some assumptions you had in previous steps. Don't worry! This ambiguity is normal.
Log loss seeks to calculate how uncertain your model is about the predictions it is generating. Model Accuracy is the fraction of predictions a model gets right.
The tools used for model evaluation are often tailored to a specific use case, so it's difficult to generalize rules for choosing them. The following articles provide use cases and examples of specific metrics in use.
This lesson has covered linear regression in detail, explaining how you can envision minimizing loss, how the model can be used in various scenarios, and the importance of data. What are some methods or tools that could be useful to consider when evaluating a linear regression output? Can you provide an example of a situation in which you would apply that method or tool?
Provided Answer: There are many different tools that can be used to evaluate a linear regression model. Here are a few examples:
Step Five: Model Inference
Once you have trained your model, have evaluated its effectiveness, and are satisfied with the results, you're ready to generate predictions on real-world problems using unseen data in the field. In machine learning, this process is often called inference.
Even after you deploy your model, you're always monitoring to make sure your model is producing the kinds of results that you expect. There may be times when you reinvestigate the data, modify some of the parameters in your model training algorithm, or even change the model type used for training.
Model inference involves:
Examples