import pandas as pd

Save the file path to a variable for easier access

melbourne_file_path = '/content/melb_data.csv'

Read the data and store it in a DataFrame named melbourne_data

melbourne_data = pd.read_csv(melbourne_file_path)

Print a summary of the data in melbourne_data

melbourne_data.describe()
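
To make the summary easier to read, here is a minimal sketch on a small made-up table (the values are illustrative and do not come from melb_data.csv). The count row reports the number of non-missing values per column, so a count lower than the number of rows is a hint that the column has missing data.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "Rooms": [2, 3, 4, 3],
    "Price": [500000.0, 750000.0, np.nan, 620000.0],
})

summary = toy.describe()
print(summary)

# 'count' is the number of non-missing values in each column
rooms_count = summary.loc["count", "Rooms"]  # all four rows have a value
price_count = summary.loc["count", "Price"]  # one price is missing
print(rooms_count, price_count)
```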

Interpreting data description

melbourne_data.columns

Since our table has some missing values, we'll drop the rows (houses) that contain them

melbourne_data = melbourne_data.dropna(axis=0) # dropna with axis=0 drops every row that has a missing (NA) value

melbourne_data.columns
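
Because the axis argument is easy to mix up, the following sketch shows on a made-up table what dropna does along each axis. The column names and values here are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Toy table with missing values in two different columns (made-up data)
toy = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "BuildingArea": [80.0, np.nan, 150.0],
    "Price": [500000.0, 750000.0, np.nan],
})

# axis=0 drops every ROW that has at least one missing value
rows_dropped = toy.dropna(axis=0)

# axis=1 drops every COLUMN that has at least one missing value
cols_dropped = toy.dropna(axis=1)

print(rows_dropped.shape)
print(list(cols_dropped.columns))
```

With axis=0 the column list stays the same and only the number of rows shrinks, which is why printing melbourne_data.columns before and after gives the same result.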

Here's how you can select the necessary columns

melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']

X = melbourne_data[melbourne_features]

X.describe()

X.head()

Building models with scikit-learn. Building and using a model follows four steps:

Define - What type of model will it be? A decision tree? Something else? Which parameters does it take?

Fit - Capture patterns from the provided data. This is the heart of modeling.

Predict - Use the fitted model to make predictions.

Evaluate - Determine how accurate the model's predictions are.
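
The four steps above can be sketched end to end on a tiny made-up table. The data is invented, and mean_absolute_error is only one possible way to evaluate (an assumption here, since the notes do not name a metric).

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Toy features and target (made-up values)
X_toy = pd.DataFrame({"Rooms": [1, 2, 3, 4], "Landsize": [100, 200, 300, 400]})
y_toy = pd.Series([300000, 450000, 600000, 800000], name="Price")

# Define: choose the model type and its parameters
model = DecisionTreeRegressor(random_state=1)

# Fit: learn patterns from the data
model.fit(X_toy, y_toy)

# Predict: here on the same rows the model was fitted on
preds = model.predict(X_toy)

# Evaluate: average absolute difference between actual and predicted prices
mae = mean_absolute_error(y_toy, preds)
print(preds)
print(mae)
```

Evaluating on the rows used for fitting gives an optimistic picture. A fairer check would use houses the model has not seen.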

melbourne_feature_y = ['Price']

y = melbourne_data[melbourne_feature_y]

from sklearn.tree import DecisionTreeRegressor

Define the model. Specify a number for random_state to ensure the same results on each run

melbourne_model = DecisionTreeRegressor(random_state=1)
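
The reproducibility point can be checked with a small sketch: two trees built with the same random_state and fitted on the same data should give identical predictions. The data below is randomly generated purely for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up data: 50 rows, 3 integer features, noisy target
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 10, size=(50, 3))
y_toy = rng.normal(size=50)

# Two separate models with the same random_state
model_a = DecisionTreeRegressor(random_state=1).fit(X_toy, y_toy)
model_b = DecisionTreeRegressor(random_state=1).fit(X_toy, y_toy)

# Compare their predictions element by element
same = np.array_equal(model_a.predict(X_toy), model_b.predict(X_toy))
print(same)
```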

Fit the model

melbourne_model.fit(X, y)

print("Making predictions for the following 5 houses:")
print(X.head())
print("The predictions are")
print(melbourne_model.predict(X.head()))

Code you have previously used to load data

import pandas as pd

Path of the file to read

iowa_file_path = '../input/home-data-for-ml-course/train.csv'

home_data = pd.read_csv(iowa_file_path)

Set up code checking

from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex3 import *

print("Setup Complete")

Exercises

Step 1: Specify Prediction Target

Select the target variable, which corresponds to the sales price. Save this to a new variable called y. You'll need to print a list of the columns to find the name of the column you need.

Print the list of columns in the dataset to find the name of the prediction target

home_data.columns

y = home_data.SalePrice
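
As a small illustration of what the line above produces, the sketch below uses a made-up two-row table to show that attribute access and bracket access both return the same pandas Series.

```python
import pandas as pd

# Toy stand-in for home_data (made-up values)
toy = pd.DataFrame({"LotArea": [8450, 9600], "SalePrice": [208500, 181500]})

y_dot = toy.SalePrice          # attribute (dot) access
y_bracket = toy["SalePrice"]   # bracket access

print(type(y_dot).__name__)
print(y_dot.equals(y_bracket))
```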

Step 2: Create X

Now you will create a DataFrame called X holding the predictive features.

Since you want only some columns from the original data, you'll first create a list with the names of the columns you want in X.

You'll use just the following columns in the list (you can copy and paste the whole list to save some typing, though you'll still need to add quotes):

LotArea
YearBuilt
1stFlrSF
2ndFlrSF
FullBath
BedroomAbvGr
TotRmsAbvGrd

After you've created that list of features, use it to create the DataFrame that you'll use to fit the model.

Create the list of features below

feature_names = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']

Select data corresponding to features in feature_names

X = home_data[feature_names]
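
The selection step can be tried on a made-up table. Indexing with a list of names returns a DataFrame containing only those columns, in the order listed. A column such as 1stFlrSF starts with a digit, so it can only be reached with brackets, not with dot notation.

```python
import pandas as pd

# Toy stand-in for home_data (made-up values)
toy = pd.DataFrame({
    "LotArea": [8450, 9600],
    "1stFlrSF": [856, 1262],
    "SalePrice": [208500, 181500],
})

feature_names = ["LotArea", "1stFlrSF"]
X_toy = toy[feature_names]  # a list of names selects several columns at once

print(type(X_toy).__name__)
print(list(X_toy.columns))

# toy.1stFlrSF would be a syntax error, so use brackets for this column
first_floor = toy["1stFlrSF"]
print(first_floor.tolist())
```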

Review Data

Before building a model, take a quick look at X to verify it looks sensible

Review data

Print a description (summary statistics) of X

print(X.describe())

Print the top few lines

print(X.head())

Step 3: Specify and Fit Model

Create a DecisionTreeRegressor and save it as iowa_model. Ensure you've done the relevant import from sklearn to run this command.

Then fit the model you just created using the data in X and y that you saved above.

from sklearn.tree import DecisionTreeRegressor

Specify the model.

For model reproducibility, set a numeric value for random_state when specifying the model

iowa_model = DecisionTreeRegressor(random_state=1)

Fit the model

iowa_model.fit(X,y)

Step 4: Make Predictions

Make predictions with the model's predict command using X as the data. Save the results to a variable called predictions.

print(X.head())
predictions = iowa_model.predict(X)
print(predictions)
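
One way to sanity-check predictions like these is to place them next to the actual target values. The sketch below does this on made-up data. Because the predictions are made on the same rows used for fitting, an unrestricted decision tree reproduces the training values exactly, so a perfect match here says nothing about accuracy on new houses.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Toy stand-ins for X and y (made-up values)
X_toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "YearBuilt": [2003, 1976, 2001],
})
y_toy = pd.Series([208500, 181500, 223500], name="SalePrice")

model = DecisionTreeRegressor(random_state=1).fit(X_toy, y_toy)

# Side-by-side view of actual and predicted prices
comparison = pd.DataFrame({
    "actual": y_toy,
    "predicted": model.predict(X_toy),
})
print(comparison)
```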

Luhen1 / Kaggle-1

Predicting house prices #1
