Luhen1 opened 3 years ago
import pandas as pd
iowa_file_path = '../input/home-data-for-ml-course/train.csv'
home_data = pd.read_csv(iowa_file_path)
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex3 import *
print("Setup Complete")
Select the target variable, which corresponds to the sales price. Save this to a new variable called y. You'll need to print a list of the columns to find the name of the column you need.
home_data.columns
y = home_data.SalePrice
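Attribute access (`home_data.SalePrice`) and bracket access (`home_data['SalePrice']`) select the same column as a pandas Series. The sketch below uses a small made-up DataFrame called `toy` (hypothetical values, not the Iowa data) to show this:

```python
import pandas as pd

# Hypothetical toy data standing in for home_data; values are invented
toy = pd.DataFrame({
    "LotArea": [8000, 9500, 11000],
    "SalePrice": [200000, 180000, 225000],
})

# Print the column names to find the target column
print(list(toy.columns))        # ['LotArea', 'SalePrice']

# Attribute access and bracket access return the same Series
y_attr = toy.SalePrice
y_brack = toy["SalePrice"]
print(type(y_attr).__name__)    # Series
print(y_attr.equals(y_brack))   # True
```

Bracket access is required for column names that are not valid Python identifiers, such as `1stFlrSF`, because `home_data.1stFlrSF` is a syntax error.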
Now you will create a DataFrame called X holding the predictive features.
Since you want only some columns from the original data, you'll first create a list with the names of the columns you want in X.
You'll use just the following columns in the list (you can copy and paste the whole list to save some typing, though you'll still need to add quotes):
LotArea, YearBuilt, 1stFlrSF, 2ndFlrSF, FullBath, BedroomAbvGr, TotRmsAbvGrd
After you've created that list of features, use it to create the DataFrame that you'll use to fit the model.
feature_names = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
X = home_data[feature_names]
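Indexing a DataFrame with a list of column names, rather than a single name, returns a DataFrame whose columns follow the order of the list. A sketch on hypothetical toy data (column names borrowed from the exercise, values invented):

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration
toy = pd.DataFrame({
    "LotArea": [8000, 9500, 11000],
    "YearBuilt": [2003, 1976, 2001],
    "SalePrice": [200000, 180000, 225000],
})

feature_names = ["YearBuilt", "LotArea"]
X_toy = toy[feature_names]      # a list of names -> DataFrame

print(type(X_toy).__name__)     # DataFrame
print(list(X_toy.columns))      # ['YearBuilt', 'LotArea']
print(X_toy.shape)              # (3, 2)
```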
Before building a model, take a quick look at X to verify it looks sensible.
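A few quick checks cover most of "looks sensible": `shape` gives (rows, columns), `head()` shows the first five rows by default, and `isnull().sum()` counts missing values per column. A sketch on an invented DataFrame named `toy` (hypothetical, standing in for X):

```python
import pandas as pd

# Hypothetical stand-in for X with 10 rows and 2 columns
toy = pd.DataFrame({
    "LotArea": range(1000, 1010),
    "FullBath": [1, 2] * 5,
})

print(toy.shape)          # (10, 2)
print(len(toy.head()))    # 5 rows by default
print(toy.head(3))        # pass n to see a different number of rows
# Count missing values in each column
print(toy.isnull().sum())
```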
Create a DecisionTreeRegressor and save it to iowa_model. Ensure you've done the relevant import from sklearn to run this command.
Then fit the model you just created using the data in X and y that you saved above.
from sklearn.tree import DecisionTreeRegressor
iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(X,y)
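`random_state` fixes the randomness used while building the tree, so two regressors created with the same value and fitted on the same data make identical predictions. A sketch on synthetic arrays (`X_syn` and `y_syn` are generated here and are assumptions, not the Iowa data):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in for X (3 features) and y
X_syn = rng.uniform(0, 100, size=(50, 3))
y_syn = 3 * X_syn[:, 0] + rng.normal(0, 5, size=50)

# fit() returns the fitted estimator, so it can be chained
model_a = DecisionTreeRegressor(random_state=1).fit(X_syn, y_syn)
model_b = DecisionTreeRegressor(random_state=1).fit(X_syn, y_syn)

# Same random_state and same data -> same tree -> same predictions
print(np.array_equal(model_a.predict(X_syn), model_b.predict(X_syn)))  # True
```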
Make predictions with the model's predict method using X as the data. Save the results to a variable called predictions.
print(X.head())
predictions = iowa_model.predict(X)
print(predictions)
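`predict` returns a NumPy array with one predicted value per input row, so predicting on a slice such as the first five rows gives five values. A sketch on synthetic data (`X_syn`, `y_syn` are assumptions standing in for X and y):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
# Synthetic stand-in data: 20 rows, 2 features
X_syn = rng.uniform(0, 10, size=(20, 2))
y_syn = X_syn.sum(axis=1)

model = DecisionTreeRegressor(random_state=1)
model.fit(X_syn, y_syn)

predictions = model.predict(X_syn)
print(type(predictions).__name__)       # ndarray
print(predictions.shape)                # (20,)
# Predicting on a slice gives one value per row in the slice
print(model.predict(X_syn[:5]).shape)   # (5,)
```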
import pandas as pd
# Save the file path to a variable for easier access
melbourne_file_path = '/content/melb_data.csv'
# Read the data and store it in a DataFrame called melbourne_data
melbourne_data = pd.read_csv(melbourne_file_path)
# Print a summary of the data in melbourne_data
melbourne_data.describe()
Interpreting data description
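For each numeric column, `describe()` reports eight statistics: `count` (number of non-missing values), `mean`, `std` (standard deviation), `min`, the 25th/50th/75th percentiles, and `max`. A sketch on an invented one-column DataFrame (hypothetical data, not the Melbourne file):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
toy = pd.DataFrame({"Rooms": [2, 3, 3, 4, np.nan]})
summary = toy.describe()

print(list(summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary.loc["count", "Rooms"])  # 4.0, because the NaN is not counted
print(summary.loc["mean", "Rooms"])   # 3.0
```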
melbourne_data.columns
Since our table has some missing values, we'll drop the houses (rows) that have any missing value.
melbourne_data = melbourne_data.dropna(axis=0)  # dropna(axis=0) drops rows that contain NA (missing) values
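`dropna(axis=0)` (the default) removes every row with at least one missing value, while `dropna(axis=1)` removes every column with at least one. A sketch on a hypothetical three-row DataFrame (values invented):

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in two columns
toy = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Price": [500000, np.nan, 800000],
    "BuildingArea": [np.nan, np.nan, 120.0],
})

rows_dropped = toy.dropna(axis=0)   # keep only rows with no NaN
cols_dropped = toy.dropna(axis=1)   # keep only columns with no NaN

print(rows_dropped.shape)           # (1, 3): only the third row is complete
print(list(cols_dropped.columns))   # ['Rooms']
```

Because a single sparse column can cause many rows to be discarded with `axis=0`, it is worth checking `melbourne_data.shape` after the call.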
melbourne_data.columns
Here's how you can select the feature columns (the spellings 'Lattitude' and 'Longtitude' match the dataset's actual column names):
melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']
X = melbourne_data[melbourne_features]
X.describe()
X.head()
Building models with scikit-learn. The steps to building and using a model are:
Define - What type of model will it be? A decision tree? Some other type? What are its parameters?
Fit - Capture patterns from the provided data. This is the heart of modeling.
Predict - Make predictions with the fitted model.
Evaluate - Determine how accurate the model's predictions are.
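The define, fit, predict, and evaluate steps can be sketched end to end on synthetic data. Two choices here are assumptions of this sketch rather than part of the steps themselves: scikit-learn's `mean_absolute_error` as the evaluation metric, and holding out rows with `train_test_split` so the model is scored on data it did not see during fitting:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for house features and prices
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 10, size=(200, 2))
y_syn = 50 * X_syn[:, 0] + 20 * X_syn[:, 1] + rng.normal(0, 10, size=200)

# Hold out 25% of rows (the default) for evaluation
X_train, X_val, y_train, y_val = train_test_split(X_syn, y_syn, random_state=0)

# Define: choose the model type and its parameters
model = DecisionTreeRegressor(random_state=1)
# Fit: capture patterns from the training data
model.fit(X_train, y_train)
# Predict: make predictions for the held-out rows
val_predictions = model.predict(X_val)
# Evaluate: mean absolute difference between predictions and true values
mae = mean_absolute_error(y_val, val_predictions)
print(f"Validation MAE: {mae:.2f}")
```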
# Prediction target: the Price column as a Series
y = melbourne_data.Price
from sklearn.tree import DecisionTreeRegressor
# Define model. Specify a number for random_state to ensure the same results each run
melbourne_model = DecisionTreeRegressor(random_state=1)
# Fit model
melbourne_model.fit(X, y)
print("Making predictions for the following 5 houses:")
print(X.head())
print("The predictions are")
print(melbourne_model.predict(X.head()))