Open coderschoolreview opened 5 years ago
Overall: Good, all answers are correct.
All answers correct. Awesome.
Didn't work because the "data" folder doesn't exist. Could fix by creating one.
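One way to create the folder from inside the notebook, as a minimal sketch. It assumes the folder is expected in the current working directory; `DATA_DIR` is just an illustrative name.

```python
import os

# "data" is the folder name from the comment above; its location
# (current working directory) is an assumption.
DATA_DIR = "data"

# exist_ok=True makes this a no-op when the folder is already there,
# so the cell can be re-run safely.
os.makedirs(DATA_DIR, exist_ok=True)
```

Running this once before any cell that writes into the folder should be enough.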
Thanks teacher
For ref, check out Mai's notebook
classification_report from sklearn.metrics
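A small sketch of how classification_report can be called. The labels here are made up for illustration and are not from the assignment.

```python
from sklearn.metrics import classification_report

# Toy labels, invented for this example only.
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

# Returns a text table with precision, recall, f1-score and support per class.
report = classification_report(y_true, y_pred)
print(report)
```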
The goal of this assignment was to introduce you to the following concepts in Machine Learning:
missingno to visualize missing values
pandas.get_dummies
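For reference, a minimal sketch of what pandas.get_dummies does to a categorical column. The frame and its column names are hypothetical, not taken from the assignment data.

```python
import pandas as pd

# Hypothetical frame with one categorical and one numeric column.
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

# Each distinct value of "color" becomes its own indicator column;
# "size" is passed through unchanged.
encoded = pd.get_dummies(df, columns=["color"])
print(encoded.columns.tolist())
```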
To know "which is the best combination of parameter":
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
# Using gridsearchcv, random forest model and this param grid to find the best combination of parameters
# Hint: example
# https://stackoverflow.com/questions/30102973/how-to-get-best-estimator-on-gridsearchcv-random-forest-classifier-scikit
param_grid = {
    'bootstrap': [True],
    'max_depth': [80, 90, 100, 110],
    'max_features': [2, 3],
    'min_samples_leaf': [3, 4, 5],
    'min_samples_split': [8, 10, 12],
    'n_estimators': [100, 200, 300, 1000]
}
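from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
# Try every combination in param_grid with cross-validation
gcv = GridSearchCV(RandomForestClassifier(), param_grid=param_grid)
gcv.fit(X, y)
# Best combination of parameters found by the search
gcv.best_params_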
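If you want to try the same idea quickly, here is a self-contained sketch. The synthetic data and the much smaller grid are assumptions chosen so it runs in seconds; they are not the assignment's X, y, or grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the assignment's X and y (assumption).
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

# Deliberately tiny grid so the search finishes fast; not the grid above.
small_grid = {
    "max_depth": [3, 5],
    "n_estimators": [10, 20],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=small_grid,
    cv=3,
)
search.fit(X_demo, y_demo)

print(search.best_params_)  # best combination found by the search
print(search.best_score_)   # mean cross-validated score of that combination
best_model = search.best_estimator_  # model refit on all data with the best params
```

best_estimator_ can then be used directly for predict, without fitting a new model by hand.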
Installing the missingno package (or any arbitrary package) on Win10:
Open Anaconda Prompt and run:
conda install -c conda-forge missingno
Filtering series:
Your code: train_copy.isnull()
Could write it like this to get only the columns with null values:
ncols = train_copy.isnull().sum()
ncols[ncols != 0]
Check whether the whole data frame has any null value:
train.isnull().any().any()
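Both null checks can be tried on a toy frame, as a rough sketch. `train_demo` and its columns are invented here and only stand in for the real train data.

```python
import numpy as np
import pandas as pd

# Invented frame standing in for the assignment's train data.
train_demo = pd.DataFrame({
    "age": [22.0, np.nan, 35.0],
    "fare": [7.25, 71.3, 8.05],
    "cabin": [None, "C85", None],
})

# Count nulls per column, then keep only the columns that have any.
null_counts = train_demo.isnull().sum()
cols_with_nulls = null_counts[null_counts != 0]
print(cols_with_nulls)

# First any() reduces each column to one boolean, second any() reduces
# those to a single boolean for the whole frame.
has_any_null = train_demo.isnull().any().any()
print(has_any_null)
```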
For the evaluation function, you should print instead of return, so that when you loop through the list of models and evaluate them, the result of each iteration is printed to the output.
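To illustrate the print-versus-return point, a minimal sketch. The `evaluate` helper, the model names, and the predictions are all hypothetical, and plain accuracy is used only to keep it short.

```python
def evaluate(name, y_true, y_pred):
    """Print the accuracy for one model instead of returning it."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    print(f"{name}: accuracy = {correct / len(y_true):.2f}")


# Hypothetical true labels and per-model predictions.
y_true = [1, 0, 1, 1]
predictions = {
    "model_a": [1, 0, 0, 1],
    "model_b": [1, 0, 1, 1],
}

# Each iteration prints its own line, so every model's result is visible.
for name, y_pred in predictions.items():
    evaluate(name, y_true, y_pred)
```

With return, a notebook cell would only display the value of its last expression, so the results of the loop would not show up unless you collected and printed them yourself.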
The goal of this assignment was to introduce you to the following concepts:
You learn how to use PCA for dimension reduction, KMeans, and Hierarchical Clustering. You also learn to visualize the results of both techniques.
To create a range of integers, use range(low, high, step). Note that high itself is excluded.
k_values = range(1,20)
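A quick sketch of how range behaves here. The `odd_k` name is only for illustration.

```python
# range(low, high) stops before high, so this covers 1 through 19.
k_values = range(1, 20)
print(list(k_values))

# The optional third argument is the step; here every second value.
odd_k = list(range(1, 20, 2))
print(odd_k)
```

So if k = 20 should be tried as well, the call would need to be range(1, 21).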
Almost correct, just need to color each cluster in the plot by adding the argument c=y_cluster to the plt.scatter function.
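pca2 = PCA(n_components=2)
# fit_transform fits and projects in one step, so a separate fit(X) is not needed
projected = pca2.fit_transform(X)
plt.figure(figsize=(25, 10))
# plt.get_cmap also works on matplotlib versions where plt.cm.get_cmap is no longer available
plt.scatter(projected[:, 0], projected[:, 1],
            edgecolor='none', alpha=0.5,
            cmap=plt.get_cmap('viridis', 10),
            c=y_cluster)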
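For a version you can run end to end, here is a rough sketch. The blob data, the choice of 3 clusters, and the output file name are assumptions, not part of the assignment.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic 5-dimensional data standing in for the assignment's X (assumption).
X_demo, _ = make_blobs(n_samples=150, n_features=5, centers=3, random_state=0)

# Cluster labels, one integer per sample.
y_cluster = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_demo)

# Project to 2 dimensions for plotting.
projected = PCA(n_components=2).fit_transform(X_demo)

fig, ax = plt.subplots(figsize=(8, 5))
# c=y_cluster maps each point's cluster label to a color from the colormap.
points = ax.scatter(projected[:, 0], projected[:, 1],
                    c=y_cluster, cmap="viridis",
                    edgecolor="none", alpha=0.5)
fig.colorbar(points, ax=ax, label="cluster")
fig.savefig("clusters.png")  # hypothetical output file name
```

In a notebook you can drop the Agg line and the savefig call and let the figure display inline.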
How many people have the job title of "Lawyer"?
Could've been done with