-->
Everything related to course content in video lecture format will be uploaded here on youtube.
Introduction to Python ka chilla and Data Science
In the first class of Python ka chilla, we covered the following:
- What is Data Science?
- Daily Life Examples
- Installation of Python and VScode
- Writing your first line of code
- Data Types and operators
- Different operations in CMD
- What is an IDE (Integrated Development Environment)?
- The use of ChatGPT was also discussed in this lecture, which can be seen in the following video here
The video can be seen by clicking on the picture below.
Installation of Python and VScode
Before you can learn Python you need to install it, and here we will install two pieces of software:\
We have also explained how to install this software in Urdu; click the image below to watch the video.
Write your first line of code with us
In this video we will see how to write your first line of code after installation, as you saw in the previous lecture. In this one we will also run code from a file.
To listen to the lecture in Urdu, click this image.
On the second day of Python ka chilla 2023 we will learn how to use an IDE, namely VScode, to learn and use Python in an efficient manner.
On Day 1 you installed VScode; now let's see how to use it.
- In this video we will see how to install the top VScode extensions
- Running Python in VScode
- We will look at the different file types and extensions
- We will see what the Python extensions .py and .ipynb are
- We will see how to ask questions. If you have a question, click here to ask it
- In this video you will also get a future strategy for how to practice in the coming days
Click this video and watch the whole session:
You can learn Basics of Python Programming from these two Videos:
This is the Day 3 lecture. Finish it and practice at least three times so that these concepts become clear to you. Only then will learning be enjoyable; otherwise you will keep running into problems (if you miss anything here, you will have trouble later on): Python Programming (Python-101) by clicking on this:\
This was the live Zoom session where we discussed variables, the input function, and much more. Watch the video by clicking the following figure:\
In this lecture we will see what files with the .ipynb extension are. Video is here.
.md files for github before moving ahead.\
Here is the dataset to work on: download here
In this lecture you will see how pandas library can be used to import the dataset and run basic functions on that dataset
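As a rough illustration of what importing a dataset and running basic functions can look like, here is a minimal sketch. The tiny inline CSV and its column names are hypothetical stand-ins for the course dataset, which is not reproduced here.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for the course dataset.
csv_text = """name,age,city
Ali,25,Lahore
Sara,31,Karachi
Omar,19,Lahore
"""

# pd.read_csv accepts a file path or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first rows of the table
print(df.shape)       # (number of rows, number of columns)
print(df.describe())  # summary statistics for numeric columns

mean_age = df["age"].mean()               # average of one column
city_counts = df["city"].value_counts()   # how often each city appears
print(mean_age)
print(city_counts)
```

With a real file you would pass its path to pd.read_csv instead of the in-memory text.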
In this lecture we will learn tips and tricks for using the pandas library.\
These lectures are long, so please watch them in full and practice along as you go.\
Click the picture below to watch the lecture.
In this lecture we will practice pandas in even more detail.\
In this lecture we will look at cheat sheets, which will be very important for data wrangling. Download the cheat sheets from here: Download all Cheat Sheets here
You will not find a better language than Python for making plots and graphs, and today's lectures are here to show exactly that. In them you will learn how plots are made using different Python libraries like matplotlib and seaborn.
In this video you will learn what plotting is and why it is necessary:\
In this video you will see how to make plots by coding in Python:\
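A minimal sketch of a first plot with matplotlib follows. The data, labels, and output file name are hypothetical, and the non-interactive backend is chosen only so the script also runs on machines without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical data: the squares of the numbers 1 to 5.
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")       # line plot with a marker at each point
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("A first line plot")
fig.savefig("first_plot.png")   # hypothetical output file name
```

In a notebook you would usually call plt.show() instead of saving the figure.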
Watch the following playlist to learn basic statistics for Data Science.\ Click here to watch ABC of Statistics
Today we will learn Data Wrangling in python:
Data wrangling, also known as data munging, is the process of cleaning, transforming, and organizing data in a way that makes it more suitable for analysis. It is a crucial step in the data science process as real-world data is often messy and inconsistent.
The general steps of data wrangling in Python typically include:
- Importing necessary libraries such as Pandas, NumPy, and Matplotlib
- Loading the data into a Pandas DataFrame
- Assessing the data for missing values, outliers, and inconsistencies
- Cleaning the data by filling in missing values, removing outliers, and correcting errors
- Organizing the data by creating new columns, renaming columns, sorting, and filtering the data
- Storing the cleaned data in a format that can be used for future analysis, such as a CSV or Excel file
- Exploring the data by creating visualizations and using descriptive statistics
- Creating a pivot table to summarize the data
- Checking for and handling duplicate rows
- Encoding categorical variables
- Removing unnecessary columns or rows
- Merging or joining multiple datasets
- Handling missing or null values
- Reshaping the data
- Formatting the data
- Normalizing or scaling the data
- Creating new features from existing data
- Validating data integrity
- Saving the final data for future use
- Documenting the data wrangling process for reproducibility
Please note that these are general steps; the specific steps you take will depend on the dataset, the requirements, and the analysis you plan to perform.
Here is an example of how to perform data wrangling on the titanic dataset in Python using the pandas library:
# Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Titanic dataset into a pandas DataFrame
titanic = sns.load_dataset('titanic')
# View the first few rows of the dataset
titanic.head()
# View the column names and data types
titanic.info()
# Check for missing values
print(titanic.isnull().sum())
# Handle missing values
# Option 1: Drop rows with missing values (kept in a separate DataFrame so Option 2 still has data to fill)
titanic_dropped = titanic.dropna()
# Option 2: Impute missing values
titanic['age'] = titanic['age'].fillna(titanic['age'].mean())
# Check for outliers and remove or transform them as necessary
sns.boxplot(x=titanic['age'])
# Transform outliers
titanic['age'] = np.log(titanic['age'])
# Feature engineering
titanic['family_size'] = titanic['sibsp'] + titanic['parch'] + 1
titanic['is_alone'] = 1 # initialize to yes/1 is alone
titanic.loc[titanic['family_size'] > 1, 'is_alone'] = 0 # now update to no/0 if family size is greater than 1
# Group and aggregate data (numeric columns only)
data = titanic.groupby('pclass').mean(numeric_only=True)
# Save the cleaned dataset
titanic.to_csv('titanic_cleaned.csv', index=False)
This is just one example of how to perform data wrangling on the Titanic dataset, but there are many other ways you can handle missing values, outliers, and feature engineering. The important thing is to understand the data, and to make decisions based on the context of the problem you're trying to solve.
Another way of treating the data is as follows:
# Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Titanic dataset
# (assumes a Kaggle-style file with columns such as Name, Pclass, Sex, Age, Fare, Embarked)
data = pd.read_csv('titanic.csv')
# View the first few rows of the dataset
print(data.head())
# Check for missing values
print(data.isnull().sum())
# Handle missing values
data['Age'] = data['Age'].fillna(data['Age'].median())
data['Embarked'] = data['Embarked'].fillna(data['Embarked'].mode()[0])
# Check for outliers
# Option 1: Remove outliers using the IQR fences
q1 = data["Fare"].quantile(0.25)
q3 = data["Fare"].quantile(0.75)
iqr = q3-q1
fence_low = q1-1.5*iqr
fence_high = q3+1.5*iqr
data = data[(data["Fare"] > fence_low) & (data["Fare"] < fence_high)].copy()
# Option 2: Transform outliers by capping fares at 100
data['Fare'] = data['Fare'].clip(upper=100)
# Feature engineering
# if the titles are given in the dataset
data['Title'] = data['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
data['Title'] = data['Title'].replace(['Lady', 'Countess','Capt', 'Col','Don', 'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer', 'Dona'], 'Rare')
data['Title'] = data['Title'].replace('Mlle', 'Miss')
data['Title'] = data['Title'].replace('Ms', 'Miss')
data['Title'] = data['Title'].replace('Mme', 'Mrs')
data = data.drop(['Name'], axis=1)
data = pd.get_dummies(data, columns = ["Title"])
# Group and aggregate data (numeric columns only)
data = data.groupby(['Pclass', 'Sex']).mean(numeric_only=True)
# Save the cleaned dataset
data.to_csv('titanic_cleaned.csv', index=True)
In this example I used the IQR method to check for outliers and applied some feature engineering techniques, such as extracting the title from the name and creating dummy variables. I also grouped the data by Pclass and Sex and took the mean of each group.
It's important to note that the steps you take during data wrangling will vary depending on the dataset and the specific analysis you plan to perform. The examples above should give you an idea of the types of tasks that are typically involved in data wrangling and how to perform them using the pandas library.
In this lecture we will learn what machine learning is and how we can apply it in everyday life, projects, and scientific topics.
In this lecture we will learn what machine learning is through a detailed, explained example.
Here is the video:
In this lecture we will learn what a linear regression model is, how to use it in machine learning, and how we can apply it in real life.
Here is the video:
In this lecture we will learn what a classification model is, how to use it in machine learning, and how we can apply it in real life.
Here is the video:
In this lecture we will learn how to select the best model in machine learning and how we can apply this in real life.
Here is the video:
Send me your assignment if you enrolled in the course:
Here is the code mentioned in the video:
# Import the necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("titanic")
X = df[['pclass', 'sex', 'age', 'sibsp', 'parch', 'fare']]
y = df['survived']
X = pd.get_dummies(X, columns=['sex'])
# Fill missing ages with the mean age (assignment avoids the unreliable chained inplace fill)
X['age'] = X['age'].fillna(X['age'].mean())
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
models = [LogisticRegression(), SVC(), DecisionTreeClassifier(), RandomForestClassifier(), KNeighborsClassifier()]
model_names = ['Logistic Regression', 'SVM', 'Decision Tree', 'Random Forest', 'KNN']
models_scores = []
for model, model_name in zip(models, model_names):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    models_scores.append([model_name,accuracy])
sorted_models = sorted(models_scores, key=lambda x: x[1], reverse=True)
for model in sorted_models:
    print("Accuracy Score: ",f'{model[0]} : {model[1]:.2f}')
# Accuracy Score: Random Forest : 0.81
# Accuracy Score: Decision Tree : 0.79
# Accuracy Score: KNN : 0.76
# Accuracy Score: Logistic Regression : 0.75
# Accuracy Score: SVM : 0.74
models = [LogisticRegression(), SVC(), DecisionTreeClassifier(), RandomForestClassifier(), KNeighborsClassifier()]
model_names = ['Logistic Regression', 'SVM', 'Decision Tree', 'Random Forest', 'KNN']
models_scores = []
for model, model_name in zip(models, model_names):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    Precision = precision_score(y_test, y_pred)
    models_scores.append([model_name,Precision])
sorted_models = sorted(models_scores, key=lambda x: x[1], reverse=True)
for model in sorted_models:
    print("Precision Score: ", f'{model[0]} : {model[1]:.2f}')
# Precision Score: Random Forest : 0.80
# Precision Score: Decision Tree : 0.78
# Precision Score: KNN : 0.75
# Precision Score: Logistic Regression : 0.74
# Precision Score: SVM : 0.73
models = [LogisticRegression(), SVC(), DecisionTreeClassifier(), RandomForestClassifier(), KNeighborsClassifier()]
model_names = ['Logistic Regression', 'SVM', 'Decision Tree', 'Random Forest', 'KNN']
models_scores = []
for model, model_name in zip(models, model_names):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    Recall = recall_score(y_test, y_pred)
    models_scores.append([model_name,Recall])
sorted_models = sorted(models_scores, key=lambda x: x[1], reverse=True)
for model in sorted_models:
    print("Recall Score: ",f'{model[0]} : {model[1]:.2f}')
# Recall Score: Random Forest : 0.74
# Recall Score: Decision Tree : 0.72
# Recall Score: KNN : 0.68
# Recall Score: Logistic Regression : 0.67
# Recall Score: SVM : 0.65
models = [LogisticRegression(), SVC(), DecisionTreeClassifier(), RandomForestClassifier(), KNeighborsClassifier()]
model_names = ['Logistic Regression', 'SVM', 'Decision Tree', 'Random Forest', 'KNN']
models_scores = []
for model, model_name in zip(models, model_names):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    F1 = f1_score(y_test, y_pred)
    models_scores.append([model_name,F1])
sorted_models = sorted(models_scores, key=lambda x: x[1], reverse=True)
for model in sorted_models:
    print("F1 Score: ",f'{model[0]} : {model[1]:.2f}')
# F1 Score: Random Forest : 0.77
# F1 Score: Decision Tree : 0.75
# F1 Score: KNN : 0.71
# F1 Score: Logistic Regression : 0.70
# F1 Score: SVM : 0.68
In this lecture we will learn how to select the best parameters for a model using GridSearchCV in scikit-learn.
Here is the video:
Here is the code mentioned in the video:
# Decision Tree Classifier and use best parameters
import pandas as pd
import seaborn as sns
df = sns.load_dataset("titanic")
X = df[['pclass', 'sex', 'age', 'sibsp', 'parch', 'fare']]
y = df['survived']
X = pd.get_dummies(X, columns=['sex'])
# Fill missing ages with the mean age (assignment avoids the unreliable chained inplace fill)
X['age'] = X['age'].fillna(X['age'].mean())
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
import numpy as np
#create a model
model = DecisionTreeClassifier()
# define parameter grid
param_grid = {'max_depth': [3, 5, 7, None], 'min_samples_split': [2, 3, 4]}
#object grid search cv (Creating the model)
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='precision')
# train the model
grid_search.fit(X,y)
# print the best parameters
print("Best Parameters: ", grid_search.best_params_)
print("Best Score: ", grid_search.best_score_)
# Best Parameters: {'max_depth': 3, 'min_samples_split': 2}
# Best Score: 0.775
In your assignments:
Please write code that selects the best model based on grid search CV!
First, this lecture will teach you how K-means clustering is applied in Python:
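A minimal sketch of K-means with scikit-learn is shown here. The two groups of points are hypothetical synthetic data, and the choice of two clusters is an assumption made for this illustration, not something taken from the lecture.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated hypothetical groups of 2-D points.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5, 5], scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])

# K is the number of clusters we ask for; each cluster is summarised by its mean (centre).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)   # one centre per cluster
print(np.bincount(labels))       # number of points assigned to each cluster
```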
Over the next two days we will look at the basic concepts of machine learning and learn the terminology used in it. These lectures are very important, so please do not skip them.
If you have learned everything up to this point, then believe me, you have already seen these concepts to a large extent; now they will become clear.
In this video we will see an introduction to Machine Learning: what it is, and what we will cover next in this ML playlist.
We hear the term cross validation a lot in the ML world. In this video we will see, with a very desi (local, everyday) example, what cross validation (CV) is and what its different types are, e.g., 4-fold CV, 10-fold CV, etc.
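To make the idea of k-fold cross validation concrete, here is a small sketch with scikit-learn. The iris dataset and the logistic regression model are assumptions picked only because they ship with the library; the video may use different examples.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Try 4-fold and 10-fold CV: the data is split into k parts, and each part
# is used once as the test set while the remaining parts train the model.
for k in (4, 10):
    folds = KFold(n_splits=k, shuffle=True, random_state=42)
    scores = cross_val_score(model, X, y, cv=folds)
    print(f"{k}-fold CV: {len(scores)} scores, mean accuracy = {scores.mean():.2f}")
```

Each fold yields one score, so 10-fold CV gives ten accuracy values whose mean is a steadier estimate than a single train/test split.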
The confusion matrix tends to keep us very confused, so to make it easy this video explains it with desi examples. It covers why we need a confusion matrix and how we build one, and explains True Positives, True Negatives, False Positives, and False Negatives.
Whenever we build a model, we check how well it is working. Accuracy alone is not always enough, so we also have to look at how many of the positives (sensitivity) and how many of the negatives (specificity) the model identified correctly.
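The four confusion-matrix counts and the two rates derived from them can be sketched in plain Python. The labels below are hypothetical, and the helper name confusion_counts is made up for this illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


# Hypothetical true labels and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)   # share of actual positives found
specificity = tn / (tn + fp)   # share of actual negatives found
accuracy = (tp + tn) / len(y_true)

print(tp, tn, fp, fn)
print(sensitivity, specificity, accuracy)
```

Here accuracy is 0.7, yet sensitivity (0.75) and specificity (about 0.67) show that the model handles the two classes differently.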
If you have built two models, one of which fits the data very well while the other predicts very well, how should you use them? To understand this, you need to understand the concept of the bias-variance tradeoff.
The literal meaning of entropy is how much randomness or disorder there is in your data. How we use it in the machine learning world is explained in this video in a very simple and easy way, so that whenever you use entropy in ML in the future you will know why and how to use it.
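As a small worked example, Shannon entropy of a set of class labels can be computed as below. The label lists are hypothetical; the formula is the standard sum of -p * log2(p) over the classes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


pure = entropy(["yes"] * 8)                  # one class only: no disorder
mixed = entropy(["yes"] * 4 + ["no"] * 4)    # 50/50 split: maximum disorder for two classes
skewed = entropy(["yes"] * 7 + ["no"] * 1)   # mostly one class: somewhere in between

print(pure, mixed, skewed)
```

A decision tree uses this kind of measure to prefer splits that leave each branch closer to the pure case.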
Linear regression is a very easy and common ML model, but not everyone understands the concepts used in it. So this video explains what linear regression is and what a residual is.
It also covers why we square the differences, what least squared residuals are, and why we look at them in regression. Likewise, if we go to higher dimensions (in plain words, if we have more than one independent variable), how is a regression model fitted? To understand that (multiple regression), watching this video is very important.
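The least squares idea with more than one independent variable can be sketched with NumPy. The data is hypothetical and generated without noise from y = 1 + 2*x1 + 3*x2, so the fit should recover those coefficients.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

# Add a column of ones so the first coefficient acts as the intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# Least squares picks the coefficients that minimise the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# A residual is the gap between an observed value and the fitted value.
residuals = y - X_design @ coef
sum_squared_residuals = float(np.sum(residuals ** 2))

print(coef)
print(sum_squared_residuals)
```

Squaring keeps positive and negative residuals from cancelling each other and penalises large misses more than small ones.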
In the real world not everything is linear, which is why we do not use linear regression models every time. Whenever the dependent variable contains categorical/Boolean data, we use logistic regression. In this video we saw how logistic regression differs from linear regression, what the concept of the S-curve is, and whether higher-dimensional logistic regression exists.
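The S-curve is the sigmoid (logistic) function. The sketch below shows how it turns a linear score into a probability; the intercept and slope are hypothetical values chosen only for illustration, not fitted to any data.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1 (the S-curve)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, intercept, slope):
    """Logistic regression with one feature: sigmoid of a linear score."""
    return sigmoid(intercept + slope * x)


# Hypothetical coefficients, not fitted to any data.
intercept, slope = -4.0, 1.0
for x in range(0, 9, 2):
    print(x, round(predict_probability(x, intercept, slope), 3))
```

The probabilities rise slowly at first, quickly around the midpoint, and then level off, which gives the curve its S shape.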
ROC (receiver operating characteristic) and AUC (area under the curve) together tell us how the model is performing. ROC is a probability curve, and AUC measures the model's separability. Together they evaluate a model better than accuracy does when the classes are imbalanced. This video discusses this in easy words and also explains how we can build them and how to interpret them.
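One way to see what AUC measures is to compute it directly as the fraction of (positive, negative) pairs that the model ranks in the right order. The sketch below does this in plain Python; the labels, scores, and the helper name auc_from_scores are hypothetical.

```python
def auc_from_scores(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = auc_from_scores(y_true, scores)
print(auc)
```

An AUC of 1.0 means every positive is scored above every negative, while 0.5 is what random scores would give.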
How the method we used to fit the S-curve in logistic regression (maximum likelihood) works in detail, and why we do not use the least squared residuals method here.
And if we are checking how accurate and reliable the model is, how we calculate and interpret R-squared in logistic regression.
Whenever our model has an overfitting/underfitting issue, we use regularization techniques, and we decide which technique to focus on by looking at the nature of the data and the model. When the data has many useful variables, we mostly use L2.
When useless variables are more common, we use L1.
And when we are somewhere in between, we use the hybrid technique, Elastic net.
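The difference between the three penalties can be sketched with scikit-learn, where Ridge applies L2, Lasso applies L1, and ElasticNet mixes both. The synthetic data and the alpha values are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features matter; the other eight are useless noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

models = {
    "Ridge (L2)": Ridge(alpha=1.0),
    "Lasso (L1)": Lasso(alpha=0.1),
    "Elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

zero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    # Count how many coefficients the penalty pushed to exactly zero.
    zero_counts[name] = int(np.sum(model.coef_ == 0))
    print(name, "coefficients set to zero:", zero_counts[name])
```

L2 shrinks all coefficients but keeps them non-zero, whereas L1 can drop useless variables entirely, which is why it suits data with many irrelevant features.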
We mostly use principal component analysis (PCA) for feature selection/dimension reduction. In this video we saw in detail exactly when and how PCA can be used in ML to make our work easier, and the concept of the eigenvectors involved is also explained in easy words.
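To show where eigenvectors enter, here is a minimal PCA sketch written with NumPy alone. The three-feature dataset is hypothetical, with the third feature built as a near copy of the first so that two components carry almost all the variance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: the third feature is nearly a copy of the first.
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
X = np.column_stack([x1, x2, x1 + 0.01 * rng.normal(size=200)])

# 1. Centre the data so each feature has mean zero.
X_centered = X - X.mean(axis=0)
# 2. Eigen-decompose the covariance matrix; eigenvectors are the principal directions.
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
# 3. Sort components from largest to smallest variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
# 4. Project the data onto the top two components.
X_reduced = X_centered @ eigenvectors[:, :2]
explained = eigenvalues / eigenvalues.sum()

print(X_reduced.shape)
print(explained)
```

In practice sklearn.decomposition.PCA wraps these steps, but writing them out makes the role of each eigenvector visible.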
After regression techniques we saw how clustering is done, what it means, and why it is needed. We cover how to use a very easy and famous technique, K-means clustering, what both K and means stand for, and how we decide what value of K to choose to make our work easier.
In this lecture we learned how to save our trained ML model. Click the image below to watch the video lecture:
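One common way to save a trained scikit-learn model is Python's built-in pickle module, sketched below. The lecture may use a different tool (joblib is another frequent choice), and the dataset, model, and file name here are assumptions for illustration.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=42).fit(X, y)

# Save the trained model to disk (hypothetical file name).
with open("trained_model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later, load it back and use it without retraining.
with open("trained_model.pkl", "rb") as f:
    loaded_model = pickle.load(f)

same_predictions = bool((loaded_model.predict(X) == model.predict(X)).all())
print(same_predictions)
```

Only load pickle files from sources you trust, because unpickling can execute arbitrary code.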
Download any dataset on COVID and submit the EDA on Telegram if you would like feedback.
You will learn these in this lecture:
After this lecture you have to send the video presentation as mentioned in the lecture.
In this video you will look at neural networks and their types.
This video will give you the basic concepts of computer vision in python.
This video will give you the basic concepts of activation function in tensorflow.
In this lecture we used the Fashion MNIST dataset to do some machine learning and deep learning tasks in Python with TensorFlow.
The dataset contains 60,000 28x28 grayscale images of 10 fashion categories, along with a test set of 10,000 images. This dataset can be used as a drop-in replacement for MNIST. The class labels are: Label Description
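For reference, the ten Fashion MNIST labels can be kept in a small lookup list so that integer predictions are easy to read. The variable and function names below are hypothetical; the label order follows the standard Fashion MNIST class list.

```python
# Class names for Fashion MNIST, indexed by integer label 0 to 9.
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_to_name(label):
    """Translate an integer label into its clothing category."""
    return FASHION_MNIST_CLASSES[label]


for label, name in enumerate(FASHION_MNIST_CLASSES):
    print(label, name)
```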
By using a callback function we can estimate how many epochs to use based on the accuracy or loss; this lecture will make it all clear to you.
In this lecture we will learn about the types of neural networks.
In this lecture you will look at streamlit, a library with which you can easily build excellent web apps.
This day comes in several parts, so bear with me and learn a lot today.
Intro to Streamlit
Streamlit with titanic dataset
Streamlit with Plotly
Animated plots with Streamlit & Plotly
Streamlit web app for EDA analysis
Streamlit k Jugaar
Machine Learning Web-application in python with streamlit
Deploy a streamlit data science app online
Add video & audio to your streamlit data science webapp in python
Add code to streamlit webapp
Make Interactive Dashboard with Explainer Dashboard
Embedding code snippets in streamlit webapp
Streamlit App development with Python project based
Data Science Web app development via Streamlit in Python (Project based)
In this lecture we will look at time series analysis. This lecture is in two parts.
In this lecture we will learn about the basics of NLP and text classification.
In this lecture we will learn about GitHub and git tools. You will learn how to use GitHub to save your documentation.
In this lecture we will learn about the use of chatGPT to maintain your social media accounts.
Your feedback matters a lot. Could you please comment on the following post on Facebook to give us your feedback?
Dr. Muhammad Aammar Tufail
PhD Data Science in Agriculture\ Youtube channel\ Twitter\ Linkedin github
contact: aammar@codanics.com