dat-a-man / machine-learning

A repo to keep track of all the machine learning

Notes from Mentoring #1

Open · peterfrancisrit opened 6 months ago

peterfrancisrit commented 6 months ago

Homework

peterfrancisrit commented 6 months ago

NOTE:

peterfrancisrit commented 6 months ago

https://www.kaggle.com/datasets

dat-a-man commented 5 months ago

Hi @peterfrancisrit, I have completed week one of the DTC Machine Learning Zoomcamp. It covered some of the basics.

Next, I'll be going through week 2, which is going to be the car price prediction model project. Let's see if I can do something similar on another dataset. I will keep you posted, mostly weekly.

peterfrancisrit commented 5 months ago

Sounds good! Nice stuff!

peterfrancisrit commented 5 months ago

Just checking in! How is it going currently?

dat-a-man commented 4 months ago

@peterfrancisrit I completed the first guided project in the Machine Learning course by Data Talks Club. The project was about creating a car price prediction model, and I learned a lot from it. Key takeaways include:

  1. Data Splitting: Learned to break the dataset into train, validation, and test sets.
  2. Feature Selection: Identified and decided on relevant features.
  3. Feature Engineering: Applied various feature engineering techniques.
  4. Model Training: Implemented linear regression and applied regularization.
  5. Model Usage: Used the model to make predictions, which were acceptably accurate (a rough sketch of the workflow follows this list).
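
For reference, here is a minimal sketch of what that workflow might look like, assuming a pandas DataFrame with an `msrp` price column. The file path, column names, and the use of scikit-learn's `Ridge` as a stand-in for "linear regression plus regularization" (the course implements the regression itself) are all illustrative assumptions, not the exact project code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Load the car dataset (path and column names below are illustrative).
df = pd.read_csv("data/cars.csv")

# 1. Data splitting: 60% train, 20% validation, 20% test.
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=1)
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=1)

# 2./3. Feature selection and engineering: a few numeric features,
#       and a log1p-transformed price target to tame the long tail.
features = ["engine_hp", "engine_cylinders", "highway_mpg", "city_mpg"]
X_train = df_train[features].fillna(0).values
X_val = df_val[features].fillna(0).values
y_train = np.log1p(df_train["msrp"].values)
y_val = np.log1p(df_val["msrp"].values)

# 4. Linear regression with regularization (Ridge adds an L2 penalty).
model = Ridge(alpha=0.1)
model.fit(X_train, y_train)

# 5. Use the model: predict on the validation set and check RMSE.
y_pred = model.predict(X_val)
rmse = np.sqrt(mean_squared_error(y_val, y_pred))
print(f"Validation RMSE (log scale): {rmse:.3f}")
```
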
peterfrancisrit commented 4 months ago

Nice stuff! You might have to change the permissions, though: I can't click on it to see the code.

dat-a-man commented 4 months ago

I updated the directory structure, and with that I have completed week 2.

dat-a-man commented 4 months ago

@peterfrancisrit here are the updates for last week (May 20th to 26th):

This week, I'm focusing on learning how to predict churn. It's week 3 of the ML Zoomcamp, and we're starting by working with a [Kaggle dataset](https://www.kaggle.com/datasets/blastchar/telco-customer-churn).

  1. The first step involved data preparation, which included standardizing names and values. Some values were changed from 'Yes' and 'No' to '0' and '1', and others were converted to numeric.
  2. The next step was setting up a validation framework. The dataset was divided into training, validation, and test sets using scikit-learn.
  3. Then, we looked into which features affect churn rates, such as gender or having a partner.
  4. We calculated the "risk ratio" to assess the likelihood of churn based on specific factors (a ratio above 1 means the group churns more than the overall average). For example, the risk ratio for people with no partners is 1.22, suggesting that they are more likely to churn than those with partners, whose risk ratio is 0.76.
  5. Next, we learned about the mutual information score using mutual_info_score from sklearn. A higher score indicates that the feature is more important.
  6. We also used the correlation function from pandas to calculate the correlation between numerical variables and the churn.
  7. By looking at the correlation coefficient, we can determine the direction of the relationship (from its sign) and its strength (from its absolute value). A rough sketch of these steps follows below.
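
A rough sketch of those steps on the Kaggle Telco dataset, using pandas and scikit-learn. The column names follow the Kaggle file, but the exact preprocessing and the feature lists used in the course may differ, so treat this as an illustration rather than the project code:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mutual_info_score

# Load the Telco churn dataset from Kaggle (file name may differ locally).
df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")

# 1. Data preparation: standardize column names, fix types,
#    and encode the target from 'Yes'/'No' to 1/0.
df.columns = df.columns.str.lower().str.replace(" ", "_")
df["totalcharges"] = pd.to_numeric(df["totalcharges"], errors="coerce").fillna(0)
df["churn"] = (df["churn"] == "Yes").astype(int)

# 2. Validation framework: 60/20/20 train/validation/test split.
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=1)
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=1)

# 3./4. Risk ratio: churn rate within a group divided by the overall rate.
#       Values above 1 mean the group churns more than average.
global_churn = df_train["churn"].mean()
risk_ratio = df_train.groupby("partner")["churn"].mean() / global_churn
print(risk_ratio)

# 5. Mutual information: a higher score means the categorical feature
#    tells us more about churn.
for col in ["gender", "partner", "contract"]:
    print(col, round(mutual_info_score(df_train[col], df_train["churn"]), 4))

# 6./7. Correlation between numerical variables and churn: the sign gives
#       the direction, the absolute value the strength of the relationship.
numerical = ["tenure", "monthlycharges", "totalcharges"]
print(df_train[numerical].corrwith(df_train["churn"]))
```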