dat-a-man / machine-learning

A repo to keep track of all the machine learning

Notes from Mentoring #1

Open · peterfrancisrit opened 6 months ago

peterfrancisrit commented 6 months ago

Homework

peterfrancisrit commented 6 months ago

NOTE:

peterfrancisrit commented 6 months ago

https://www.kaggle.com/datasets

dat-a-man commented 5 months ago

Hi @peterfrancisrit, I have completed week one of the DTC Machine Learning Zoomcamp. It covered some of the basics.

Next, I'll be going through week 2, which is going to be the car price prediction model project. Let's see if I can do something similar on another dataset. I will keep you posted, mostly weekly.

peterfrancisrit commented 5 months ago

Sounds good! Nice stuff!

peterfrancisrit commented 5 months ago

Just checking in! How is it going currently?

dat-a-man commented 4 months ago

@peterfrancisrit I completed the first guided project in the Machine Learning course by Data Talks Club. The project was about creating a car price prediction model, and I learned a lot from it. Key takeaways include:

  1. Data Splitting: Learned to break the dataset into train, validation, and test sets.
  2. Feature Selection: Identified and decided on relevant features.
  3. Feature Engineering: Applied various feature engineering techniques.
  4. Model Training: Implemented linear regression and applied regularization.
  5. Model Usage: Used the model to make predictions, which were acceptably accurate (a rough sketch of the workflow follows this list).
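
For reference, here is a minimal sketch of what that workflow might look like, assuming a pandas DataFrame with an `msrp` price column. The file path, column names, and the use of scikit-learn's `Ridge` as a stand-in for "linear regression plus regularization" (the course implements the regression itself) are all illustrative assumptions, not the exact project code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Load the car dataset (path and column names below are illustrative).
df = pd.read_csv("data/cars.csv")

# 1. Data splitting: 60% train, 20% validation, 20% test.
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=1)
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=1)

# 2./3. Feature selection and engineering: a few numeric features,
#       and a log1p-transformed price target to tame the long tail.
features = ["engine_hp", "engine_cylinders", "highway_mpg", "city_mpg"]
X_train = df_train[features].fillna(0).values
X_val = df_val[features].fillna(0).values
y_train = np.log1p(df_train["msrp"].values)
y_val = np.log1p(df_val["msrp"].values)

# 4. Linear regression with regularization (Ridge adds an L2 penalty).
model = Ridge(alpha=0.1)
model.fit(X_train, y_train)

# 5. Use the model: predict on the validation set and check RMSE.
y_pred = model.predict(X_val)
rmse = np.sqrt(mean_squared_error(y_val, y_pred))
print(f"Validation RMSE (log scale): {rmse:.3f}")
```
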
peterfrancisrit commented 4 months ago

Nice stuff! You might have to change the permissions, though: I can't click on it to see the code.

dat-a-man commented 4 months ago

I updated the directory structure, and with that I have completed week 2.

dat-a-man commented 4 months ago

@peterfrancisrit here are the updates for last week (May 20th to 26th):

This week, I'm focusing on learning how to predict churn. It's week 3 of the ML Zoomcamp, and we're starting by working with a [Kaggle dataset](https://www.kaggle.com/datasets/blastchar/telco-customer-churn).

  1. The first step involved data preparation, which included standardizing names and values. Some values were changed from 'Yes' and 'No' to '0' and '1', and others were converted to numeric.
  2. The next step was setting up a validation framework. The dataset was divided into training, validation, and test sets using scikit-learn.
  3. Then, we looked into which features affect churn rates, such as gender or having a partner.
  4. We calculated the "risk ratio" to assess the likelihood of churn based on specific factors (a ratio above 1 means the group churns more than the overall average). For example, the risk ratio for people with no partners is 1.22, suggesting that they are more likely to churn than those with partners, whose risk ratio is 0.76.
  5. Next, we learned about the mutual information score using mutual_info_score from sklearn. A higher score indicates that the feature is more important.
  6. We also used the correlation function from pandas to calculate the correlation between numerical variables and the churn.
  7. By looking at the correlation coefficient, we can determine the direction of the relationship (from its sign) and its strength (from its absolute value). A rough sketch of these steps follows below.
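
A rough sketch of those steps on the Kaggle Telco dataset, using pandas and scikit-learn. The column names follow the Kaggle file, but the exact preprocessing and the feature lists used in the course may differ, so treat this as an illustration rather than the project code:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mutual_info_score

# Load the Telco churn dataset from Kaggle (file name may differ locally).
df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")

# 1. Data preparation: standardize column names, fix types,
#    and encode the target from 'Yes'/'No' to 1/0.
df.columns = df.columns.str.lower().str.replace(" ", "_")
df["totalcharges"] = pd.to_numeric(df["totalcharges"], errors="coerce").fillna(0)
df["churn"] = (df["churn"] == "Yes").astype(int)

# 2. Validation framework: 60/20/20 train/validation/test split.
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=1)
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=1)

# 3./4. Risk ratio: churn rate within a group divided by the overall rate.
#       Values above 1 mean the group churns more than average.
global_churn = df_train["churn"].mean()
risk_ratio = df_train.groupby("partner")["churn"].mean() / global_churn
print(risk_ratio)

# 5. Mutual information: a higher score means the categorical feature
#    tells us more about churn.
for col in ["gender", "partner", "contract"]:
    print(col, round(mutual_info_score(df_train[col], df_train["churn"]), 4))

# 6./7. Correlation between numerical variables and churn: the sign gives
#       the direction, the absolute value the strength of the relationship.
numerical = ["tenure", "monthlycharges", "totalcharges"]
print(df_train[numerical].corrwith(df_train["churn"]))
```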