OJO44 / 24bMachineLearning1


Data Cleaning #1

Ds2023 closed 4 months ago

Ds2023 commented 4 months ago

Hi Joseph,

I've been reviewing your project and I wanted to offer some suggestions to make it even better!

Project Documentation

README and Data Dictionary: These documents are really helpful for anyone trying to understand your project. Having a README that includes installation instructions, usage examples, and an overall project description would be a great addition. Similarly, a data dictionary explaining the features and variables in your data would improve clarity.

Including Analysis Questions: To enhance the flow of your analysis, it might be helpful to include the specific questions that led to your insights at the beginning of each analysis section. This will make it easier for someone following your work to understand the thought process behind each step.

Here are some resources that you might find helpful:

How to Write a Good README

Data Dictionary and Questions: You can get these from the exercise repository.

If you have any questions while creating these documents, please don't hesitate to ask!

Handling Missingness

Your implementation of the chosen techniques seems well done.

To further strengthen this section, consider incorporating more visualizations and exploratory data analysis (EDA) to support the chosen methods.

For example, distribution plots (histograms, boxplots) can reveal the nature of the missingness (randomly distributed or concentrated in specific features) and help identify potential outliers. This visual evidence adds weight to your decisions about how to handle missing values.
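A minimal sketch of what this could look like with pandas and matplotlib. The DataFrame and its column names (release_year, process_size_nm, tdp_w) are made up for illustration and are not taken from your notebook; swap in your own data and columns.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical toy data standing in for the real dataset (assumed column names).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "release_year": rng.integers(2000, 2021, size=n),
    "process_size_nm": rng.choice([7.0, 14.0, 28.0, 65.0], size=n),
    "tdp_w": rng.normal(120, 40, size=n),
})

# Inject missing values: tdp_w is mostly missing for older parts,
# process_size_nm is missing at random.
is_old = df["release_year"] < 2008
df.loc[is_old & (rng.random(n) < 0.6), "tdp_w"] = np.nan
df.loc[rng.random(n) < 0.1, "process_size_nm"] = np.nan

# Share of missing values per column, largest first.
missing_share = df.isna().mean().sort_values(ascending=False)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# 1) Which features are affected, and how badly?
missing_share.plot.bar(ax=axes[0], title="Share of missing values per column")

# 2) Is the missingness random, or concentrated in part of the data?
tdp_missing = df["tdp_w"].isna()
axes[1].hist(
    [df.loc[~tdp_missing, "release_year"].to_numpy(),
     df.loc[tdp_missing, "release_year"].to_numpy()],
    bins=10, stacked=True, label=["tdp_w present", "tdp_w missing"],
)
axes[1].set_title("Release year, split by tdp_w missingness")
axes[1].legend()

# 3) Outliers in the observed values, before choosing an imputation method.
axes[2].boxplot(df["tdp_w"].dropna().to_numpy())
axes[2].set_title("tdp_w (observed values)")

fig.tight_layout()
```

If the stacked histogram shows the missing rows clustered in a few years, that is a hint the values are not missing completely at random, which is worth stating next to the imputation method you pick.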

Main EDA - Great Start!

Dennard Scaling:

It seems there might be a slight difference in how we're interpreting Dennard scaling. Let's chat about this to ensure we're on the same page.

CPUs & GPUs Insight:

The comparison you've done is interesting, but keep the time factor in mind: when looking at trends across the timeframe, compare parts by release year rather than pooling all years together.

GPU Performance:

I haven't seen an analysis of GPU performance yet. The site you mentioned is a great resource for ideas on how to implement this section.

GPU performance improvement:

I like that you've used a pairplot for this analysis! Consider consolidating the information into a single plot that uses color (hue) and size to represent the different contributing factors. Again, the site can provide some helpful implementation techniques.
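As a rough sketch of the single-plot idea, here is one scatter plot where color encodes release year and marker size encodes memory size. The data is synthetic and the column names (gpu_clock_mhz, fp32_gflops, mem_size_gb, release_year) are assumptions, so adapt them to whatever your dataset actually contains.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical synthetic GPU data (assumed column names, not from the project).
rng = np.random.default_rng(1)
n = 150
gpus = pd.DataFrame({
    "release_year": rng.integers(2005, 2021, size=n),
    "mem_size_gb": rng.choice([1, 2, 4, 8, 16], size=n),
})
gpus["gpu_clock_mhz"] = (
    400 + 60 * (gpus["release_year"] - 2005) + rng.normal(0, 80, size=n)
)
gpus["fp32_gflops"] = (
    gpus["gpu_clock_mhz"] * gpus["mem_size_gb"] * rng.uniform(0.5, 1.5, size=n)
)

fig, ax = plt.subplots(figsize=(8, 5))
points = ax.scatter(
    gpus["gpu_clock_mhz"],
    gpus["fp32_gflops"],
    c=gpus["release_year"],      # color encodes release year
    s=20 * gpus["mem_size_gb"],  # marker size encodes memory size
    cmap="viridis",
    alpha=0.7,
)
ax.set_xlabel("GPU clock (MHz)")
ax.set_ylabel("FP32 performance (GFLOPS)")
ax.set_title("GPU performance: color = release year, size = memory")
fig.colorbar(points, ax=ax, label="Release year")
fig.tight_layout()
```

Since you are already using seaborn for the pairplot, seaborn's scatterplot function takes hue and size arguments and would give a similar result with an automatic legend.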

High-end vs. Low-end GPUs:

Solid effort on comparing high-end vs. low-end GPUs. Think about how you can incorporate the time factor to show adoption rates over time. This would add another dimension to the analysis.
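One possible way to bring in the time factor is to compute, per release year, the share of models in each tier and plot that as a line chart. This is only a sketch on made-up data: the columns are assumed, and the cutoff of 8 GB of memory for "high-end" is an arbitrary placeholder, not a definition from your project.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical synthetic data (assumed column names, not from the project).
rng = np.random.default_rng(2)
n = 300
gpus = pd.DataFrame({
    "release_year": rng.integers(2010, 2021, size=n),
    "mem_size_gb": rng.choice([1, 2, 4, 8, 16], size=n),
})

# Assumption for illustration only: 8 GB or more counts as "high-end".
gpus["tier"] = np.where(gpus["mem_size_gb"] >= 8, "high-end", "low-end")

# Share of each tier among the models released in a given year (rows sum to 1).
share = pd.crosstab(gpus["release_year"], gpus["tier"], normalize="index")

fig, ax = plt.subplots(figsize=(8, 4))
share.plot(ax=ax, marker="o")
ax.set_xlabel("Release year")
ax.set_ylabel("Share of released models")
ax.set_title("High-end vs. low-end GPUs over time")
fig.tight_layout()
```

Note that this counts released models, not units sold, so it approximates what vendors offered each year rather than true adoption; sales or usage data would be needed for the latter.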

Overall:

This is a great start to your EDA! Looking into the suggestions above will surely elevate the depth and clarity of your analysis.

OJO44 commented 4 months ago

thank you.