SrijanShovit / HealthLearning

A repo comprising various Machine Learning and Deep Learning projects in the healthcare domain.
38 stars, 53 forks

Body Fat Prediction | 1. Dataset Exploration #1

Closed: SrijanShovit closed this issue 4 months ago

SrijanShovit commented 5 months ago

Use plots and statistical methods to detect outliers, and propose solutions for dealing with them.
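As a starting point for the "plots" part, here is a minimal sketch that draws three common outlier-spotting views of a single column side by side. It assumes matplotlib and NumPy are available. The helper name `outlier_plots` is hypothetical and not part of the repo.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np


def outlier_plots(values, name="feature"):
    """Histogram, boxplot, and violin plot of one column, side by side."""
    values = np.asarray(values, dtype=float)
    fig, (ax_hist, ax_box, ax_violin) = plt.subplots(1, 3, figsize=(12, 3.5))

    # Histogram: shows the overall shape (skew, gaps, isolated bars in the tails).
    ax_hist.hist(values, bins=30)
    ax_hist.set_title(f"{name}: histogram")

    # Boxplot: whiskers extend to 1.5 x IQR by default; points beyond them are drawn individually.
    ax_box.boxplot(values)
    ax_box.set_title(f"{name}: boxplot")

    # Violin plot: kernel density estimate of the distribution, with median and extremes marked.
    ax_violin.violinplot(values, showmedians=True)
    ax_violin.set_title(f"{name}: violin plot")

    fig.tight_layout()
    return fig
```

For example, `outlier_plots(df["Abdomen"], "Abdomen")`. The column name here is only an illustration.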

sanketv010 commented 4 months ago

Hello Srijan, can you assign this issue to me? I'd like to work on it. I'm planning to use the Z-score and IQR methods to handle the outliers, and boxplots for the visualization part.

SrijanShovit commented 4 months ago

Hey @sanketv010, are you a GSSOC contributor?

sanketv010 commented 4 months ago

Yes, I am

SrijanShovit commented 4 months ago

Fine, @sanketv010. I will assign it to you once the GSSOC '24 contribution period formally starts, since, as far as I know, this affects the scoring. Until then, you can start working on it.

  1. Explore which other plots, apart from boxplots, can be used for outlier detection, and why. It would be great if you added a comparison of them in markdown cells of the file.
  2. One more question: why and where do you want to use the Z-score and IQR? Why both, or why only one? Get some hands-on exposure to both.
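On the "why both or why only one" question, one concrete difference worth noting: the Z-score relies on the mean and standard deviation, which the outlier itself inflates, while the IQR fences come from quartiles that an extreme value barely moves. By Samuelson's inequality, with the population standard deviation no value in a sample of n points can have |z| greater than sqrt(n - 1). So with the usual threshold of 3, an extreme value in a sample of 10 or fewer points can never be flagged. The toy values below are made up for illustration and are not from the body fat dataset.

```python
import numpy as np

# Toy sample (made-up values): seven similar readings and one extreme one.
x = np.array([20, 21, 22, 23, 24, 25, 26, 80], dtype=float)

# Z-score method: the mean and population std are both pulled toward 80.
z = (x - x.mean()) / x.std()  # np.std defaults to ddof=0
z_flagged = x[np.abs(z) > 3]

# IQR method: quartiles are computed from the middle of the data.
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
iqr_flagged = x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)]

# Samuelson's inequality: no |z| can exceed sqrt(n - 1) = sqrt(7) ≈ 2.65 here.
print(f"max |z|: {np.abs(z).max():.2f}, bound: {np.sqrt(len(x) - 1):.2f}")
print("Z-score flags:", z_flagged)  # empty: 80 is not flagged
print("IQR flags:", iqr_flagged)    # flags 80
```

On larger, roughly normal columns the two methods tend to agree more. On small or skewed columns the IQR rule is usually the more robust choice.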

Good luck!!

sanketv010 commented 4 months ago

Yeah, sure thing. I'll look into other plots and add a comparison of them as well. As for your question about where and why I want to use IQR and Z-score: they are also outlier-detection techniques, and they may work better than visualization alone because they give us the exact outlier values, which will help us choose a more effective way to handle them. I'll experiment to see which one performs better and document the findings in the markdown cells as well.
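To get the exact outlier values for every numeric column at once, which is the advantage over plots mentioned above, a per-column summary could look like the following. It is a sketch that uses the 1.5 × IQR rule. The function name `outlier_report` is hypothetical.

```python
import pandas as pd


def outlier_report(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Per numeric column: IQR fences, outlier count, and the flagged values."""
    rows = []
    for col in df.select_dtypes("number").columns:
        s = df[col]
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
        flagged = s[(s < lo) | (s > hi)]  # exact outlier values for this column
        rows.append({
            "column": col,
            "lower_fence": lo,
            "upper_fence": hi,
            "n_outliers": len(flagged),
            "outlier_values": sorted(flagged.tolist()),
        })
    return pd.DataFrame(rows).set_index("column")
```

For example, `outlier_report(df).sort_values("n_outliers", ascending=False)` puts the columns with the most outliers first. Non-numeric columns are skipped.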

aditi1807 commented 4 months ago

Hey Srijan, could you please assign this issue to me? I have relevant skills and am looking forward to contributing to machine learning projects.

SrijanShovit commented 4 months ago

Hi @aditi1807, I will be assigning this issue to @sanketv010. But don't be disheartened: please wait until this evening, when more issues will be opened, and I will assign you one then.

Piyushseth55 commented 4 months ago

Hi @SrijanShovit, I am Piyush. I would like to work on this issue under GSSOC '24. Please assign it to me! #gssoc24

SrijanShovit commented 4 months ago

@sanketv010, start here:

Step 1

  1. Load the dataset
  2. Explore and confirm the features and label(s) of the dataset
  3. Explore the size/shape of the dataset
  4. Investigate the data types of the features and labels, and choose a better data type for a column where possible
  5. Calculate the differences in memory usage
  6. Explore summary statistics of the columns, such as the mean, median, and selected percentiles
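Step 1 could be sketched roughly as follows with pandas. The file name `bodyfat.csv` and the label column `BodyFat` are assumptions for illustration, so check them against the actual dataset in the repo. The helper name `explore` is hypothetical. Downcasting numeric columns is one possible "better option" for data types: it reduces memory, but float32 keeps only about 7 significant digits, so confirm that is acceptable before relying on it.

```python
import numpy as np
import pandas as pd


def explore(df: pd.DataFrame, label: str) -> pd.DataFrame:
    # 2. Features vs. label
    features = [c for c in df.columns if c != label]
    print("Label:", label)
    print("Features:", features)

    # 3. Size/shape
    print("Shape (rows, columns):", df.shape)

    # 4. Data types, then downcast numeric columns to the smallest type that fits
    print(df.dtypes)
    slim = df.copy()
    for col in slim.select_dtypes(include=np.integer).columns:
        slim[col] = pd.to_numeric(slim[col], downcast="integer")
    for col in slim.select_dtypes(include=np.floating).columns:
        slim[col] = pd.to_numeric(slim[col], downcast="float")

    # 5. Memory usage before vs. after (deep=True also counts object/string contents)
    before = df.memory_usage(deep=True).sum()
    after = slim.memory_usage(deep=True).sum()
    print(f"Memory: {before} -> {after} bytes ({before - after} saved)")

    # 6. Mean, median (the 50% row), and selected percentiles for each numeric column
    print(slim.describe(percentiles=[0.05, 0.25, 0.5, 0.75, 0.95]).T)
    return slim
```

Usage, under the same assumptions: `df = pd.read_csv("bodyfat.csv")` (step 1), then `slim = explore(df, label="BodyFat")`.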