Recode-Hive / Stackoverflow-Analysis

Stack Overflow is a professional community for developers. This repo analyzes 3 years of the developer survey run by Stack Overflow, visualizes the results, and predicts the future salary of data scientists.
https://stackoverflow-analysis.streamlit.app/
MIT License
110 stars 102 forks

Impute missing values and normalize the skewed CSV data. #58

Open akv2011 opened 1 month ago

akv2011 commented 1 month ago

Describe the bug: Hi @sanjay-kv, I would like to normalize the skewed values and perform EDA so that the data is fit for model training. I plan to go through all the datasets.
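For reference, a minimal sketch of what normalizing skewed values could look like. It assumes the skewed columns are non-negative numeric columns and swaps in a plain `log1p` transform; the thread does not say which transform is intended, and the `reduce_skew` helper, the threshold of 1.0, and the toy column names are all hypothetical.

```python
import numpy as np
import pandas as pd


def reduce_skew(df: pd.DataFrame, threshold: float = 1.0) -> pd.DataFrame:
    """Apply log1p to non-negative numeric columns whose skewness exceeds threshold."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        series = out[col]
        # Restrict to non-negative columns so log1p stays well defined.
        if series.min() >= 0 and series.skew() > threshold:
            out[col] = np.log1p(series)
    return out


# Hypothetical toy data: a right-skewed salary column and a symmetric one.
toy = pd.DataFrame({
    "Salary": [20_000, 40_000, 80_000, 160_000, 320_000, 640_000],
    "YearsCoding": [1, 2, 3, 4, 5, 6],
})
transformed = reduce_skew(toy)
print(toy["Salary"].skew(), transformed["Salary"].skew())
```

Only columns above the skewness threshold are touched, so roughly symmetric columns such as `YearsCoding` here pass through unchanged.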

sanjay-kv commented 1 month ago

Could you show me where the bug is with a screenshot, so I can understand in more detail the issue you are trying to solve?

akv2011 commented 1 month ago

Hi @sanjay-kv, I am facing the following issue in the survey_results_sample_2018.csv dataset: [Screenshot from 2024-05-12 19-42-51]

There are bad lines in the dataset. I would like to fix them instead of just ignoring them.
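The screenshot is not reproduced here, so the exact error is unknown. Assuming the bad lines are rows whose field count differs from the header, one way to inspect them rather than skip them might look like the sketch below. The `split_good_and_bad_rows` helper and the sample data are hypothetical, and only the standard-library `csv` module is used.

```python
import csv
import io


def split_good_and_bad_rows(handle):
    """Return (header, good_rows, bad_rows); bad_rows holds (line_number, fields) pairs."""
    reader = csv.reader(handle)
    header = next(reader)
    good, bad = [], []
    for row in reader:
        if len(row) == len(header):
            good.append(row)
        else:
            # reader.line_num is the number of source lines read so far.
            bad.append((reader.line_num, row))
    return header, good, bad


# Hypothetical sample: one row has an extra field, one is missing a field.
sample = io.StringIO(
    "Respondent,Country,Salary\n"
    "1,India,50000\n"
    "2,United States,90000,extra\n"
    "3,Germany\n"
)
header, good, bad = split_good_and_bad_rows(sample)
for line_number, fields in bad:
    print(f"line {line_number}: expected {len(header)} fields, got {len(fields)}")
```

Once the malformed rows are collected with their line numbers, they can be repaired by hand or by rule and appended back. Recent pandas versions also accept an `on_bad_lines` argument in `read_csv` (`"error"`, `"warn"`, `"skip"`, or a callable when `engine="python"`), which may be enough if a per-row fix-up function is all that is needed.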

Also, there are NA values in the dataset. I would like to impute them so that the model can parse them. [image]
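A minimal sketch of one possible imputation strategy, assuming median imputation for numeric columns and most-frequent-value imputation for everything else. The thread does not specify a strategy, so this choice, the `impute_missing` helper, and the toy columns are assumptions for illustration only.

```python
import pandas as pd


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric NaNs with the column median and other NaNs with the column mode."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            modes = out[col].mode(dropna=True)
            if not modes.empty:  # a column that is entirely NaN is left as-is
                out[col] = out[col].fillna(modes.iloc[0])
    return out


# Hypothetical toy data with one missing value per column.
toy = pd.DataFrame({
    "Salary": [50_000.0, None, 70_000.0, 90_000.0],
    "Country": ["India", "India", None, "Germany"],
})
filled = impute_missing(toy)
print(filled)
```

The median is used instead of the mean because it is less sensitive to the skewed values this issue is also concerned with.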

In a further issue, I would also work on one-hot encoding the text values in the dataset so that analysis can be performed on them.
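As a rough sketch of the one-hot encoding step, assuming pandas is used: `pd.get_dummies` handles single-valued text columns, and `Series.str.get_dummies` handles multi-select answers stored as one delimited string. The toy column names and the `;` separator are assumptions about how the CSV is laid out.

```python
import pandas as pd

# Hypothetical toy data: one single-valued and one multi-select text column.
toy = pd.DataFrame({
    "Respondent": [1, 2, 3],
    "Employment": ["Full-time", "Part-time", "Full-time"],
    "Languages": ["Python;SQL", "Java", "Python"],
})

# One indicator column per distinct value of a single-valued column.
encoded = pd.get_dummies(toy[["Respondent", "Employment"]], columns=["Employment"], dtype=int)

# One indicator column per token of a delimited multi-select column.
languages = toy["Languages"].str.get_dummies(sep=";")

print(encoded)
print(languages)
```

High-cardinality columns can produce very wide frames this way, so it may be worth grouping rare values before encoding.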

sanjay-kv commented 1 month ago

@akv2011 Assigned to you.

akv2011 commented 1 month ago

Thank you, I will work on it pronto and raise a pull request for review.

Shouryabhardwajj commented 4 weeks ago

I am interested in this issue. Please assign it to me.

Samik123Mit commented 3 weeks ago

Please assign me the issue; I have prior experience with the model.