BenChehade / datasciences

Attempts at data science competitions, mostly Kaggle
MIT License

Data Exploration #4

Open DataMonsterBoy opened 7 years ago

DataMonsterBoy commented 7 years ago

I am currently doing data exploration. At the moment I have four tasks:

  1. Create scatter plots of Sales Price against each variable.
  2. Use a correlation heat map to find highly correlated variables and decide which of them can be dropped.
  3. Explore missing data and decide whether to delete it or fill it in.
  4. Look at outliers and decide whether to keep those rows.

Is there anything people would add to this list? I'd really appreciate any input on a strategy, as I currently have a good deal of free time and would like to spend it productively.
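For reference, the four tasks above could be sketched roughly as follows with pandas. This is only a minimal sketch, not code from this repo: the helper names, the 0.8 correlation threshold, the 1.5 x IQR outlier rule, and the toy columns (`area`, `rooms`, `age`, `price`) are all assumptions made for illustration, and the plotting helper assumes matplotlib is installed.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df, threshold=0.8):
    """Return (col_a, col_b, r) for numeric column pairs with |r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) >= threshold:
                pairs.append((a, b, float(r)))
    return pairs


def missing_fraction(df):
    """Fraction of missing values per column, only for columns that have gaps."""
    frac = df.isna().mean()
    return frac[frac > 0].sort_values(ascending=False)


def iqr_outlier_mask(series, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def plot_overview(df, target):
    """Scatter plot of each numeric column against target, then a correlation heat map."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    numeric = df.select_dtypes(include="number")
    for col in [c for c in numeric.columns if c != target]:
        df.plot.scatter(x=col, y=target, title=f"{target} vs {col}")

    corr = numeric.corr()
    fig, ax = plt.subplots()
    image = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    fig.colorbar(image)
    plt.show()


# Hypothetical toy data standing in for the competition's training set.
toy = pd.DataFrame({
    "area": [50.0, 60.0, 70.0, 80.0, 90.0, 400.0],
    "rooms": [2.0, 2.0, 3.0, 3.0, 4.0, 12.0],
    "age": [30.0, np.nan, 10.0, 5.0, np.nan, 1.0],
    "price": [100.0, 120.0, 140.0, 160.0, 180.0, 800.0],
})

print(correlated_pairs(toy))            # candidate redundant pairs
print(missing_fraction(toy))            # columns with missing values
print(toy[iqr_outlier_mask(toy["price"])])  # rows flagged as price outliers
```

Calling `plot_overview(toy, "price")` would draw the scatter plots and the heat map. Whether a flagged pair or row should actually be dropped is still a judgment call; the helpers only surface candidates.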

msmoore commented 7 years ago

Try a t-SNE plot; they can be useful.
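A minimal sketch of what that could look like with scikit-learn, assuming the features are already numeric with no missing values. The function name `tsne_embed`, the perplexity of 5, and the random stand-in data are assumptions for illustration, not anything from this repo.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def tsne_embed(features, perplexity=5.0, seed=0):
    """Standardize the columns, then project the rows to 2-D with t-SNE."""
    scaled = StandardScaler().fit_transform(features)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(scaled)


# Hypothetical stand-in for a numeric feature matrix: 40 rows, 5 columns.
rng = np.random.default_rng(0)
features = rng.normal(size=(40, 5))

embedding = tsne_embed(features)
print(embedding.shape)
```

The two columns of `embedding` can then be scatter-plotted, for example colored by sale price, to see whether similar rows cluster together. Perplexity has to be smaller than the number of rows, and the layout changes with it, so it is worth trying a few values.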

In reply to DataMonsterBoy's post of 30 Jun 2017, 15:51.

DataMonsterBoy commented 7 years ago

Thanks. Will give that a go.