apolbernardo / paper_1

My first paper as a data analyst

Reorganize EDA #8

Closed: docligot closed this 11 months ago

docligot commented 11 months ago

Saw your dashboard - this is good as an initial EDA

In addition to this, and in preparation for our modeling exercise, you should now start creating EDA for a target variable.

For example, if Mental Health is the target variable - you should create a "flag" to identify cases with mental health challenges.

I made a sample of this here: check out pull request #7 (reorganize EDA)

Approve and merge it into main, then pull main to check the sample target-variable dashboard.

I took the mental health field and merged No with Don't Know, and Yes with Possibly, so you now have a binary outcome. You can now visualize this flag against the other fields and variables to see which ones potentially have an effect on the outcome.
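As a minimal sketch of that merge, assuming the answers arrive as plain strings spelled exactly as in this thread, the flag can be computed with a small mapping. The function name `mh_flag` and the sample answers are hypothetical, and the sample dashboard in this thread is an Excel file, so this only illustrates the logic, not how it was built.

```python
# Assumed answer labels; the real survey column may spell them differently.
POSITIVE = {"Yes", "Possibly"}
NEGATIVE = {"No", "Don't Know"}


def mh_flag(answer: str) -> int:
    """Return 1 for Yes/Possibly, 0 for No/Don't Know; reject anything else."""
    value = answer.strip()
    if value in POSITIVE:
        return 1
    if value in NEGATIVE:
        return 0
    # Failing loudly catches unexpected labels instead of silently mis-flagging them.
    raise ValueError(f"unexpected mental health answer: {answer!r}")


answers = ["Yes", "No", "Possibly", "Don't Know", "Yes"]
flags = [mh_flag(a) for a in answers]
print(flags)  # [1, 0, 1, 0, 1]
```

Raising an error on unknown labels is a design choice: it forces a decision about any extra answer category rather than quietly dropping it into one side of the flag.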

apolbernardo commented 11 months ago

Sorry, I am having difficulty understanding what "creating a flag to identify cases with mental health challenges" means. Is this referring to the variables that carry mental health risk? I did pull the request, but I don't see the sample.

docligot commented 11 months ago

We can discuss further on Sunday. But basically a "flag" is the target outcome you want to model.

For example, in the mental health field, there are 4 possible options:

- Yes
- Possibly
- No
- Don't Know

For a target variable, it's best to have a binary outcome (two options).

So I combined Yes and Possibly, and No and Don't Know. (You might have your own approach)

Once you've decided on a target variable with a binary outcome, you can start visualizing the other variables in terms of this variable (e.g. check which gender has a higher mental health flag, etc.)
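One simple way to compare a field against the flag is the share of flagged respondents in each category. The sketch below assumes each respondent is a dict with a hypothetical `gender` key and a 0/1 `mh_flag` key; the rows are made up for illustration and are not survey data.

```python
from collections import defaultdict


def flag_rate_by(rows, field, flag_field="mh_flag"):
    """Share of rows with flag == 1 for each value of `field`."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        key = row[field]
        totals[key] += 1
        positives[key] += row[flag_field]
    return {key: positives[key] / totals[key] for key in totals}


# Made-up illustrative rows, not survey data.
rows = [
    {"gender": "Male", "mh_flag": 1},
    {"gender": "Male", "mh_flag": 0},
    {"gender": "Female", "mh_flag": 1},
    {"gender": "Female", "mh_flag": 1},
    {"gender": "Male", "mh_flag": 0},
    {"gender": "Female", "mh_flag": 0},
]
rates = flag_rate_by(rows, "gender")
print(rates)  # Male: 1 of 3 flagged, Female: 2 of 3 flagged
```

A higher rate in one category only hints at a possible effect; categories with few respondents can swing the rate a lot.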

When we finally do the modeling - you will be trying to predict the probability of someone having a mental health flag.
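To illustrate what predicting the probability of a flag means, here is a minimal logistic regression trained with plain batch gradient descent on log-loss. The two features (`is_female`, `works_at_tech_company`) and the rows are hypothetical; in practice a library model such as scikit-learn's `LogisticRegression` (which provides `predict_proba`) would normally be used instead of this hand-rolled version.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping any score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on mean log-loss; X is a list of feature lists."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the linear score is (p - y).
            error = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(weights, bias, xi):
    """Probability that the flag is 1 for one respondent's features."""
    return sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias)


# Hypothetical features: [is_female, works_at_tech_company]; labels are the 0/1 flag.
X = [[1, 1], [1, 0], [1, 1], [0, 1], [0, 0], [0, 0], [1, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0, 1, 1]
weights, bias = fit_logistic(X, y)
p_female_tech = predict_proba(weights, bias, [1, 1])
p_male_nontech = predict_proba(weights, bias, [0, 0])
print(round(p_female_tech, 2), round(p_male_nontech, 2))
```

The output for each respondent is a probability between 0 and 1 rather than a hard yes/no, which is what a scorecard would rank people by.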

docligot commented 11 months ago

The sample dashboard is in the EDA folder - dashboard_target.xlsx

Pull the main branch to your local machine first so the changes are reflected; you should then see the folder on your PC.

apolbernardo commented 11 months ago

[Image: plan]

I was planning to use gender, age, and whether the person works at a tech company, then illustrate by category who has mental health (MH) support, employer support, and coworker support. I don't know if it is a good idea to use them as a score, for example: if you are male, work at a tech company, and are in this age bracket, you are likely to receive environmental support for your mental health; if you are female, work in tech, and are in this age group, you are most likely to receive a package to support MH. Do you think this plan will work? I want this project to be my first prediction project, so I don't know if this plan would work.
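A simple first cut at this segment idea, before any model, is the support rate for each combination of gender, age bracket, and tech-company status. The field names (`gender`, `age`, `tech_company`, `has_support`), the 10-year brackets, and the rows are all assumptions for illustration; very small segments give unstable rates, which is one reason to move to a model later.

```python
from collections import defaultdict


def age_bracket(age: int) -> str:
    """Hypothetical 10-year brackets, e.g. 25 -> '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def support_rate_by_segment(rows):
    """Share of respondents with has_support == 1 per (gender, bracket, tech) segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [supported, total]
    for row in rows:
        segment = (row["gender"], age_bracket(row["age"]), row["tech_company"])
        counts[segment][0] += row["has_support"]
        counts[segment][1] += 1
    return {seg: supported / total for seg, (supported, total) in counts.items()}


# Made-up illustrative rows, not survey data.
rows = [
    {"gender": "Male", "age": 27, "tech_company": True, "has_support": 1},
    {"gender": "Male", "age": 24, "tech_company": True, "has_support": 0},
    {"gender": "Female", "age": 31, "tech_company": True, "has_support": 1},
    {"gender": "Female", "age": 35, "tech_company": False, "has_support": 0},
]
rates = support_rate_by_segment(rows)
for segment, rate in sorted(rates.items()):
    print(segment, rate)
```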

docligot commented 11 months ago

Both are doable. Maybe one scorecard could predict who has a mental health condition, and another could predict who will get mental health support.

The important thing is to try weaving a story with the data. Is there a trend emerging from the information?

apolbernardo commented 11 months ago

[Image: dashboard eda 3]

This is my final dashboard, with 2 categories: external and internal.