ecn310 / course-project-diop

course-project-diop created by GitHub Classroom

Charts and Graphs #14

Open abigailmondin opened 8 months ago

abigailmondin commented 8 months ago

Here we want to upload any charts or graphs we create (e.g., pie charts and histograms). You may want to include a description, a brief analysis, or an explanation of why the chart/graph is important.

ltsippel commented 8 months ago

Rate memory pd101 pie Graph

Here is the Rate Memory pie chart for variable pd101. I am still figuring out how to label the legend with words instead of numbers. I used graph pie, over(pd101) plabel(_all percent)
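One way to get words in the pie chart legend might be to attach value labels to pd101 before graphing, since graph pie uses a variable's value labels when they exist. The sketch below runs on a few made-up observations; the label name memrate and the wording of each category are assumptions, so check them against the codebook first.

```stata
* Made-up observations so the sketch runs on its own
clear
input pd101
1
2
3
3
4
5
end

* Assumed coding of pd101; confirm against the codebook before using these labels
label define memrate 1 "Excellent" 2 "Very good" 3 "Good" 4 "Fair" 5 "Poor"
label values pd101 memrate

* The legend now shows the labels instead of the numeric codes
graph pie, over(pd101) plabel(_all percent)
```

With the real data in memory, only the label define, label values, and graph pie lines would be needed.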

kbuzard commented 8 months ago

@ecn310/diop A pie chart is good for understanding each variable on its own. At this point, you need to start using stats / visuals that link together the key variables in your hypothesis. Dylan or I are happy to help brainstorm this if you're not sure how to get started.

abigailmondin commented 8 months ago

pz262 (pw dementia) over pz216 (r years of education)

pz262_over_pz216 Code: graph bar (count) pz262 if pz262 == 1, over (pz216)

Description: This bar graph shows the number of respondents who answered "yes" to having dementia over the number of years of education they received. I think it makes sense to see a spike at 12 years because 12 years of schooling would mean the person made it through elementary, middle school, and high school. According to this particular group of respondents there doesn't seem to be a direct correlation between the number of years of schooling someone received and the person developing dementia.

abigailmondin commented 8 months ago

Frequency Histogram of pz216 (r years of education)

frequency_histogram_pz216 Code: histogram pz216, frequency

Description: This is a frequency histogram that displays pz216 (r years of education). I believe this will be helpful to visually demonstrate the range of years of schooling that the respondents of our data have received. This will also help to put into context the way the bar graph above spikes at 12 years of schooling.

Updated x-axis labels

frequency_histogram_pz216 @kbuzard I've updated the graph to label each bar along the x-axis, which I did using the graph editor in Stata. It still seems kind of weird to me because the ticks don't line up exactly with the bars.
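A possible fix for the ticks not lining up with the bars is the discrete option, which tells histogram to treat each whole number of years as its own bar centered on that value. The sketch below uses made-up values of pz216; the 0 to 17 range in xlabel() is an assumption about the range of the real variable.

```stata
* Made-up years of education so the sketch runs on its own
clear
input pz216
8
12
12
12
14
16
16
17
end

* discrete: one bar per distinct value, centered on that value
* xlabel(0(1)17): a tick and label at every whole year from 0 to 17
histogram pz216, discrete frequency xlabel(0(1)17)
```

Because this is code rather than a Graph Editor change, it can go straight into the do file.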

abigailmondin commented 8 months ago

pz261 (pw Alzheimer's) over pz216 (r years of education)

pz261_over_pz216 Code: graph bar (count) pz261 if pz261 == 1, over (pz216)

Description: This is a bar graph similar to the dementia bar graph created above. This graph shows the respondents who answered "yes" to having Alzheimer's over the number of years of schooling they received. Similarly to the dementia graph, this bar graph also has a major spike at 12 years of education, which I believe makes sense considering the frequency histogram of pz216.

kbuzard commented 8 months ago

@abigailmondin You might want to consider making a graph that has two bars for each category: one for the people with dementia and one for the people without. I think you'd take out the "if" statement and add another "over( )" for the pz261 variable, but I'm not 100% on that.
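A minimal sketch of the two-over() idea, using made-up data. It assumes the variable is coded 1 for yes and 5 for no, and it counts respondents by summing a helper variable (one) that equals 1 in every row; the helper variable and the ytitle text are illustrative only.

```stata
* Made-up data: years of education and a yes (1) / no (5) answer
clear
input pz216 pz262
12 1
12 5
12 5
16 5
16 1
16 5
16 5
end

* Helper variable: summing it counts the respondents in each cell
generate byte one = 1

* First over() gives the bars within each group, second over() gives the groups;
* asyvars colors the yes and no bars differently and adds a legend
graph bar (sum) one, over(pz262) over(pz216) asyvars ///
    ytitle("Number of respondents")
```

Each year of education then gets two bars, one per answer, with no if condition needed.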

kbuzard commented 8 months ago

@abigailmondin Here's some code that I think should help (it's from ChatGPT, so buyer beware):

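// Step 1: Flag respondents who gave a non-missing answer
generate byte cat_valid = !missing(pz262)

// Step 2: Calculate each category's share of valid answers, in percent
egen cat_count = total(cat_valid), by(pz262)
egen cat_total = total(cat_valid)
generate cat_percent = 100 * cat_count / cat_total

// Step 3: Create a bar chart
// (cat_percent is constant within a category, so its mean is the percentage itself)
graph bar (mean) cat_percent, over(pz262) ///
  title("Percentage Distribution by Category") ///
  ytitle("Percentage") ///
  bar(1, color(blue)) ///
  legend(off)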
abigailmondin commented 7 months ago

pd554 (get lost in familiar places) over pz216 (r years of education)

pd554_over_pz216

Code:
generate pd554_yes = 0
replace pd554_yes = 1 if pd554 == 1
generate pd554_no = 0
replace pd554_no = 1 if pd554 == 5
graph bar pd554_yes pd554_no, over (pz216)

Description: This bar graph shows pd554 (get lost in familiar places) separated into those who responded "yes" (this is labeled "pd554_yes") and those who responded "no" (this is labeled "pd554_no") over pz216 (r years of education). Overall, pd554_no is much higher than pd554_yes at each year of education. But there does not seem to be a direct link between years of education and getting lost in familiar places.

kbuzard commented 7 months ago

@abigailmondin I think you need replace pd554_no = 1 if pd554 == 5. I'm pretty sure the red bars are about five times higher than the blue bars because of this.
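To see why a stray 5 would scale the bars: graph bar plots the mean of each variable by default, so an indicator coded 0/5 has a mean exactly five times that of the same indicator coded 0/1. A small sketch with made-up data, assuming the earlier version of the code assigned 5 instead of 1 (the names no_wrong and no_right are illustrative):

```stata
* Made-up answers: one "yes" (1) and three "no" (5)
clear
input pd554
1
5
5
5
end

* Assumed earlier version: the "no" indicator is set to 5 by mistake
generate no_wrong = 0
replace no_wrong = 5 if pd554 == 5

* Corrected version: a proper 0/1 indicator
generate no_right = 0
replace no_right = 1 if pd554 == 5

* Means: no_wrong = 3.75, no_right = 0.75 (the share answering "no")
summarize no_wrong no_right
```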

abigailmondin commented 7 months ago

@kbuzard Thank you for catching that! I believe you were correct about the red bars being five times higher than the blue bars. I've corrected the code, created and attached the correct graph, and added an update to the description.

abigailmondin commented 7 months ago

pd554 (get lost in familiar places) over pc273 (ever had dementia)

pd554_over_pc273

Code (using the same generate statements as the pd554 over pz216 bar graph):
graph bar pd554_yes pd554_no, over (pc273)

Description: Similar to the previous graph, this is a bar graph that shows pd554 (get lost in familiar places) separated into those who responded "yes" and those who responded "no" over pc273 (ever had dementia). The biggest spike on the graph is for those who responded "yes" both to ever having dementia and to getting lost in familiar places. I included the codebook for pc273 to help explain what the values 1, 3, 4, 5, 8, and 9 across the bottom of the graph refer to.

Potential fix (based on feedback)

@kbuzard Are these the kind of changes you were suggesting we make to the graphs? I used the graph editor to make these changes because I really struggled to find code that would do the kind of thing we discussed. If this isn't what you were thinking, would you be able to help me find a way to accomplish what the graph should ideally look like?

updated_pd554_over_pc273

abigailmondin commented 7 months ago

pv009 (forgetful during daily activities) over pz216 (r years of education)

pv009_over_pz216

Code:
generate pv009_yes = 0
replace pv009_yes = 1 if pv009 == 1
generate pv009_no = 0
replace pv009_no = 1 if pv009 == 5
graph bar pv009_yes pv009_no, over (pz216)

Description: This is a bar graph that shows pv009 (forgetful during daily activities) separated into those who responded "yes" and those who responded "no" over pz216 (r years of education). In this graph, the share who responded "no" seems to increase fairly steadily as the number of years of schooling increases. This is consistent with our hypothesis that the more schooling a person gets, the less likely they are to develop dementia, because more schooling stimulates the brain and improves cognitive health.

abigailmondin commented 7 months ago

pv009 (forgetful during daily activities) over pc273 (ever had dementia)

pv009_over_pc273

Code (using the same generate statements as the pv009 over pz216 bar graph):
graph bar pv009_yes pv009_no, over (pc273)

Description: Similar to the previous graph, this bar graph shows pv009 (forgetful during daily activities) separated into those who responded "yes" and those who responded "no" over pc273 (ever had dementia). The largest spike in this graph is those who responded "no" to being forgetful during daily activities and "no" to ever having dementia. There is also a significant spike at those who responded "yes" to being forgetful during daily activities and "yes" to ever having dementia. I included the codebook again for the variable pc273 to help understand the values 1, 3, 4, 5, 8, and 9 across the bottom of the graph.

sophiehaber commented 7 months ago

Ever had dementia vs college degree
Code: tabulate pc273 pb016, row col

Ever had dementia vs high school diploma
Code: tabulate pc273 pb015, row col

Code for exporting tables:

  1. ssc install outreg2 (installs the "outreg2" package)
  2. outreg2 using ctab.doc, replace cross noaster
xorabear commented 7 months ago

pc272 ever had alzheimers over pz216 years of education

image

Code notes:
pz216 = years of education
pc272 = ever had Alzheimer's
pb014 = highest level of education
pc272_yes if response is 1 or 3
pc272_no if response is 4 or 5

C:\Users\aartis\OneDrive - Syracuse University\Documents\GitHub\course-project-diop

Code:
generate pc272_yes = 0
replace pc272_yes = 1 if (pc272 == 1 | pc272 == 3)
generate pc272_no = 0
replace pc272_no = 1 if (pc272 == 4 | pc272 == 5)
graph bar pc272_yes pc272_no, over (pz216)

xorabear commented 7 months ago

pc272 ever had alzheimers over pb014 highest level of education

image

Code notes:
pz216 = years of education
pc272 = ever had Alzheimer's
pb014 = highest level of education
pc272_yes if response is 1 or 3
pc272_no if response is 4 or 5

C:\Users\aartis\OneDrive - Syracuse University\Documents\GitHub\course-project-diop

Code:
generate pc272_yes = 0
replace pc272_yes = 1 if (pc272 == 1 | pc272 == 3)
generate pc272_no = 0
replace pc272_no = 1 if (pc272 == 4 | pc272 == 5)
graph bar pc272_yes pc272_no, over (pb014)

scale for responses

  0. For no formal education
  1-11. Grades
  12. High school
  13-15. Some college
  16. College grad
  17. Post college (17+ years)
  Other
  DK (Don't Know); NA (Not Ascertained)
  RF (Refused)

I'm working on editing these out so that fewer people are included in the results.

abigailmondin commented 7 months ago

Updated graphs (still not 100% perfect)

pd554 (get lost in familiar places) over pz216 (r years of education)

updated_pd554_over_pz216

Code:
graph bar pd554_yes pd554_no, over (pz216) percent stack ///
    legend(position(12) rows(2) ///
        label(1 "respondents who get lost in familiar places") ///
        label(2 "respondents who don't get lost in familiar places")) ///
    blabel(total, format(%9.0f))

pv009 (forgetful during daily activities) over pz216 (r years of education)

updated_pv009_over_pz216

Code:
graph bar pv009_yes pv009_no, over (pz216) percent stack ///
    legend(position(12) rows(2) ///
        label(1 "forgetful during daily activities") ///
        label(2 "not forgetful during daily activities")) ///
    blabel(total, format(%9.0f))

pd554 (get lost in familiar places) over pc273 (ever had dementia)

updated_pd554_over_pc273

Code:
graph bar pd554_yes pd554_no, ///
    over (pc273, label(angle(45)) ///
        relabel(1 "Yes" 2 "Now has condition" 3 "Now doesn't have condition" 4 "No" 5 "Don't know" 6 "Refused")) ///
    percent stack ///
    legend(position(12) rows(2) ///
        label(1 "respondents who get lost in familiar places") ///
        label(2 "respondents who don't get lost in familiar places")) ///
    blabel(total)

pv009 (forgetful during daily activities) over pc273 (ever had dementia)

updated_pv009_over_pc273

Code:
graph bar pv009_yes pv009_no, ///
    over (pc273, label(angle(45)) ///
        relabel(1 "Yes" 2 "Now has condition" 3 "Now doesn't have condition" 4 "No" 5 "Don't know" 6 "Refused")) ///
    percent stack ///
    legend(position(12) rows(2) ///
        label(1 "forgetful during daily activities") ///
        label(2 "not forgetful during daily activities")) ///
    blabel(total)

Overall update: I developed the code for the changes that I had made using the graph editor.

@kbuzard or @eldreddyl Things I still need help coding:

I've included all of the code I wrote to produce each graph for your reference, hopefully we can figure this out.

kbuzard commented 7 months ago

@eldreddyl Can you help @abigailmondin with this? I have to concentrate on giving feedback on everyone's analysis sections and writing two exams so am unlikely to have time until the weekend.

eldreddyl commented 7 months ago

  • Adding a label for the x axis on all four graphs; when I try using xtitle it gives me an error message

@abigailmondin Could you attach a screenshot of the error message?

abigailmondin commented 7 months ago

@eldreddyl Here is the screenshot of the error message I get when trying to use xtitle. IMG_1377

eldreddyl commented 7 months ago

@abigailmondin

So I played around with the code and read through the 'graph bar' documentation. To me, it doesn't seem like 'xtitle' is supported for bar graphs.

Screenshot 2023-12-13 122025

You can try two other options. I think either would work, so it may be up to your preference

1) Use Stata's Graph Editor to add the label yourself. The downside here is you'd have to do this for each graph you make manually and I don't think there is an easy way to make this reproducible

2) Add a descriptive title to the graph using title("Years of Education by Respondent Type") This method is reproducible since it would be in your do file. It would also implicitly describe your x-axis. I believe this is what the documentation meant by 'irrelevant for bar charts.'

Give that a try and let me know if Stata is still giving you trouble

While you do that, I'll look into the pc273 cutoff issue
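A third reproducible option that may be worth trying is b1title(), one of Stata's standard title options, which places a title along the bottom of the graph where an x-axis title would normally sit. The sketch below uses made-up data and the indicator variables from earlier in the thread; the title wording is only a placeholder.

```stata
* Made-up data: years of education and a yes (1) / no (5) answer
clear
input pz216 pd554
12 1
12 5
12 5
16 5
16 5
16 1
end

* 0/1 indicators, as in the generate/replace code earlier in the thread
generate pd554_yes = (pd554 == 1)
generate pd554_no = (pd554 == 5)

* b1title() labels the category axis; title() still names the whole graph
graph bar pd554_yes pd554_no, over(pz216) percent stack ///
    b1title("Years of education") ///
    title("Getting lost in familiar places, by years of education")
```

Since this lives in the do file, it avoids redoing the label by hand in the Graph Editor for each graph.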

eldreddyl commented 7 months ago

@abigailmondin So I ran your code to generate the pc273 graphs. At least when I ran it, the labels weren't cut off. I don't think it's a Stata issue. I would look into your method of saving the images.

You could

1) Screenshot the graph

2) Save using the Stata Graph editor

3) Use graph export in your code
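For the third option, a minimal sketch of graph export is shown here. It uses the auto dataset that ships with Stata as a stand-in for the project data, and the file name example_bar.png and the width of 2000 pixels are placeholders rather than recommended settings.

```stata
* Stand-in data: the auto dataset that ships with Stata
sysuse auto, clear

* A bar graph with angled category labels, similar in style to the pc273 graphs
graph bar (mean) mpg, over(rep78, label(angle(45)))

* Write the current graph to a PNG file; replace overwrites an existing file
graph export "example_bar.png", replace width(2000)
```

Keeping the export in the do file means the saved image is regenerated every time the code runs.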

abigailmondin commented 7 months ago

@eldreddyl When I re-run the code for the pc273 graphs the labels are still cutoff. Is it possible there is a different issue?

eldreddyl commented 7 months ago

@eldreddyl When I re-run the code for the pc273 graphs the labels are still cutoff. Is it possible there is a different issue?

@abigailmondin what method are you using to save them?

abigailmondin commented 7 months ago

what method are you using to save them?

@eldreddyl I'm clicking the save icon, making it a png, and saving them to a folder. But even when I just run the code without saving them the labels are cutoff.

xorabear commented 7 months ago
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       pz216 |     16,259     12.7292    3.241885          0         17

Obs is the number of people observed in the data. The mean shows that, on average, people have around 12.7 years of education. The standard deviation measures how spread out the results are around the mean; I believe this means a typical participant is about 3 years above or below 12.7 years of education. The variable max means the

   ever had |
   dementia |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        500        2.43        2.43
          3 |          1        0.00        2.43
          4 |         51        0.25        2.68
          5 |     20,030       97.23       99.91
          8 |         17        0.08       99.99
          9 |          2        0.01      100.00
------------+-----------------------------------
      Total |     20,601      100.00

1. YES
3. DISPUTES PREVIOUS WAVE RECORD, BUT NOW HAS CONDITION
4. DISPUTES PREVIOUS WAVE RECORD, DOES NOT HAVE CONDITION
5. NO
8. DK (Don't Know); NA (Not Ascertained)
9. RF (Refused)

These numbers correspond to the responses people gave in the data. The tabulation counts how many respondents gave each answer, and it shows that most people answered no. The frequency and percentage columns show how many people in the data set report having dementia.

. pwcorr pz216 pc273, sig

             |    pz216    pc273
-------------+------------------
       pz216 |   1.0000
             |
       pc273 |   0.0496   1.0000
             |   0.0000

pwcorr reports the pairwise correlation between the two variables. The p-value is 0.0000. Since this is less than 0.05, the correlation between these two variables is statistically significant. The correlation coefficient ranges from -1 to 1: -1 indicates a perfect negative relationship, 0 indicates no relationship, and 1 indicates a perfect positive relationship. The coefficient here, 0.0496, is close to 0, so the relationship is weak even though it is statistically significant. Summarize and tab1 provide background for pwcorr and help with the interpretation.

kbuzard commented 7 months ago

@xorabear Remember that you need to ask for the significance level of the correlation, so you need to add ", sig" to the end of the pwcorr command.

xorabear commented 7 months ago

@xorabear Remember that you need to ask for the significance level of the correlation, so you need to add ", sig" to the end of the pwcorr command.

should i keep summarize and tab1 I thought that might help understand pwcorr, sig @kbuzard

kbuzard commented 7 months ago

should i keep summarize and tab1 I thought that might help understand pwcorr, sig

@xorabear tabulating the education variable is the same as a graph you already have, and it is harder to read than looking at the graph, so I suggest you only keep the graph. The results of summarize (and maybe also including a median) would be good to include in your data section.

Running summarize on the dementia variable gives you statistics that are not really meaningful; the average of the codes that represent different answers doesn't mean anything. The key information in tabulate is useful for this variable: we see that someone identifying as having dementia is quite rare.
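One way to act on both points might look like the sketch below, which runs on made-up data. It gets the median of education from summarize with the detail option, and it turns the pc273 codes into a 0/1 indicator so that its mean is the share reporting dementia. The variable name dementia and the choice to group codes 1 and 3 as "has the condition" and codes 4 and 5 as "does not" are assumptions based on the codebook text quoted above.

```stata
* Made-up data: years of education and the pc273 answer codes
clear
input pz216 pc273
8 1
10 1
12 5
12 5
14 5
16 5
17 5
12 8
end

* detail adds percentiles; the median is the 50th percentile, stored in r(p50)
summarize pz216, detail
display "Median years of education: " r(p50)

* Hypothetical indicator: 1 = has condition (codes 1, 3), 0 = does not (codes 4, 5),
* left missing for don't know / refused (codes 8, 9)
generate dementia = .
replace dementia = 1 if inlist(pc273, 1, 3)
replace dementia = 0 if inlist(pc273, 4, 5)

* The mean of a 0/1 indicator is a proportion, so these statistics are interpretable
tabulate dementia, missing
pwcorr pz216 dementia, sig
```

With an indicator like this, the sign of the correlation reads directly as whether more education goes with a higher or lower chance of reporting dementia.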

abigailmondin commented 7 months ago

At least when I ran it, the labels weren't cut off.

@eldreddyl I'm still unable to see the labels when I run the code. If you are able to see the labels, is there any way you could save them or screenshot them and attach them here?

eldreddyl commented 7 months ago

@abigailmondin For the purposes of your project, it would be better for your group if you submitted the cutoff-label graphs vs having me attach them. That way you won't lose as many points on the reproducibility section of the rubric.

Here are some other options in the meantime:

1) Experiment with using the remote desktop vs. the physical computers in Eggers 040. As long as the room isn't booked, you should still have access this week

2) See if anything changes when one of your group members saves the graphs

3) Play around with the graph export command. I'm not sure if there is some limitation on the graph size that is causing it to be cut off