HEGSRR / OR-Replicability-in-Geography-Survey

BSD 3-Clause "New" or "Revised" License

Analytical Plan #4

Closed Peter-Kedron closed 6 months ago

Peter-Kedron commented 1 year ago

@SarahBardin & @josephholler, this is a first draft of tasks to be done to prepare the replicability survey for analysis. I have tried to match, or at least mirror, the analytical strategy for the reproduction survey (RS1), so the two feel paired. Sarah, when you don't have RS1 tasks to do and you have HEGSRR time, please start implementing this list. We are going for a repository that mirrors what we made for RS1.

This survey does not have the familiarity and use questions because we focused less on the mechanics of reproduction/replication here and more on the concepts and their relevance to the discipline. So our overall story is going to be a bit different, and I don't see a single root summary table emerging like in RS1. So a first draft plan for presentation of results might be (grouped by subsections (S)):

Peter-Kedron commented 1 year ago

@SarahBardin, when you turn to these, can you prioritize the work on the following questions: Q7, Q8, Q10, and Q15? I'd like to get a quick look at those and start forming the AAG presentation as well as the paper narrative. Those are the big frequency tables. They are easy lifts, and will let me get a sense of the overall picture quickly. Thanks

SarahBardin commented 1 year ago

@Peter-Kedron and @josephholler, I've pushed the initial cleaning code (still very much a work in progress), but I'd like your review of the back coding for Q3 (research subfield). There were two responses ("Innovation" and "Innovation Studies") that I wasn't sure how to fit within our existing classifications, but I also wasn't sure about the physical/enviro sciences versus nature and society classifications.

Peter-Kedron commented 1 year ago

@josephholler & @SarahBardin, just flagging in this same issue stack that Joe is going to take the lead on the qualitative coding system for Q5 (definitions), Q19 (reported reasons for actually attempting a replication), and Q23 (criteria for evaluating a successful replication). Have a look at the other qualitative responses to see if those look worth coding based on response rates.

Joe, let us know when you have a scheme and we can do our independent coding then harmonize as before.

SarahBardin commented 1 year ago

@Peter-Kedron and @josephholler, I've generated the first set of tables for Q7, Q8, Q10, and Q15 and pushed them to the results folder. I'll work on the other analyses and cleaning up the factor variable ordering as I go, but this should hopefully be enough to get you started on the slides and the write up.
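For readers following along, a minimal sketch of the kind of frequency table discussed above, written in Python purely for illustration. It is not the repository's code: `frequency_table` is a hypothetical helper, and the response labels and answers are made up rather than taken from the survey.

```python
from collections import Counter

def frequency_table(responses, levels):
    """Return (level, count, percent) rows in the given level order."""
    counts = Counter(responses)
    total = len(responses)
    if total == 0:
        # No responses: report zero counts and zero percentages.
        return [(lvl, 0, 0.0) for lvl in levels]
    return [(lvl, counts[lvl], round(100 * counts[lvl] / total, 1)) for lvl in levels]

# Hypothetical answers to one closed-ended question (not survey data).
answers = ["Agree", "Agree", "Disagree", "Neutral",
           "Agree", "Disagree", "Neutral", "Agree"]
# Explicit level order, so rows do not fall back to alphabetical order.
levels = ["Disagree", "Neutral", "Agree"]

for level, n, pct in frequency_table(answers, levels):
    print(f"{level:<10}{n:>4}{pct:>7.1f}%")
```

Passing the level order explicitly is one way to handle the factor-ordering cleanup mentioned in the comment: the table follows the scale order rather than the order in which labels happen to appear.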

Peter-Kedron commented 1 year ago

@SarahBardin, these tables look like a great start. Thanks so much. I will get to work. The one open thing that I think can be done while Joe generates data coding schemes for those other questions is the descriptive stats for Q12-Q14.

SarahBardin commented 1 year ago

Done. I also generated tables for Q17. I can likely get you the rest of the quantitative tables this weekend. I took a peek at the qualitative responses and it looks like we have at least 30 responses for each of these open-response questions, and in some cases double that.

Peter-Kedron commented 1 year ago

@SarahBardin listing out the small coding tasks, tables, and figures we planned out today

I'd like your opinion on these plots. What do you think we should use as the confidence interval? We can just show the std about the mean. However, I kind of like the idea of plotting the median, mean, and a spread measure. I am open on the spread measure, what do you think makes the most sense in this context?
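To make the options concrete, here is a small sketch that computes the mean, the median, and two candidate spread measures (standard deviation and interquartile range) side by side. It uses only Python's standard library; the `summarize` helper and the example values are hypothetical, not survey data.

```python
import statistics

def summarize(values):
    """Mean, median, sample standard deviation, and interquartile range."""
    # method="inclusive" treats the data as covering the full observed range.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "iqr": q3 - q1,
    }

# Hypothetical numeric responses to a single question.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = summarize(scores)
print(summary)
```

The standard deviation pairs naturally with the mean and the interquartile range with the median, so plotting both pairs is one way to see whether skew in the responses makes the choice matter.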

Peter-Kedron commented 1 year ago

@SarahBardin What do you think about density plots over histograms for Q12-14? Is there enough data for the subgroups?

SarahBardin commented 1 year ago

@Peter-Kedron, I pushed some draft figures for Q12-Q14 to the figures folder. Note: For each question, I gave you a plot that has the density plot overlaid on top of the histogram. This way you get a sense of how the histogram compares to the density plot. I also made a stacked view of the density plots for the 3 questions, which visually shows how Q12 is distributed differently than Q13 and Q14.
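As a rough illustration of what overlaying a density plot on a histogram involves, the sketch below puts both on the same density scale so they can share an axis. It is a hand-rolled Gaussian kernel density estimate in plain Python, not the plotting code used for the figures; the function names, the bandwidth, the bin edges, and the values are all assumptions made for the example.

```python
import math

def gaussian_kde(values, bandwidth):
    """Return a function estimating the density with Gaussian kernels."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density

def histogram_density(values, edges):
    """Bar heights scaled so that the bar areas sum to 1."""
    n = len(values)
    heights = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        is_last = i == len(edges) - 2
        # Bins are [lo, hi), except the last bin, which also includes hi.
        count = sum(1 for v in values if lo <= v < hi or (is_last and v == hi))
        heights.append(count / (n * (hi - lo)))
    return heights

# Hypothetical responses on a 0-100 scale (not survey data).
values = [10, 20, 20, 30, 40, 50, 50, 60, 80, 90]
edges = [0, 25, 50, 75, 100]

heights = histogram_density(values, edges)
kde = gaussian_kde(values, bandwidth=10.0)
curve = [(x, kde(x)) for x in range(0, 101, 10)]  # points to draw over the bars
```

With few observations per subgroup, the curve depends heavily on the bandwidth, which is one reason to look at the histogram and the density together before choosing between them.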

SarahBardin commented 1 year ago

@Peter-Kedron and @josephholler, I've pushed figures for Q8, Q10, and Q15 to the figures folder. I didn't like the look of the "heat" map version of Q15, so I went with the uncentered stacked bar chart instead. I still need to update the labeling of the variables displayed, but hopefully these will give you enough to look through and reflect on for the time being.

Peter-Kedron commented 1 year ago

@SarahBardin, Thanks for building these figures. @josephholler feel free to add ideas.

Sarah, no need to do these during break, but here are my requested adjustments with an eye for use in the AAG presentations. All of these are just for the overall figures

For Q15

For Q8 & Q10

For Q12-14

Peter-Kedron commented 1 year ago

@SarahBardin, would you please resend me the final set of saved figures we created for the AAG presentation in a .zip via email, and also save them into \figures ? I am building the figures for the manuscript in illustrator. Thank you.

Peter-Kedron commented 12 months ago

Hello @SarahBardin,

Please find below a list of items we need your help with as we finalize this paper draft. I believe most are fairly minor. I indexed these by the line number of the current draft of the overleaf manuscript. @josephholler may edit the doc a bit over the next week, so those numbers may change. However, they should be close. Also these are all accompanied by comments in the document, so if you follow the comment string down (ignoring comments with discussion points) you should basically get this list.

For the figure change requests. We'll want the code and figures on Git. However, the end product I need is just the generated figure, which I will work with in illustrator to finalize.

I know you are finalizing your proposal draft. Please prioritize that over these tasks. Thanks for your help with this effort.

SarahBardin commented 11 months ago

@Peter-Kedron,

For Figure 3, how should I depict the Don't Know and Missing responses in the bar chart? We had filtered these out of the view for the diverging bar charts, but for the bars in the stacked bar chart to sum to 100, we either need to include Don't Know and Missing and assign them a color (we could set them to white?), or we exclude them. If we exclude them, then the bar lengths won't look accurate once they are forced to sum proportionally to 100 (this is currently what is happening).

Peter-Kedron commented 11 months ago

@SarahBardin good question. Please exclude the don't know and the missing from the bar chart. Your reasoning is spot on, but if you look at the current Fig 3 in overleaf you can see how I handled it. There I report the don't know and missing responses as numerical columns to the right of the response bars. That brings the figure into rough alignment with Fig 2, where we keep these aside to preserve the comparability of the divergence off the center line. I'm expecting the greyed bars to not sum to 100, with the remainder being handled by the two numerical columns. It's imperfect, but I think coherent with the other figures. Sound good?

josephholler commented 11 months ago

That sounds good. We handled it differently on the web app because of the coding challenges of trying to make the side table with Shiny Apps, instead choosing to place them in neutral colors in the middle. Either way works.

SarahBardin commented 11 months ago

Pushed updated figures--I provided both labeled and unlabeled versions. As the font gets larger, it is harder to keep the labels from overlapping and despite my best efforts using a variety of functions to force spacing, I couldn't get it programmatically to work well without simply removing the % symbol. Ergo, I provided labeled versions and unlabeled versions, so Peter can determine which ones to use in post.

Peter-Kedron commented 11 months ago

@SarahBardin thank you very much. I will take it from here. Enjoy your West Coast road trip.

SarahBardin commented 11 months ago

@Peter-Kedron, FYI there are 90 people who reported attempting replications (Q17_rep_behavior_5 == Yes). It appears that you coded 5 of these individuals as not having actually attempted a replication. One person (Response ID == R_2qsyrY0KkiYGt07) you coded as "add 1". Should I exclude this person in my calculations for the demographics table? Is this how you were getting to 84 unique respondents with replication attempts?

Peter-Kedron commented 11 months ago

@SarahBardin Sorry I didn't get back to you on this question. The final filter is column I "rep_flag" in the Q17_coding file. The 1 entries there get you the 84 replication attempts. Joe and I went back and forth a bit, and my resolving those conversations and the columns in this spreadsheet resulted in the final coding in column I.
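A minimal sketch of that filter, assuming the coding sheet has been read into a list of row dictionaries. The column names `rep_flag` and `Q17_rep_behavior_5` come from this thread; the `ResponseId` field, the use of 0 for non-attempts, and the rows themselves are hypothetical.

```python
# Hypothetical rows standing in for the Q17 coding sheet (not real responses).
rows = [
    {"ResponseId": "R_a", "Q17_rep_behavior_5": "Yes", "rep_flag": 1},
    {"ResponseId": "R_b", "Q17_rep_behavior_5": "Yes", "rep_flag": 0},
    {"ResponseId": "R_c", "Q17_rep_behavior_5": "No", "rep_flag": 0},
    {"ResponseId": "R_d", "Q17_rep_behavior_5": "Yes", "rep_flag": 1},
]

def replication_sample(rows):
    """Keep only respondents whose final coding marks a replication attempt."""
    return [r for r in rows if r["rep_flag"] == 1]

# Self-reported attempts versus attempts confirmed by the final coding.
reported = [r for r in rows if r["Q17_rep_behavior_5"] == "Yes"]
confirmed = replication_sample(rows)
print(len(reported), len(confirmed))
```

Comparing the two counts is a quick check that the filtered sample matches the number expected from the coding decisions.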

Peter-Kedron commented 11 months ago

@SarahBardin I got a closer look at Fig 3 today and I see what you are saying now. Because I was confused before I gave you the wrong instructions. What we want to do is include the DK and M in the calculation of the percentages for the bars, but exclude those values from the visualization of the bars. Based on your comments, what I think we can do is include them in the chart, but set their color to white. That plan should get the correct proportions on the other bars (most important) and hide the DK and M from the visualization. If there are divider bars or something between the DK and M that show up, I can mask them in illustrator. Sorry for the confusion on my end.
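To spell out the arithmetic, here is a small sketch with made-up counts: percentages are computed over all responses, including Don't Know and Missing, and those two categories are then collapsed into one segment that would be drawn in white. The `bar_percentages` helper, the category labels, and the counts are hypothetical.

```python
def bar_percentages(counts, hidden=("Don't know", "Missing")):
    """Percentages over ALL responses; hidden categories are collapsed."""
    total = sum(counts.values())
    shown = {k: 100 * v / total for k, v in counts.items() if k not in hidden}
    hidden_pct = 100 * sum(counts.get(k, 0) for k in hidden) / total
    return shown, hidden_pct

# Hypothetical counts for one item (not survey data).
counts = {"Agree": 40, "Neutral": 20, "Disagree": 20, "Don't know": 15, "Missing": 5}

shown, hidden_pct = bar_percentages(counts)
print(shown)       # visible segments sum to 80 here, not 100
print(hidden_pct)  # the remaining 20 is the white DK/M segment

# For contrast: dropping DK/M from the denominator inflates every bar.
visible_total = sum(v for k, v in counts.items() if k not in ("Don't know", "Missing"))
inflated_agree = 100 * counts["Agree"] / visible_total
```

In this example "Agree" is 40% of all responses but would appear as 50% if Don't Know and Missing were removed before computing percentages, which is the distortion described above.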

SarahBardin commented 11 months ago

@Peter-Kedron, no worries. I should've gone with my gut on this one. Anyway, I pushed the updated Fig 3 to GitHub. I collapsed DK and M into a single category and set that to white, so there is no dividing mark.

Thanks for confirming the correct spreadsheet to look at for the demographics table--I had looked at the _pk version which has a different coding system for the rep_flag variable. I will make that table momentarily along with the other requested table.

SarahBardin commented 11 months ago

@Peter-Kedron, closing the loop that I created the remaining tables and I updated the overleaf directly with the correct values for the sample completes. I also cross-checked the counts and percentages reported for the replication sample and adjusted a couple of numbers (some were off with rounding slightly) but for the most part all of those checked out.