combine demographic reports into a single PDF

andrewsu commented 1 year ago

Currently we generate separate PDF reports for demographic splits. For example, based on the sample data, we generate separate PDFs for 'NEURO LAB 2.pdf', 'NEURO LAB 2+Male.pdf', and 'NEURO LAB 2+Female.pdf'. On reviewing these reports with test users, we realized that it would be easier to use if all the demographic splits (gender and race/ethnicity) were included in a single PDF. So, there would only be one 'NEURO LAB 2.pdf', and the summary for one question might look like this:

The only thing we'd lose in this version is the actual counts, but I think that is an acceptable trade-off.

(of all the issues, this is probably the most substantial change, so let's discuss feasibility...)

andrewsu commented 1 year ago

After thinking a bit more about the issues raised on Slack, I'm proposing this revised view:

Key points:

- One set of bars for each combination of organizational level and demographic split.
- Only show the demographic splits that are relevant at the lowest organizational level. For example, if NEURO LAB 2 has splits for gender but not race/ethnicity, then show only gender for higher organizational levels (assuming the minimum number threshold has been met at the higher organizational levels).
- "Report score" gets moved to the label for each set of stacked bars.

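The key points above could be sketched roughly as follows. This is only an illustration under several assumptions: the function name `rows_to_show`, the input layout, the "All" row for each level, and the reading of "relevant" as "at least one group in the split reaches the threshold at the lowest level" are all hypothetical, and the threshold of 5 is taken from the n<5 cutoff mentioned later in this thread.

```python
MIN_N = 5  # assumed minimum group size, based on the n<5 cutoff discussed in this thread


def rows_to_show(levels, min_n=MIN_N):
    """Return the (organizational level, group) rows to draw as stacked bars.

    levels: list of (level_name, splits) tuples, lowest organizational level
    first. splits maps a split name (e.g. "gender") to {group: respondent count}.
    """
    _, lowest_splits = levels[0]
    # Assumption: a split is "relevant" if any of its groups meets the
    # threshold at the lowest organizational level.
    relevant = [
        split
        for split, groups in lowest_splits.items()
        if any(n >= min_n for n in groups.values())
    ]
    rows = []
    for level_name, splits in levels:
        rows.append((level_name, "All"))  # hypothetical overall row per level
        for split in relevant:
            for group, n in splits.get(split, {}).items():
                if n >= min_n:  # threshold is re-checked at every level
                    rows.append((level_name, group))
    return rows


# Made-up counts, for illustration only.
example_levels = [
    ("NEURO LAB 2", {"gender": {"Male": 6, "Female": 7},
                     "race/ethnicity": {"Latino": 2, "Asian": 3}}),
    ("NEURO DEPT", {"gender": {"Male": 40, "Female": 35},
                    "race/ethnicity": {"Latino": 12, "Asian": 20}}),
]
print(rows_to_show(example_levels))
```

With these made-up counts, race/ethnicity is dropped at the department level too, because no race/ethnicity group reaches the threshold in the lab itself.
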
rjawesome commented 1 year ago

I'll work on the bar graphs. I'll try to get it done by the end of this week, but it might take some time into next week.

One question I have: how will the text categories be displayed in this new format?

andrewsu commented 1 year ago

@rjawesome what do you mean by "text categories"? Do you mean the labels like "Strongly agree", "agree", etc.? If yes, I think those can go below or above the horizontal bar chart.

(And actually as I was searching for examples, I stumbled on this page https://matplotlib.org/stable/gallery/lines_bars_and_markers/horizontal_barchart_distribution.html that perhaps would be a good starting point, except what they have labeled as "questions" would be demographic/organizational splits...)
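
A minimal sketch of how that gallery example might be adapted, with one stacked bar per demographic/organizational split. The function names, the full category list beyond "Strongly agree"/"Agree", and the sample counts are assumptions for illustration, not the project's actual code. Bars are drawn as percentages, which matches the earlier note that raw counts are dropped.

```python
# Category names other than "Strongly agree"/"Agree" are assumed.
CATEGORIES = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]


def bar_segments(counts):
    """Convert raw counts per category into (left, width) percentages for one stacked bar."""
    total = sum(counts)
    segments = []
    left = 0.0
    for count in counts:
        width = 100.0 * count / total if total else 0.0
        segments.append((left, width))
        left += width
    return segments


def plot_splits(rows, categories=CATEGORIES):
    """rows: {row label: [count per category]}; returns a matplotlib Figure."""
    import matplotlib.pyplot as plt  # imported lazily so bar_segments works without it

    labels = list(rows)
    segments = [bar_segments(rows[label]) for label in labels]
    fig, ax = plt.subplots(figsize=(8, 0.6 * len(labels) + 1))
    for i, category in enumerate(categories):
        lefts = [seg[i][0] for seg in segments]
        widths = [seg[i][1] for seg in segments]
        ax.barh(labels, widths, left=lefts, label=category)
    ax.invert_yaxis()  # first row at the top, as in the gallery example
    ax.set_xlim(0, 100)
    ax.set_xlabel("% of responses")
    ax.legend(ncol=len(categories), bbox_to_anchor=(0, 1), loc="lower left", fontsize="small")
    return fig


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    sample = {
        "NEURO LAB 2": [1, 2, 3, 5, 4],
        "NEURO LAB 2, Female": [0, 1, 2, 3, 2],
        "NEURO LAB 2, Male": [1, 1, 1, 2, 2],
    }
    plot_splits(sample).savefig("example_question.png", bbox_inches="tight")
```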

rjawesome commented 1 year ago

By text categories I meant categories with text responses, i.e., free response (the screenshot below isn't a perfect example, but basically I'm referring to questions where responses are enumerated in text form rather than shown as bar graphs).

Also, would we stack multi-answer graphs, like these questions?

Some functionality might break while I am working on this so I am putting it in the bar-refactor branch.

Btw, here is a screenshot from what I am currently working on (some of the scores and other info still need to be moved).

andrewsu commented 1 year ago

> By text categories I meant categories with text responses, i.e., free response (the screenshot below isn't a perfect example, but basically I'm referring to questions where responses are enumerated in text form rather than shown as bar graphs).

Ahh, great point. For those responses, just concatenate all the answers as normal at the end of the report -- no need to separate by any demographic category. (In fact, this is even better than the old system, where it would be possible to break anonymity if a free-text comment showed up in two separate reports like "Latino", and "Female".)
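
As a rough sketch of that pooling, assuming each response is a dict keyed by question ID (the function name and data layout here are hypothetical):

```python
def pooled_free_text(responses, question):
    """Collect every non-blank answer to one free-response question.

    Demographic fields on each response are deliberately ignored, so a
    comment appears once in the report and is never tied to a split.
    """
    answers = []
    for response in responses:
        text = (response.get(question) or "").strip()
        if text:
            answers.append(text)
    return answers
```

Because the demographic columns are never consulted, the same comment cannot surface under both a gender split and a race/ethnicity split.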

> Also, would we stack multi-answer graphs, like these questions?

If easy, yes, let's stack the multi-answer graphs for each demographic split. Something like this:

> Some functionality might break while I am working on this so I am putting it in the bar-refactor branch.
>
> Btw, here is a screenshot from what I am currently working on (some of the scores and other info still need to be moved).

Great and great! I assume some of the rows in that screenshot will be removed when n<5?

rjawesome commented 1 year ago

> Great and great! I assume some of the rows in that screenshot will be removed when n<5?

I believe the row shows up because at least 5 people were recorded as Latino. It shows as n=1 because blank answers were excluded, so for this question there was only 1 non-blank response.

The way I have it set up right now is it calculates the count at the group level as opposed to the question level.
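
To illustrate the difference, here is a small sketch of a group-level threshold with blanks excluded per question. The function name and data layout are hypothetical; this is not the code in the bar-refactor branch.

```python
MIN_N = 5  # assumed minimum group size


def row_for_question(group_responses, question, min_n=MIN_N):
    """Return (n, answers) for one group's row on one question, or None if hidden.

    The threshold is applied to the size of the whole group, not to the
    number of people who answered this particular question.
    """
    if len(group_responses) < min_n:
        return None  # group too small: no row on any question
    # Blank answers are dropped, so the displayed n can be below min_n.
    answers = [r[question] for r in group_responses if r.get(question) not in (None, "")]
    return len(answers), answers
```

With five respondents in a group and only one non-blank answer, this returns a row with n=1, which is the situation in the screenshot.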

ADDITIONAL POINT: I think these graphs might get crowded if we add comparisons among organizational levels. How should this be handled? (This is especially true for the multi-graph ones; see screenshot below.)

andrewsu commented 1 year ago

> > Great and great! I assume some of the rows in that screenshot will be removed when n<5?
>
> I believe the row shows up because at least 5 people were recorded as Latino. It shows as n=1 because blank answers were excluded, so for this question there was only 1 non-blank response.
>
> The way I have it set up right now is it calculates the count at the group level as opposed to the question level.

Ahh, that makes perfect sense. In that case, carry on...

> ADDITIONAL POINT: I think these graphs might get crowded if we add comparisons among organizational levels. How should this be handled?

So my guess is that it won't be too bad. Reports near the top of the organizational chart won't have many higher levels to compare to. And reports at/near the bottom of the chart won't have many (or any) demographic splits. So it should be a rare case where we have many demographic splits across many organizational levels, right? I think we can go on this assumption for now, and deal with it if it becomes a problem...

andrewsu commented 12 months ago

close as complete with #9

andrewsu / mentorship-survey-analysis

combine demographic reports into a single PDF #8