Analyticsphere / qaqc_testing

working with qaqc

Variables with [] around them #77

Closed KELSEYDOWLING7 closed 11 months ago

KELSEYDOWLING7 commented 1 year ago

I noticed that essentially wherever a variable has only one anticipated response (“I don’t know”), or two anticipated responses (“I don’t know” or a custom other response), the responses are surrounded by brackets.

This causes hundreds or even thousands of extra lines of "errors" when the QC code runs, because the bracketed responses are not valid values.

Below are the variables whose values should be NA instead of [] and 178420302 instead of [178420302]. In only two variables (highlighted in red in the original report), the value should be 958239616 instead of [958239616].

Module 1
• Mother’s age now: D_354326265_D_354326265
• Mother’s age at death: D_422714611_D_422714611
• Father’s age now: D_178774803_D_178774803
• Father’s age at death: D_628078826_D_628078826
• Sibling (1-10) age now: D869387390#_#_D869387390#_#
• Sibling (1-10) age at death: D537137982#_#_D537137982#_#
• Child (1-10) age now: D640010727#_#_D640010727#_# (this variable)
• Child (1-10) age at death: D236590500#_#_D236590500#_# (this variable)

Module 2
• Don’t know how many (painrel1) pills per day: D_596961796_D_596961796

• Don’t know how many (painrel2) pills per day: D_825189914_D_825189914

• Don’t know how many (painrel3) pills per day: D_753416375_D_753416375

• Don’t know how many (painrel4) pills per day: D_646042915_D_646042915

• Don’t know how many (painrel5) pills per day: D_799338907_D_799338907

• Don’t know how many (painrel6) pills per day: D_893965588_D_893965588

• Don’t know how many (painrel7) pills per day: D_438682764_D_438682764

• Don’t know milligrams per day: D_991622246_D_991622246

• Don’t know concentration: D_273218182_D_273218182

Module 3
• D_276575533_D_276575533
• D_517100968_D_517100968 (not yet answered, but will be affected once it is)
• D_933417196_D_933417196
• D_585819411_D_585819411

Module 4
• D_920576363_D_920576363
• D_804504024_D_804504024
• D_444145120_D_444145120
• D_398762737_D_398762737
• D_752101258_D_752101258
• D_961572487_D_961572487
• D_879180101_D_879180101
• D_746604821_D_746604821
• D_212343294_D_212343294
• D_298296694_D_298296694
• D_255474241_D_255474241
• D_205492848_D_205492848
• D_201906316_D_201906316
• D_581231591_D_581231591
• D_864213677_D_864213677
• D_123104885_D_123104885
• D_964853797_D_964853797
• D_890661849_D_890661849
• D_787064287_D_787064287
• D_902193418_D_902193418
• D_878688378_D_878688378
• D_440597740_D_440597740
• D_173413183_D_173413183
• D_200086909_D_200086909
• D_657986901_D_657986901
• D_509526051_D_509526051
• D_564684946_D_564684946
• D_370121390_D_370121390
• D_558981691_D_558981691
• D_192184336_D_192184336
• D_194944818_D_194944818
• D_763354979_D_763354979
• D_508587741_D_508587741
• D_355179190_D_355179190
• D_293954660_D_293954660
• D_851731394_D_851731394
• D_268612977_D_268612977
• D_172669345_D_172669345
• D_216096388_D_216096388
• D_921998144_D_921998144
• D_757983656_D_757983656
• D_670316988_D_670316988
• D_264797252_D_264797252
• D_469914719_D_469914719
• D_263588196_D_263588196
• D_845811202_D_845811202
• D_350394531_D_350394531
• D_733317111_D_733317111
• D_668887646_D_668887646
• D_443679537_D_443679537
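The value mapping requested above ([] becomes NA, [178420302] becomes 178420302) can be sketched as a small value-level function. This is a hypothetical Python illustration, not the project's QC code. It assumes the flattened values arrive as text such as "[]" or "[178420302]", uses None to stand in for NA, and leaves any value with more than one element untouched. The name `unwrap_single_response` is made up for this sketch.

```python
from typing import Optional


def unwrap_single_response(value: Optional[str]) -> Optional[str]:
    """Map a bracketed single-response value to its bare form.

    Hypothetical helper: assumes values are strings like "[]" or
    "[178420302]"; None stands in for NA.
    """
    if value is None:
        return None  # already missing (e.g. null in the source data)
    text = value.strip()
    if not (text.startswith("[") and text.endswith("]")):
        return value  # not bracketed: leave as-is
    inner = text[1:-1].strip()
    if inner == "":
        return None  # "[]" -> NA
    if "," in inner:
        return value  # more than one element: not the single-response case
    return inner  # "[178420302]" -> "178420302"


print(unwrap_single_response("[178420302]"))  # 178420302
print(unwrap_single_response("[]"))  # None
```

Returning multi-element values unchanged keeps the helper conservative: it only rewrites the exact single-response pattern described in this issue, so genuinely multi-select answers would still surface in QC.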

jacobmpeters commented 1 year ago

@FrogGirl1123 I'll add the email that I sent here, so that we have everything in one place:

Hi Nicole,

Kelsey requested that I make some changes to how the modules are flattened. These changes would make things simpler for analysis and QAQC, but would require some customization to our flattening process.

I wanted to loop you in before I make such a big change to how we flatten things.

For some module questions, there is only a single possible response, yet the response is stored as a list and shows up in brackets.

For example, Kelsey mentioned that “wherever there is only one anticipated response of ‘I don’t know’, or one of two anticipated responses of ‘I don’t know’ or a custom other response. These are the variables that should be NA instead of [], 178420302 instead of [178420302]”. It looks like null is also a possible value for these questions.

The screenshot has an example for the question: How old is your mother today?


As Kelsey mentions in the messages below, there are MANY examples of this.

There are multiple potential solutions to this:

  1. Fix Quest so that single answers are no longer stored as arrays, and then fix the existing data in Firestore. Upside: clean, long-term solution. Downside: big lift (I assume).

  2. Add a custom flattening function that applies to just these variables. Upside: smaller lift. Downside: the underlying problem still exists and will likely come up in the future.

  3. Write a function to fix this issue in our R scripts prior to analysis. Upside: smallest lift. Downside: even further downstream of the underlying problem and the flattened data won’t faithfully represent the source data.
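Option 2 could look roughly like the following hypothetical Python sketch, which unwraps single-element lists only for an explicit set of flattened column names while flattening nested documents. It assumes, based only on the doubled names such as D_354326265_D_354326265, that these columns come from nested maps whose keys are joined with an underscore. The project's real flattening code is not shown in this issue, and `flatten` and `unwrap_keys` are illustrative names. Empty lists become None (NA), and a null in the source stays null.

```python
from typing import AbstractSet, Any, Dict


def flatten(
    doc: Dict[str, Any],
    unwrap_keys: AbstractSet[str] = frozenset(),
    parent_key: str = "",
    sep: str = "_",
) -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with `sep`.

    Hypothetical sketch: for flattened keys listed in `unwrap_keys`, a list
    with at most one element is replaced by that element ([] becomes None).
    All other values, including longer lists, pass through unchanged.
    """
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        flat_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested maps, carrying the joined key prefix.
            flat.update(flatten(value, unwrap_keys, flat_key, sep))
        elif isinstance(value, list) and flat_key in unwrap_keys and len(value) <= 1:
            # Single-response question: store the bare value instead of a list.
            flat[flat_key] = value[0] if value else None
        else:
            flat[flat_key] = value
    return flat


doc = {
    "D_354326265": {"D_354326265": [178420302]},  # Mother's age now
    "D_422714611": {"D_422714611": []},  # Mother's age at death
}
print(flatten(doc, unwrap_keys={"D_354326265_D_354326265", "D_422714611_D_422714611"}))
# {'D_354326265_D_354326265': 178420302, 'D_422714611_D_422714611': None}
```

Keeping the allowlist explicit limits the change to the variables listed in this issue, so other list-valued variables flatten exactly as before. The trade-off is the one noted for option 2: Quest would still keep producing the arrays.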

Do you have any thoughts on this? Would you like to discuss it in a quick meeting before we proceed?

Best,

Jake

KELSEYDOWLING7 commented 11 months ago

@jacobmpeters (Just realized your full name was Jacob! I somehow never noticed that before in your username!) I think we may be able to close this issue once #77 is resolved, since your last push seems to have fixed it.

jacobmpeters commented 11 months ago

haha! Yep, I'm Jacob :) Ok. I'll close it for now!