episphere / questionnaire


M2 Backend Data Issue with HOUSE1/HOUSE2 #402

Closed boyd-mj closed 3 months ago

boyd-mj commented 4 months ago

@Davinkjohnson could you please look at Firestore and tell us how it compares to the data we're seeing stored on the backend for HOUSE1/HOUSE2. This is the only outstanding issue for M2 dev testing. Thanks.

From: Dowling, Kelsey (NIH/NCI) [C] [kelsey.dowling@nih.gov](mailto:kelsey.dowling@nih.gov)
Sent: Tuesday, July 23, 2024 9:22 AM
To: Lee, Richard (IMS) [LeeR@imsweb.com](mailto:LeeR@imsweb.com); Johnson, Davin (NIH/NCI) [C] [davin.johnson@nih.gov](mailto:davin.johnson@nih.gov); Peters, Jake (NIH/NCI) [C] [jake.peters@nih.gov](mailto:jake.peters@nih.gov)
Cc: Boyd-Morin, Jennifer (IMS) [Boyd-MorinJ@imsweb.com](mailto:Boyd-MorinJ@imsweb.com); Sansale, Rebecca (NIH/NCI) [C] [rebecca.sansale@nih.gov](mailto:rebecca.sansale@nih.gov); Horner, Marie Josephe (NIH/NCI) [E] [mariejosephe.horner@nih.gov](mailto:mariejosephe.horner@nih.gov)
Subject: RE: DEV Backend Testing Issues

Good morning Ricky,

That is odd, then, that they didn’t make it into the flattened tables after the weekend. But I see that the responses for both Mod1 scenario 5 and Mod2 scenario 10 are saved in the raw table, so that’s a simple solution. Whenever this happens, just continue to let Jake or me know and that’s a quick fix; any variables not previously in the flattened tables (for example sib_canc_23) would require an additional step from Jake.

@Johnson, Davin (NIH/NCI) [C] Could someone on your team advise on the test responses below not matching the backend? I’m curious what they look like in Firestore. I’m assuming the same as BQ, but in that case I’m unsure how to proceed.

Module 2 HOUSE1 and HOUSE2 questions:

• Module 2, Test Scenario 3 – Connect ID 2547971085
  o HOUSE1
    - Response to HOUSE1A (D_733547268.D_980120253) should have been 671267928 and instead it is 647504893
  o HOUSE2 (D_128705365)
    - Response to HOUSE2A (D_128705365.entity.D_607323377) should have been 264163865 and instead is null

• Module 2, Test Scenario 10 – Connect ID 4082746033
  o HOUSE1 (D_733547268, all questions in this grid answered)
    - Responses should all be 582006876 and they are all 645894551
  o HOUSE2 (D_128705365, all questions in this grid answered)
    - These responses DO in fact match. They are just in the raw table and need to be flattened

Thank you, Kelsey

From: Lee, Richard (IMS) [LeeR@imsweb.com](mailto:LeeR@imsweb.com)
Sent: Monday, July 22, 2024 4:22 PM
To: Dowling, Kelsey (NIH/NCI) [C] [kelsey.dowling@nih.gov](mailto:kelsey.dowling@nih.gov)
Cc: Boyd-Morin, Jennifer (IMS) [Boyd-MorinJ@imsweb.com](mailto:Boyd-MorinJ@imsweb.com); Peters, Jake (NIH/NCI) [C] [jake.peters@nih.gov](mailto:jake.peters@nih.gov); Sansale, Rebecca (NIH/NCI) [C] [rebecca.sansale@nih.gov](mailto:rebecca.sansale@nih.gov); Johnson, Davin (NIH/NCI) [C] [davin.johnson@nih.gov](mailto:davin.johnson@nih.gov); Horner, Marie Josephe (NIH/NCI) [E] [mariejosephe.horner@nih.gov](mailto:mariejosephe.horner@nih.gov)
Subject: [EXTERNAL] RE: DEV Backend Testing Issues

Hi Kelsey,

These test scenarios for Modules 1 and 2 were completed Friday afternoon and I checked the BigQuery data today, so I think the data should have been added to the flattened tables. Does that indicate they’re not being added to the flattened tables for some other reason?

We retested the responses for the HOUSE1 and HOUSE2 in Module 2 using two different accounts and are not seeing the correct responses in BigQuery. Can Davin and team check that these questions are producing the correct responses?

HOUSE1A D_733547268_D_980120253
HOUSE1B D_733547268_D_993029890
HOUSE1C D_733547268_D_980196073
HOUSE1D D_733547268_D_115422925
HOUSE1E D_733547268_D_151161693
HOUSE2A D_128705365_D_607323377
HOUSE2B D_128705365_D_491484323
HOUSE2C D_128705365_D_588637585
HOUSE2D D_128705365_D_199039940
HOUSE2E D_128705365_D_986476579

Thanks, Ricky

From: Dowling, Kelsey (NIH/NCI) [C] [kelsey.dowling@nih.gov](mailto:kelsey.dowling@nih.gov)
Sent: Monday, July 22, 2024 3:16 PM
To: Lee, Richard (IMS) [LeeR@imsweb.com](mailto:LeeR@imsweb.com)
Cc: Boyd-Morin, Jennifer (IMS) [Boyd-MorinJ@imsweb.com](mailto:Boyd-MorinJ@imsweb.com); Peters, Jake (NIH/NCI) [C] [jake.peters@nih.gov](mailto:jake.peters@nih.gov); Sansale, Rebecca (NIH/NCI) [C] [rebecca.sansale@nih.gov](mailto:rebecca.sansale@nih.gov); Johnson, Davin (NIH/NCI) [C] [davin.johnson@nih.gov](mailto:davin.johnson@nih.gov); Horner, Marie Josephe (NIH/NCI) [E] [mariejosephe.horner@nih.gov](mailto:mariejosephe.horner@nih.gov)
Subject: RE: DEV Backend Testing Issues

Hi Ricky,

Thanks for letting me know. Any time there’s data in the raw tables but not the flattened tables, the flattened tables just need to be refreshed, either automatically at 9:30am or manually by us. So those wouldn’t be actual problems, just a timing issue.
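A minimal sketch of that raw-versus-flattened comparison, assuming the column names of both tables have already been pulled into Python sets. The function name `missing_from_flattened` and the sample column sets are illustrative only and are not taken from the actual tables:

```python
def missing_from_flattened(raw_columns, flat_columns):
    """Return raw-table columns that have not (yet) appeared in the flattened table."""
    return sorted(set(raw_columns) - set(flat_columns))


# Illustrative column names only (hypothetical sample, not real table contents).
raw_columns = {"Connect_ID", "D_128705365_D_607323377", "D_128705365_D_491484323"}
flat_columns = {"Connect_ID", "D_128705365_D_491484323"}

pending = missing_from_flattened(raw_columns, flat_columns)
print(pending)
```

Anything still listed after the scheduled or manual refresh would point to something other than timing, for example a variable that has never been part of the flattened tables.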

For BQ not matching the recordings, typically those are typos in which responses were selected. We had that come up many times in the past during testing. If you’re confident that those are in fact the wrong responses, I would either recommend retesting those two questions or asking Davin’s team to check Firestore and see if that matches what you selected.

Best, Kelsey

From: Lee, Richard (IMS) [LeeR@imsweb.com](mailto:LeeR@imsweb.com)
Sent: Monday, July 22, 2024 2:50 PM
To: Dowling, Kelsey (NIH/NCI) [C] [kelsey.dowling@nih.gov](mailto:kelsey.dowling@nih.gov)
Cc: Boyd-Morin, Jennifer (IMS) [Boyd-MorinJ@imsweb.com](mailto:Boyd-MorinJ@imsweb.com); Peters, Jake (NIH/NCI) [C] [jake.peters@nih.gov](mailto:jake.peters@nih.gov); Sansale, Rebecca (NIH/NCI) [C] [rebecca.sansale@nih.gov](mailto:rebecca.sansale@nih.gov); Johnson, Davin (NIH/NCI) [C] [davin.johnson@nih.gov](mailto:davin.johnson@nih.gov); Horner, Marie Josephe (NIH/NCI) [E] [mariejosephe.horner@nih.gov](mailto:mariejosephe.horner@nih.gov)
Subject: [EXTERNAL] DEV Backend Testing Issues

Hi Kelsey,

These are issues we found in backend data for Modules 1 and 2. The issues are highlighted in the attached spreadsheets.

Module 1, Test Scenario 5 – Connect ID 4746105364
These seem to be issues with flattening. The highlighted fields are in the raw tables but not the flattened tables.

Module 2, Test Scenario 3 – Connect ID 2547971085
We checked the responses for HOUSE1 and HOUSE2 and the backend data does not match what we recorded.

Module 2, Test Scenario 10 – Connect ID 4082746033
We checked the responses for HOUSE1 and HOUSE2 and the backend data does not match what we recorded. There are also several fields that seem to be missing from the flattened tables that are present in the raw tables.

Thanks, Ricky

JoeArmani commented 4 months ago

@boyd-mj Will you point me to the scenario that led to the HOUSE1/HOUSE2 issue? I haven't been able to reproduce the incorrect data scenario yet. So far, I'm seeing accurate values when I test this in dev. Thanks!

boyd-mj commented 4 months ago

@JoeArmani I looked again and the issue with HOUSE1 seems to be an issue with our template. It looks like Julie entered the concept IDs for the responses based on a 0-5 numbering system instead of 1-6 as they are in the DD/JSON:


| HOUSE1A | Num | 2 | 648960871 | 44 = Never |
| -- | -- | -- | -- | -- |
|   |   |   | 239152340 | 1 = Once a month or less |
|   |   |   | 582006876 | 2 = 2 to 3 days per month |
|   |   |   | 645894551 | 3 = 1 to 2 days per week |
|   |   |   | 996315715 | 4 = 3 to 4 days per week |
|   |   |   | 671267928 | 5 = 5 to 6 days per week |
|   |   |   | 647504893 | 6 = Every day |

The result is that the concept IDs are one off from the expected concept IDs in the document we are using. We'll send this to Julie so the template can be corrected for stage testing.
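To illustrate the one-off pattern, here is a small sketch that takes the HOUSE1A concept IDs in the order listed above and checks how far apart the expected and recorded values sit. The function name `position_offset` is made up for this example; the IDs and the two expected/recorded pairs come from the scenarios reported earlier in this thread.

```python
# HOUSE1A response concept IDs, in the order they appear in the listing above.
HOUSE1A_CONCEPT_IDS = [
    648960871,  # Never
    239152340,  # Once a month or less
    582006876,  # 2 to 3 days per month
    645894551,  # 1 to 2 days per week
    996315715,  # 3 to 4 days per week
    671267928,  # 5 to 6 days per week
    647504893,  # Every day
]


def position_offset(expected_id, recorded_id, ordered_ids=HOUSE1A_CONCEPT_IDS):
    """Return how many positions the recorded ID sits after the expected ID."""
    return ordered_ids.index(recorded_id) - ordered_ids.index(expected_id)


# (expected in the template, recorded on the backend)
scenario_3 = position_offset(671267928, 647504893)
scenario_10 = position_offset(582006876, 645894551)
print(scenario_3, scenario_10)
```

Both pairs differ by exactly one position, which is consistent with the template being shifted by one rather than with the backend storing a wrong answer.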

HOUSE2 is a little more confusing. In that case, the user is prompted with the question based on the HOUSE1 response, but the answer is not recorded on the backend. Ricky did a little more digging and we suspect it's a flattening issue. I will follow up with what he finds shortly. In the meantime, I'm attaching the most recent template we used where we found these issues. The issues are highlighted in red in col K and I've added some notes highlighted in yellow. dev_testing_MOD 2_Spanish_20240724_wcomments.xlsx

boyd-mj commented 4 months ago

@JoeArmani we did some additional digging. Here is the backend data for HOUSE2 for the scenario I attached above: image

The expected flattened ID (column name) for the question is D_128705365_D_607323377, which has no data. There is another column, D_128705365_entity_D_607323377, which has data but only for responses other than 264163865. It looks like the response 264163865 is being stored in a column named D_128705365_string, which is not expected. We see the same thing for the previous accounts where we also saw problems with the backend data.

Not sure if @KELSEYDOWLING7 would have any insight into why this is happening?

KELSEYDOWLING7 commented 4 months ago

@boyd-mj Ah, ok I know what this is. It's not a flattening issue but a bigger data (structure?) issue that Jake found a little while back. It's ongoing and I don't believe he's had the bandwidth to resolve it just yet. I'm unable to tag him but I'll send him this issue and comment on his behalf or try to add him here.

JoeArmani commented 4 months ago

@boyd-mj Thank you for the additional information.

@KELSEYDOWLING7 & @boyd-mj I found the string issue a bit earlier today while trying to reproduce the main issue in this thread. It appears to be a long-running bug where single grid responses are stored as strings (without the associated key) instead of being stored as a key-value pair. I am currently testing a fix for this item.
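As a rough illustration of the two shapes described here, the sketch below contrasts a grid response stored as a key-value pair with one stored as a bare string, and flags the latter. The exact stored document layout is an assumption, and `bare_string_grids` is a hypothetical helper, not code from the questionnaire repository; only the concept IDs come from this thread.

```python
GRID_ID = "D_128705365"    # HOUSE2 grid
ROW_ID = "D_607323377"     # HOUSE2A row within the grid
RESPONSE_ID = "264163865"  # response concept ID reported in this thread

# Assumed shapes: the intended key-value form and the buggy bare-string form.
expected_shape = {GRID_ID: {ROW_ID: RESPONSE_ID}}
buggy_shape = {GRID_ID: RESPONSE_ID}


def bare_string_grids(response, grid_ids):
    """Return the grid IDs whose stored value is a bare string instead of a dict."""
    return [g for g in grid_ids if isinstance(response.get(g), str)]


print(bare_string_grids(expected_shape, [GRID_ID]))
print(bare_string_grids(buggy_shape, [GRID_ID]))
```

A grid flagged this way has lost the row key, which would explain why the value cannot land in the expected D_128705365_D_607323377 column.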

JoeArmani commented 4 months ago

@KELSEYDOWLING7 Do you know offhand if there are any other scenarios (besides the grid questions I mentioned) where this is an issue?

KELSEYDOWLING7 commented 4 months ago

From: Jake Peters notifications@github.com
Sent: Thursday, July 25, 2024 4:36:11 PM (UTC-05:00) Eastern Time (US & Canada)
To: Analyticsphere/flatteningRequests flatteningRequests@noreply.github.com
Cc: KELSEYDOWLING7 kelsey_dowling@outlook.com; Author author@noreply.github.com
Subject: Re: [Analyticsphere/flatteningRequests] Array issue for BioClin_DBUrineID_v1r0 (Issue #57)

Hi Kelsey,

I reported these issues with the entity fields here initially: episphere/connect#938

And then we made a new issue specifically for the grid questions: episphere/connect#1049

There are a couple of simpler issues with biospecimen variables, but they have not been addressed yet as they were seen to be lower priority than the data structure issues with the grid questions.

I have a query available that screens for all abnormal data structure issues:

/*
Author: Jake Peters
Date: May 2024

GitHub Issue: https://github.com/episphere/connect/issues/938

Objective: Check BQ tables for columns with a data_type containing the substring "provided" and columns with names containing the substrings 'provided|string|integer|float|object'. This is used as a test for mixed-type data.

      The query is divided into two main sections:
      1. Mixed-Type Check for `Connect` Tables
      2. Mixed-Type Check for `FlatConnect` Tables

*/

check_for_mixed_type_data

-- [1] Mixed-Type Check for Connect Tables --------------------------------------------------
-- Check the Connect.INFORMATION_SCHEMA.COLUMNS table in DEV, STG, and PROD environments
-- for columns with a data_type containing the substring "provided".

SELECT
  'DEV' AS environment,
  table_schema,
  table_name,
  column_name,
  data_type
FROM nih-nci-dceg-connect-dev.Connect.INFORMATION_SCHEMA.COLUMNS
WHERE REGEXP_CONTAINS(data_type, 'provided')

UNION ALL

SELECT
  'STG' AS environment,
  table_schema,
  table_name,
  column_name,
  data_type
FROM nih-nci-dceg-connect-stg-5519.Connect.INFORMATION_SCHEMA.COLUMNS
WHERE REGEXP_CONTAINS(data_type, 'provided')

UNION ALL

SELECT
  'PROD' AS environment,
  table_schema,
  table_name,
  column_name,
  data_type
FROM nih-nci-dceg-connect-prod-6d04.Connect.INFORMATION_SCHEMA.COLUMNS
WHERE REGEXP_CONTAINS(data_type, 'provided');
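The name-based half of the check described in the query header (column names containing 'provided|string|integer|float|object') can be sketched outside BigQuery as well. This is only an illustration in Python, applied to the three HOUSE2A column names discussed earlier in this thread; `flag_mixed_type_columns` is a hypothetical helper and not part of Jake's query.

```python
import re

# Same substrings the query header lists for the column-name screen.
MIXED_TYPE_PATTERN = re.compile(r"provided|string|integer|float|object")


def flag_mixed_type_columns(column_names):
    """Return the column names that contain one of the mixed-type substrings."""
    return [name for name in column_names if MIXED_TYPE_PATTERN.search(name)]


columns = [
    "D_128705365_D_607323377",         # expected flattened column
    "D_128705365_entity_D_607323377",  # column that held the other responses
    "D_128705365_string",              # unexpected column holding 264163865
]

flagged = flag_mixed_type_columns(columns)
print(flagged)
```

Only the `_string` column is flagged here; the `_entity_` column would need a separate rule, since 'entity' is not among the substrings the query screens for.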

JoeArmani commented 4 months ago

@boyd-mj This update is now in quest-dev so testing can continue. The update should resolve:
(1) Data structure issues. The grid data structure issues should be resolved.
(2) Displayif grids. When a user clicks 'back', updates a grid selection, and clicks 'next', that should now be working as expected. Additional selections will appear (this was updated), and deselected options will not be visible (this was already working).

boyd-mj commented 4 months ago

Hi @JoeArmani just to make sure I'm understanding, should we test this in the renderer (https://episphere.github.io/quest-dev/) or can we test directly on the dev PWA? Thanks!

JoeArmani commented 4 months ago

Hi @boyd-mj I'm guessing the PWA is a more robust place to test, right? Both of those point to this code update, so either location is good. I prefer to do my testing in the PWA since it provides the true user experience. Thanks!

boyd-mj commented 4 months ago

Thanks, agree. I just wanted to make sure both had the updates pushed. We'll test and let you know how it goes.

m-j-horner commented 3 months ago

@boyd-mj just following up, what is the status of the testing on this issue?

boyd-mj commented 3 months ago

@JoeArmani we confirmed this is fixed via backend testing in dev. This issue can be closed, thanks!

JoeArmani commented 3 months ago

Perfect, thanks @boyd-mj. @m-j-horner, I'm just tagging you here to ensure you are notified that this issue is resolved. I'll close it now.