episphere / connect

Connect API for DCEG's Cohort Study

Stage data destruction stub record, remaining issues #658

Closed: robertsamm closed this issue 1 year ago

robertsamm commented 1 year ago

Reviewed the stub record in stage for Connect ID 1475895409 and found a few remaining issues. After the stub record was pushed, it did the same thing in the SMDB: I can no longer view the Participant Summary page for this participant. I'm not sure whether that is intentional, but I don't see it in the SOP. Looking at this record in the SMDB and at the CIDs in the array Jessica sent me, here are the remaining issues I found:

  1. Preferred email should not be retained and needs to be removed (I see it in the SMDB and in the list of CIDs in the array you sent).
  2. Preferred name should also not be retained and needs to be removed (I see it listed in the array).
  3. Participant demographic variables should not be retained (site reported age, site reported race/ethnicity, site reported sex). I see these in the SMDB, but I don't see their CIDs listed in the array you sent. These are all variables sent by the site.
  4. Participant verification table variables for site match and campaign type should not be retained (first name match, last name match, DOB match, PIN match, token match, zip code match, age match, cancer status match). These are also all variables sent by the site, and I still see them filled in on the SMDB. We should only need to retain verification status and time of verification.
  5. The Participant Summary page is unclickable after the stub record is created.
  6. I can't confirm that the biospecimen variables were not retained without the Participant Summary page. Can someone double-check this data on the backend?
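The backend double-check asked for in item 6 could be made repeatable by comparing a stub record against the agreed retention list and reporting anything that still holds a value. The sketch below assumes a participant document is a flat dict keyed by concept ID strings; `RETAINED_CIDS` is a hypothetical stand-in for the real list (only 827220437 and 130371375 come from this thread), and `find_unretained_fields` is an illustrative name, not an existing Connect function.

```python
# Hypothetical allow-list standing in for the agreed stub record CIDs.
# Only 827220437 (Health care provider) and 130371375 (Payment Round)
# are taken from this issue; "Connect_ID" is assumed to be kept as well.
RETAINED_CIDS = {"Connect_ID", "827220437", "130371375"}

EMPTY_VALUES = (None, "", [], {})


def find_unretained_fields(stub_record: dict) -> list[str]:
    """Return, sorted, every key that is not on the allow-list but still
    holds a non-empty value (i.e. data that should have been destroyed)."""
    leftovers = []
    for key, value in stub_record.items():
        if key in RETAINED_CIDS:
            continue
        if value not in EMPTY_VALUES:
            leftovers.append(key)
    return sorted(leftovers)
```

For example, a record with a leftover preferred email stored under a placeholder CID such as `"111111111"` would be reported, while a key already set to `None` would not.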
Davinkjohnson commented 1 year ago

Let's extend this issue to cover the SMDB components of what we agreed to resolve for Data Destruction.

Required functionality for the SMDB

jhflorey commented 1 year ago
- Search for participants (allow sites to search their own participants)
- Retain Query.##Name arrays for SMDB Search
- Do Not retain Query.emails/phone arrays

For these items

jhflorey commented 1 year ago

This requires addition of “Health care provider”

For this item, we have to keep 827220437 - Health care provider.

jhflorey commented 1 year ago

- Participant Summary Form

For this item, we have to keep 130371375 - Payment Round.

jhflorey commented 1 year ago
- All other variables/derivations will become “null” or “blank”, etc.
- Rows will still appear on the page but no longer populated
- If possible: set non-stub record rows to "Data deleted" when the new "destroyed?" CID (to be added) is "yes"

As I understand these items, we will not remove the non-stub-record fields. Instead, we will update them all to null or blank. The PWA/SMDB will still show those rows (non-stub-record fields) on the page, but they will no longer be populated. In Firestore, we will add a field such as "Data deleted" for participants whose data have been destroyed, with the value 353358909 - yes.
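The approach described in this comment (keep the stub record CIDs, blank everything else, keep the query name arrays for SMDB search, drop the email/phone arrays, and set a destroyed flag to 353358909 - yes) could look roughly like the sketch below. It assumes a flat participant dict with a nested `query` dict; `DATA_DESTROYED_CID` is a hypothetical placeholder because the thread says that CID is still to be added, `RETAINED_CIDS` is a hypothetical subset of the real list, and `build_stub_record` is an illustrative name.

```python
import copy

YES = 353358909  # "yes" concept ID, as given in this thread
# Hypothetical placeholder: the thread says the "Data Destroyed" CID is still to be added.
DATA_DESTROYED_CID = "000000000"
# Hypothetical allow-list standing in for the agreed stub record CIDs.
RETAINED_CIDS = {"Connect_ID", "827220437", "130371375"}
# Name arrays are kept for SMDB search; any other "query" key
# (e.g. email or phone arrays) is dropped.
RETAINED_QUERY_KEYS = {"firstName", "lastName"}


def build_stub_record(participant: dict) -> dict:
    """Return a new dict: retained CIDs copied, all other fields set to None,
    query reduced to name arrays, and the destroyed flag set to yes."""
    stub = {}
    for key, value in participant.items():
        if key == "query":
            continue  # handled separately below
        stub[key] = copy.deepcopy(value) if key in RETAINED_CIDS else None
    query = participant.get("query", {})
    stub["query"] = {
        k: copy.deepcopy(v) for k, v in query.items() if k in RETAINED_QUERY_KEYS
    }
    stub[DATA_DESTROYED_CID] = YES
    return stub
```

Blanking (rather than deleting) keys mirrors the reading in this comment; if the fields should instead be absent from Firestore, the `else None` branch would simply skip the key.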

jhflorey commented 1 year ago

- Print forms for participants (Consent, HIPAA, HIPAA revocation, and Data Destruction)

It seems to belong to the PWA page.

jhflorey commented 1 year ago
- Should only see data retained in stub record data
- This should use the “health care provider” for the version of consent/HIPAA

I don't clearly understand these items. Could you please explain them further? Thanks.

brotzmanmj commented 1 year ago

This requires addition of “Health care provider”

For this item, we have to keep 827220437 - Health care provider.

Yes, please add this to the Excel list of variables to be retained in the stub record

brotzmanmj commented 1 year ago

- Participant Summary Form

For this item, we have to keep 130371375 - Payment Round.

Yes, please add this to the stub record variable list too

brotzmanmj commented 1 year ago

- Print forms for participants (Consent, HIPAA, HIPAA revocation, and Data Destruction)

It seems to belong to the PWA page.

This functionality is on both the PWA and the SMDB

brotzmanmj commented 1 year ago

For the other items, it would probably be clearer if we looked at the Participant Summary page together on a brief call to walk through what is needed.

jhflorey commented 1 year ago

This is the list of stub record concept IDs.

Stub_Record_ConceptIDs.xlsx

jhflorey commented 1 year ago

- Print forms for participants (Consent, HIPAA, HIPAA revocation, and Data Destruction)

It seems to belong to the PWA page.

This functionality is on both the PWA and the SMDB

Hi @brotzmanmj, is this what you mean in the SMDB? (We will disable Participant Withdrawal for participants whose data have been destroyed.)

brotzmanmj commented 1 year ago

Hi Jessica, yes, that's correct.

jhflorey commented 1 year ago

@brotzmanmj I already added Health care provider and Payment Round, and uploaded the file to https://nih.app.box.com/file/1255095111396

brotzmanmj commented 1 year ago

Confirmed, thanks! I believe we also decided to retain the user profile name changes in the history so can you add those variables too?

Davinkjohnson commented 1 year ago

Yes, it looks like query.firstName and query.lastName should be included in the list mentioned above. This will enable proper searching in the SMDB.

jhflorey commented 1 year ago

@brotzmanmj @Davinkjohnson Already updated.

JoeArmani commented 1 year ago

Adding a note about changes to the query.firstName and query.lastName data structures just to make sure there's no conflict moving forward.

Possibly related issue: https://github.com/episphere/connect/issues/654

Currently: They are written as strings. PWA signup writes a string, PWA edit writes a string, and SMDB edit writes a string.

New update (PR coming Monday 7/10): These will sometimes be strings and sometimes arrays. PWA signup writes a string, PWA edit writes an array, and SMDB edit writes an array.

Future: These will be arrays. PWA will be adapted to write new signup fields query.firstName and query.lastName as arrays. Existing participant data will also be updated to arrays.
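Until every participant is migrated to arrays, SMDB search has to accept both shapes of query.firstName and query.lastName (string, array, or missing). A minimal normalizing sketch follows; the function names are hypothetical, and the case-insensitive comparison is an assumption, since the thread does not say how stored names are cased.

```python
from typing import Union


def normalize_name_field(value: Union[str, list, None]) -> list[str]:
    """Return query.firstName / query.lastName as a list, whether it was
    stored as a string (older writes), an array (newer writes), or is missing."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value] if value else []
    # Array form: keep only non-empty strings.
    return [v for v in value if isinstance(v, str) and v]


def matches_name(query: dict, field: str, search: str) -> bool:
    """Case-insensitive exact match of `search` against any stored name."""
    target = search.strip().lower()
    return any(name.lower() == target for name in normalize_name_field(query.get(field)))
```

Normalizing at read time keeps search working during the mixed string/array period without waiting for the backfill of existing participant data.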

jhflorey commented 1 year ago

@JoeArmani thanks for the information.

Davinkjohnson commented 1 year ago

Based on the conversation from last week, here are the details we talked through and agreed to. (Note for Jessica: please hold off on the work in this issue, the SMDB components, until all July release tasks have been completed.)

(For all fields/rows requested to show "N/A" or "Data Destroyed", Firestore should contain no data. The SMDB should handle the missing data by displaying the requested text when the new "Data Destroyed" CID = 353358909 - yes.)
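The display rule in this note (no data in Firestore, requested text shown by the SMDB when the destroyed flag is yes) can be sketched as a small lookup helper. The real SMDB is a web front end, so this Python version only illustrates the logic; `DATA_DESTROYED_CID` is a hypothetical placeholder for the new CID, and `display_value` is an illustrative name.

```python
YES = 353358909  # "yes" concept ID, as given in this thread
DATA_DESTROYED_CID = "000000000"  # hypothetical placeholder for the new CID


def display_value(participant: dict, cid: str, destroyed_text: str = "Data Destroyed") -> str:
    """Text to show for one field/row on the Participant Summary page.

    A missing (or None) value is shown as `destroyed_text` only when the
    participant's Data Destroyed flag is yes; otherwise it renders blank."""
    value = participant.get(cid)
    if value is None and participant.get(DATA_DESTROYED_CID) == YES:
        return destroyed_text
    return "" if value is None else str(value)
```

Passing `destroyed_text="N/A"` covers the rows that should read "N/A" instead of "Data Destroyed".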

Participant Summary Page

Header Items

Details Section

Participant Details Page

KELSEYDOWLING7 commented 1 year ago

It looks like the data is still in BQ after the destruction

Davinkjohnson commented 1 year ago

@KELSEYDOWLING7 Can you confirm the following?

KELSEYDOWLING7 commented 1 year ago

@Davinkjohnson Yes, the tables ran this morning and I also reran them before searching for this participant. Just reran them again to triple check.

Examples of data still in BQ for this participant:

Davinkjohnson commented 1 year ago

I also found the following requirement in the SOP. @jhflorey, this will also have to be added to the FaaS:

Post Data Destruction: The Connect API to refuse future receipt of data on this participant from all data sources (surveys, EHR, etc.).

(Presumably this would go in both submitParticipantsData and updateParticipantData: for any data update NOT from the SMDB, if the participant's data destroyed flag = yes, return an error code.)
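The guard proposed here could be a single check called at the top of both handlers. The actual FaaS is not shown in this thread, so the sketch below only illustrates the rule in Python: the `source` labels, the `DataDestroyedError` class, the 403 status, and the `DATA_DESTROYED_CID` placeholder are all hypothetical.

```python
YES = 353358909  # "yes" concept ID, as given in this thread
DATA_DESTROYED_CID = "000000000"  # hypothetical placeholder for the new CID


class DataDestroyedError(Exception):
    """Raised when non-SMDB data arrives for a participant whose data were destroyed."""

    status_code = 403  # hypothetical choice; the thread only asks for "an error code"


def check_update_allowed(participant: dict, source: str) -> None:
    """Allow SMDB updates; refuse any other source once data are destroyed."""
    if source == "SMDB":
        return
    if participant.get(DATA_DESTROYED_CID) == YES:
        raise DataDestroyedError(
            f"Connect_ID {participant.get('Connect_ID')}: data destroyed, "
            f"update from {source} refused"
        )
```

Raising an exception keeps both handlers short: each catches `DataDestroyedError` and returns its `status_code`, so the refusal logic lives in one place.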

jhflorey commented 1 year ago

@Davinkjohnson @KELSEYDOWLING7 already fixed in PR https://github.com/episphere/connectFaas/pull/395. We can try again in dev.

brotzmanmj commented 1 year ago

Thanks @jhflorey . Kaitlyn is going to submit data destruction for a couple of records this afternoon. We'll let you know which Connect IDs and what data they have beforehand.

kmazzilli commented 1 year ago

The two Connect IDs we have chosen to submit data destruction for are: 3994600604 (modules 1, 3, and 4 submitted and module 2 started but not submitted; menstrual survey, COVID survey, SSN, research blood, urine, and mouthwash submitted) and 3231286166 (modules 1-4, COVID survey, SSN, clinical blood, and urine submitted).

KELSEYDOWLING7 commented 1 year ago

Good morning. After the data destruction at 1am, all the raw table queries ran at 4:30am and again at 10:30am, and all flattened table queries ran at 9:30am and again manually at 11am. I am still seeing data for all of these modules/surveys/specimens/shipments for both participants.

The only one I am unsure of is the menstrual cycle survey for 3994600604 because there are 2 records, but they both have today's date listed under the date variable. One has data and one does not.

jhflorey commented 1 year ago

@KELSEYDOWLING7 Do we have a log from the run after 1am?

KELSEYDOWLING7 commented 1 year ago

@jhflorey Sorry, I'm not sure what you mean by a log

jhflorey commented 1 year ago

It looks like the image below, produced after running the table queries.

jhflorey commented 1 year ago

Hi @KELSEYDOWLING7, could you show me how you ran all the raw table queries?

KELSEYDOWLING7 commented 1 year ago

Is that log on Firestore or BQ? It's not something I'm familiar with.

I just run the queries in BQ from the scheduled queries tab (click the magnifying glass on the left-hand side of BQ, then select scheduled queries). The raw tables start with participants and the flattened tables start with FlatConnect.

brotzmanmj commented 1 year ago

@jhflorey have you confirmed that all sources of data including the survey module data were entirely deleted from Firestore? Kelsey is looking at BQ, not at Firestore.

jhflorey commented 1 year ago

@brotzmanmj @KELSEYDOWLING7 I just discussed this with Davin. My mistake; I will update the code now.

Davinkjohnson commented 1 year ago

@brotzmanmj Jessica has to update the code to remove the data from all the other collections/tables. (there was a misunderstanding on the requirement here.) However, when we were reviewing tables to remove data from she asked about the boxes table. Did we decide whether that data should be destroyed? (there's no data in it that would tie back to a specific participant once the other data are destroyed. but it is technically their data.)

jhflorey commented 1 year ago

@brotzmanmj after reviewing, we listed these collections: bioSurvey_v1, clinicalBioSurvey_v1, covid19Survey_v1, menstrualSurvey_v1, module1_v1, module1_v2, module2_v1, module2_v2, module3_v1, module4_v1, ssn, biospecimen. We will delete a collection's document if its Connect_ID equals the Connect_ID of the participant whose data have been destroyed. Please correct me if I'm wrong or missing any collection.

brotzmanmj commented 1 year ago

@Davinkjohnson I was wondering about the Boxes data that included their samples as that gets a little complicated. If we don't delete the data from the Boxes table, will that mess anything up as the tubes are associated with the box they were shipped in? If we do delete their data from the Boxes, what happens if those were the only tubes in the Box vs a Box with other people's tubes in it? That's why we tested one of each type, to see what would happen. We might need guidance on this from Nicole.

brotzmanmj commented 1 year ago

@jhflorey I think that encompasses all the tables of data that exist and the different survey versions but @KELSEYDOWLING7 can you also confirm?

KELSEYDOWLING7 commented 1 year ago

@brotzmanmj Yes, those are all the tables we have for now for surveys and specimens

jhflorey commented 1 year ago

@brotzmanmj @Davinkjohnson @KELSEYDOWLING7 I'll update the code to delete from bioSurvey_v1, clinicalBioSurvey_v1, covid19Survey_v1, menstrualSurvey_v1, module1_v1, module1_v2, module2_v1, module2_v2, module3_v1, module4_v1, ssn, and biospecimen now. I will work on the boxes table once we have a final decision.

Davinkjohnson commented 1 year ago

@jhflorey The decision has been made to KEEP the boxes table data since it will not directly link back to the participant and the data would be messy to attempt to delete. Please proceed with deleting all the other above called out data for each data destroyed participant.
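The agreed cleanup (delete every document whose Connect_ID matches the destroyed participant from the listed collections, but leave boxes untouched) is sketched below against an in-memory stand-in for Firestore: a dict of collection name to {document ID: document}. The field name `Connect_ID` and its type matching the stored value are assumptions, and `purge_participant` is a hypothetical name; the real change lives in connectFaas and would use the Firestore client instead of dicts.

```python
# Collections listed in this issue; "boxes" is deliberately excluded
# because the decision was to keep its data.
COLLECTIONS_TO_PURGE = [
    "bioSurvey_v1", "clinicalBioSurvey_v1", "covid19Survey_v1", "menstrualSurvey_v1",
    "module1_v1", "module1_v2", "module2_v1", "module2_v2", "module3_v1", "module4_v1",
    "ssn", "biospecimen",
]


def purge_participant(db: dict, connect_id) -> dict[str, int]:
    """Delete the participant's documents from each listed collection in place
    and return how many documents were removed per collection."""
    deleted = {}
    for name in COLLECTIONS_TO_PURGE:
        docs = db.get(name, {})
        # Collect IDs first so the dict is not mutated while iterating over it.
        doomed = [doc_id for doc_id, doc in docs.items() if doc.get("Connect_ID") == connect_id]
        for doc_id in doomed:
            del docs[doc_id]
        deleted[name] = len(doomed)
    return deleted
```

Returning per-collection counts gives a simple check to compare against the BQ tables after the next refresh.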

jhflorey commented 1 year ago

@KELSEYDOWLING7 I just tested my code changes locally. Could you help check the participant with Connect_ID = 3994600604?

KELSEYDOWLING7 commented 1 year ago

@jhflorey Yes, thank you. It looks like the raw tables haven't refreshed yet, and I don't believe that's something our team can manually push. So I will check first thing Monday morning

KELSEYDOWLING7 commented 1 year ago

@jhflorey @Davinkjohnson The data is deleted! None in mods 1-4, BUM, COVID, or Menstrual Surveys. The SSN survey flag is null, as are the flags for a partial or full social.

jhflorey commented 1 year ago

@KELSEYDOWLING7 Does that mean it meets our expectations?

KELSEYDOWLING7 commented 1 year ago

@jhflorey Yes, the data has been destroyed

KELSEYDOWLING7 commented 1 year ago

@jhflorey Will you be testing the code change with the second participant as well (Connect_ID=3231286166) ?

jhflorey commented 1 year ago

@KELSEYDOWLING7 Not yet, I only tried it on 3994600604. Would you like me to do the same for 3231286166?