robertsamm closed this issue 1 year ago
@jhflorey If you can, that would be great! I'll double check the second participant whenever you're done. The raw tables in BQ won't refresh until 4:30pm, so it doesn't need to be done right away.
@KELSEYDOWLING7 It's done for 3231286166. BTW, my PR is already merged into dev, so I guess you can check it out in dev.
@jhflorey Great, thank you.
It looks like for the first participant, Connect_ID 3994600604, there are some stub records deleted:
- consent middlename, consent suffix
- Date of withdrawal
- Who requested withdrawal
- Middle Name extracted from HIPAA revocation form
- Middle Name extracted from Data Destruction form
Checking with @brotzmanmj , the date of withdrawal and who requested the withdrawal deletion seems to make sense, but @kmazzilli confirmed that a middle name was inputted, and so it should not have been deleted. Do you mind looking into this?
@KELSEYDOWLING7 I think we should create a new participant and then test in dev again.
@kmazzilli Was there another participant from the list I sent that we could use for additional testing?
@KELSEYDOWLING7 Connect ID:2361618927 has samples collected but needs additional surveys filled out. I can fill out the surveys, add to the name fields, go through with the data destruction, and let you know when I am done, if that works.
@kmazzilli That works for me!
@KELSEYDOWLING7 have you tested in dev?
@jhflorey Kaitlyn is testing in DEV now, there's some delay with the biospecimen shipment being resolved today. The deletion should happen tonight and I can check the data tomorrow
I was still encountering issues with shipping today - it sounds like a resolution is coming soon and I'll try again tomorrow
@brotzmanmj @Davinkjohnson @KELSEYDOWLING7 I'll update the code now to delete bioSurvey_v1, clinicalBioSurvey_v1, covid19Survey_v1, menstrualSurvey_v1, module1_v1, module1_v2, module2_v1, module2_v2, module3_v1, module4_v1, ssn, and biospecimen. As for the boxes table, I will work on it once we have a final decision.
@jhflorey we previously did not discuss the notifications table. We need to make sure the records related to a data destroyed participant are also removed from notifications. (Use token on this table for the key to participants.)
@kmazzilli Were we able to resolve the shipping issue? And are we planning to do a data destruction push tonight?
@Davinkjohnson @brotzmanmj Would you be able to add Jing to this issue for the backend testing next week?
Also please note the biospecimen data has two documents. I'm not sure if our data destruction code includes deleting all documents if there are multiple
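To make the requirement above concrete, here is a minimal sketch of a deletion pass that removes every matching document (not just the first) from each collection, and removes notifications by token. It uses a plain in-memory dictionary as a hypothetical stand-in for the real store; the function name, the `Store` type, and the assumption that survey and biospecimen documents carry a `Connect_ID` field are illustrative only and are not taken from the actual data destruction code.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the document store:
# collection name -> list of documents (dicts).
Store = Dict[str, List[dict]]

# Collections named in this thread as targets for deletion.
PARTICIPANT_COLLECTIONS = [
    "bioSurvey_v1", "clinicalBioSurvey_v1", "covid19Survey_v1",
    "menstrualSurvey_v1", "module1_v1", "module1_v2", "module2_v1",
    "module2_v2", "module3_v1", "module4_v1", "ssn", "biospecimen",
]


def destroy_participant_data(store: Store, connect_id: int, token: str) -> Dict[str, int]:
    """Remove ALL matching documents and report how many were removed per collection."""
    removed: Dict[str, int] = {}
    for name in PARTICIPANT_COLLECTIONS:
        docs = store.get(name, [])
        # Assumption: these documents are keyed to the participant by Connect_ID.
        kept = [d for d in docs if d.get("Connect_ID") != connect_id]
        removed[name] = len(docs) - len(kept)
        store[name] = kept
    # Notifications are matched on token, as requested in the thread.
    docs = store.get("notifications", [])
    kept = [d for d in docs if d.get("token") != token]
    removed["notifications"] = len(docs) - len(kept)
    store["notifications"] = kept
    # The boxes collection is deliberately left untouched.
    return removed
```

Filtering the whole list, rather than deleting a single looked-up document, is what covers the case of a participant with two biospecimen documents.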
@KELSEYDOWLING7 no, the shipping fix is still under review from the team and will not be ready by tonight according to Davin
@Davinkjohnson Just merged my code changes into dev.
Thanks, @jhflorey I will help Kelsey check the data and let you know whether the data is deleted in these BQ tables in dev soon.
Hi all - I was able to ship the samples in Box 76 and go ahead with the data destruction request for connect_id: 2361618927
@Davinkjohnson Thank you very much for adding me to this chat, Davin. I will be the backup for Kelsey's work next week. Please let me know if anything is needed on my side. I have just checked the data destruction progress in dev for Connect_ID 2361618927, and the data have not been deleted yet. I will check all related data in dev again tomorrow and let you know whether the data have been deleted. Thanks a lot.
Thanks Jing. @jhflorey please check the data tomorrow in Firestore. Jing will check it in BQ. @jeannewu when you check it, please check that all the data sources are gone (including both biospecimen records, all surveys, SSN, etc.), and also make sure all of the stub variables remain. Thanks.
@brotzmanmj Got it. I will. Thanks
It looks like my code changes are not working in dev. The data are still not deleted. Let me check it.
@jhflorey I have just checked the BQ data in dev: bioSurvey_v1_JP, module1_v1_JP, module2_v1_JP, and menstrualSurvey_v1_JP don't have the data for Connect_ID=236168927. But in the other tables (shown below), this participant's data are still there.

Connect_ID = '2361618927' | n columns | rows |
---|---|---|
bioSurvey_v1_JP | 380 | 0 |
biospecimen_J | 318 | 2 |
clinicalBioSurvey_v1_JP | 260 | 1 |
covid19Survey_v1_JP | 193 | 1 |
module1_v1_JP | 1506 | 0 |
module1_v2_JP | 1902 | 1 |
module2_v1_JP | 705 | 0 |
module2_v2_JP | 739 | 1 |
module3_v1_JP | 364 | 1 |
module4_v1_JP | 1302 | 1 |
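A check like the one in the table could be scripted along these lines. This is only a sketch: it builds the per-table count queries as strings and picks out the tables that still hold rows, without calling BigQuery. The `FlatConnect` dataset placement of these tables, the string type of `Connect_ID`, and the helper names are assumptions; the row counts are the ones reported in the table above.

```python
from typing import Dict, List

PROJECT = "nih-nci-dceg-connect-dev"
DATASET = "FlatConnect"  # assumption: the flattened *_JP tables live here


def count_query(table: str, connect_id: str) -> str:
    """Build a row-count query for one table (the query itself is not executed here)."""
    if not connect_id.isdigit():
        raise ValueError("Connect_ID must contain digits only")
    return (
        f"SELECT COUNT(*) AS n_rows FROM `{PROJECT}.{DATASET}.{table}` "
        f"WHERE Connect_ID = '{connect_id}'"
    )


def leftover_tables(row_counts: Dict[str, int]) -> List[str]:
    """Return the tables that still contain rows for the participant."""
    return sorted(t for t, n in row_counts.items() if n > 0)


# Row counts as reported in the table above.
REPORTED_ROWS = {
    "bioSurvey_v1_JP": 0, "biospecimen_J": 2, "clinicalBioSurvey_v1_JP": 1,
    "covid19Survey_v1_JP": 1, "module1_v1_JP": 0, "module1_v2_JP": 1,
    "module2_v1_JP": 0, "module2_v2_JP": 1, "module3_v1_JP": 1,
    "module4_v1_JP": 1,
}

if __name__ == "__main__":
    for table in leftover_tables(REPORTED_ROWS):
        print(count_query(table, "2361618927"))
```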
@jeannewu Yup, the stats are correct. I have a PR to fix it. I will let you know once it's merged into dev.
@brotzmanmj @jeannewu My code changes fixing the issue above are already merged into dev. Do you want me to manually trigger the job now, or will you wait for the job to run at 1am and do the test tomorrow?
@jhflorey Thank you very much for asking. I am quite flexible on my end. The BQ1 (flattened) data in GCP are scheduled to refresh once a day (every morning around 10 am). If you would like me to check today, the original data in .connect. in dev would be the better option. Otherwise, I will wait and recheck FlatConnect in dev tomorrow. How about you?
@jeannewu i will manually run the data destruction job for testing my code changes in dev.
@jeannewu The Firestore data was clean after running the job manually.
Thanks to you both. @jeannewu can you check the list of stub variables and make sure they are all still there? And @kmazzilli can you check MyConnect and also the SMDB and make sure only the stub variables remain, the forms are all still accessible with the correct signatures, and the notifications are gone from both the MyConnect and the SMDB?
And Kaitlyn, we should also check the biospecimen dashboard and see if the data are gone from there (and the Box data is not gone). We can look at that together if you like, i'm not sure exactly how we'll check but there are a couple of potential ways.
@brotzmanmj @jhflorey I will check them right now and let you know if data is deleted or not.
@brotzmanmj @jeannewu I noticed that Middle Name extracted from HIPAA revocation form and Middle Name extracted from Data Destruction form still exist. consent middlename and consent suffix do not exist. I do not know whether values were entered for them or not.
@jhflorey I just checked that all the data of this participant are deleted from dev.Connect. But I think, as Kelsey told me before, some of the participant data for this Connect_ID should be kept in the .Connect.participants table: "SELECT d_471168198,d_736251808,d_436680969,d_480305327,d_564964481,d_795827569,d_544150384,d_371067537,d_454205108,d_454445267,d_919254129, d_412000022,d_558435199,d_262613359,d_821247024, d_914594314,
d_747006172, d_659990606,
d_299274441.d_299274441,d_919699172,d_141450621,d_576083042,d_431428747,d_121430614,d_523768810,d_639172801,d_175732191,d_150818546,d_624030581,d_285488731,d_596510649,d_866089092,d_990579614,d_131458944,d_372303208,d_777719027,d_620696506,
d_352891568,d_958588520,d_875010152,d_404289911,d_637147033,d_734828170,d_715390138,d_538619788,d_153713899,
d_613641698,d_407743866,d_831041022,d_269050420,d_359404406,d_119449326,d_304438543,d_912301837,d_130371375.d_266600170.d_731498909, d_130371375.d_303552867.d_731498909, d_130371375.d_496823485.d_731498909, d_130371375.d_650465111.d_731498909,
d_130371375.d_266600170.d_787567527,
d_130371375.d_266600170.d_222373868, d_130371375.d_303552867.d_222373868, d_130371375.d_496823485.d_222373868, d_130371375.d_650465111.d_222373868,
d_130371375.d_266600170.d_648936790, d_130371375.d_303552867.d_648936790, d_130371375.d_496823485.d_648936790, d_130371375.d_650465111.d_648936790,
d_130371375.d_266600170.d_297462035, d_130371375.d_303552867.d_297462035, d_130371375.d_496823485.d_297462035, d_130371375.d_650465111.d_297462035,
d_130371375.d_266600170.d_648228701, d_130371375.d_303552867.d_648228701, d_130371375.d_496823485.d_648228701, d_130371375.d_650465111.d_648228701,
d_130371375.d_266600170.d_438636757, d_130371375.d_303552867.d_438636757, d_130371375.d_496823485.d_438636757, d_130371375.d_650465111.d_438636757,d_765336427,d_479278368,d_826240317,d_693626233,d_104278817,d_744604255,
d_268665918,d_592227431,d_399159511,d_231676651,d_996038075,d_506826178,d_524352591.d_524352591, d_524352591.d_902332801,
d_524352591.d_902332801,d_299274441.d_457532784,d_773707518,d_577794331,d_883668444
FROM nih-nci-dceg-connect-dev.Connect.participants
WHERE Connect_ID= 236168927", right? If so, is this part of the data for this Connect_ID no longer available?
@brotzmanmj @kmazzilli Could you show me how to check the boxes table for the data destruction of this Connect_ID's biospecimen data? The boxes table is not linked by Connect_ID, but by the box number, which contains the biospecimen ID (if my description is correct)?
@jeannewu We don't remove any data in the boxes table, as confirmed in https://github.com/episphere/connect/issues/658#issuecomment-1684289146
@jeannewu I'm not sure about your process. This is my stub records list.
[ "query", "pin", "token", "state", "Connect_ID", "471168198", "736251808", "436680969", "480305327", "564964481", "795827569", "544150384", "371067537", "454205108", "454445267", "919254129", "412000022", "558435199", "262613359", "821247024", "914594314", "747006172", "659990606", "299274441", "919699172", "141450621", "576083042", "431428747", "121430614", "523768810", "639172801", "175732191", "150818546", "624030581", "285488731", "596510649", "866089092", "990579614", "131458944", "372303208", "777719027", "620696506", "352891568", "958588520", "875010152", "404289911", "637147033", "734828170", "715390138", "538619788", "153713899", "613641698", "407743866", "831041022", "269050420", "359404406", "119449326", "304438543", "912301837", "130371375", "765336427", "479278368", "826240317", "693626233", "104278817", "744604255", "268665918", "592227431", "399159511", "231676651", "996038075", "506826178", "524352591", "902332801", "457532784", "773707518", "577794331", "883668444", "827220437", "699625233", ]
Or you can refer to https://nih.app.box.com/file/1255095111396
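As an illustration of how a stub list like this can be applied and verified, the sketch below reduces a participant document to its stub fields and then reports any stub field that had a value before destruction but is missing afterwards (the situation described earlier for the first participant's middle name). The function names and the sample document are hypothetical; only the four stub keys in the example are taken from the list above.

```python
from typing import Any, Dict, Iterable, List


def to_stub_record(participant: Dict[str, Any], stub_fields: Iterable[str]) -> Dict[str, Any]:
    """Keep only the top-level fields that are on the stub list."""
    keep = set(stub_fields)
    return {key: value for key, value in participant.items() if key in keep}


def missing_stub_fields(before: Dict[str, Any], after: Dict[str, Any],
                        stub_fields: Iterable[str]) -> List[str]:
    """Stub fields that were present before destruction but are absent afterwards."""
    return sorted(k for k in stub_fields if k in before and k not in after)


if __name__ == "__main__":
    # A short excerpt of the stub list above, for illustration only.
    stub_excerpt = ["token", "Connect_ID", "471168198", "736251808"]
    # Hypothetical participant document; the values are made up.
    before = {
        "token": "abc",
        "Connect_ID": 2361618927,
        "471168198": "example",
        "hypothetical_survey_field": "to be destroyed",
    }
    after = to_stub_record(before, stub_excerpt)
    print(after)
    print(missing_stub_fields(before, after, stub_excerpt))
```

Comparing against the document as it was before destruction distinguishes a field that was wrongly deleted from one that was never filled in.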
@jhflorey See what I checked: "SELECT * FROM nih-nci-dceg-connect-dev.Connect.participants
WHERE Connect_ID= 236168927" shows "There is no data to display."
@jeannewu 'i noticed the Middle Name extracted from HIPAA revocation form and Middle Name extracted from Data Destruction form still exist.' These are supposed to still exist. They should be on the stub variables list.
Hi all - I checked SMDB and there was one issue for connect_id: 2361618927. I could not download the original HIPAA and consent agreement forms. Instead I received an error message that said "An error has occured generating the pdf please contact support". Otherwise, everything else looked good - I was able to download the data destruction and HIPAA revocation forms and the signatures were correct on SMDB, I was able to access all 4 forms in MyConnect, the notifications are gone from both the MyConnect and the SMDB, and the correct variables were the only ones remaining in SMDB.
@brotzmanmj I think @jhflorey manually updated the Connect.participants table in dev, not the FlatConnect.participants_JP table. All the data of this participant in Connect are removed, including the ones in the participants table. But since the flattened tables in "FlatConnect" are not updated yet, all the data of this participant there are still the original ones from before @jhflorey manually applied her code to Firestore.
hi @jeannewu, since it's been more than an hour, are you able to confirm that the data in BQ has been updated?
@jhflorey I've just checked that the data for Connect_ID = 2361618927 are not in the nih-nci-dceg-connect-dev.Connect.### datasets, but they are all in nih-nci-dceg-connect-dev.FlatConnect.###.
@kmazzilli I've just checked that the data for Connect_ID = 2361618927 are not in the nih-nci-dceg-connect-dev.Connect.### datasets, but they are all in nih-nci-dceg-connect-dev.FlatConnect.###.
Hi Jing, can you explain what that means?
@brotzmanmj Everything in nih-nci-dceg-connect-dev.FlatConnect.### was updated this morning at 9:30-10am, as scheduled daily. But the data in nih-nci-dceg-connect-dev.Connect.### are synchronized with the data in Firestore, which might be updated hourly. So after @jhflorey manually ran her data destruction code in Firestore this afternoon, all the data in "nih-nci-dceg-connect-dev.Connect.###" were updated to reflect the changes her code made in Firestore.
@brotzmanmj All the data for participant Connect_ID = 2361618927 are deleted from the .connect. tables, including the ones in the participants table that should not be deleted, as Kelsey told me before. Is this what you want for the data destruction on this participant?
Thanks @jeannewu So you're saying that the data in BQ that get updated hourly have been deleted, but there are lingering data somewhere in BQ that get extracted/flattened once a day at ~9:30am and those we should expect will be deleted tomorrow morning?
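The timing question can be reasoned about with a small helper. The sketch below computes when a change made in Firestore would first be picked up by a once-a-day flattening job. The 10:00 default is taken from the "around 10 am" figure mentioned in this thread, the dates in the example are made up, and the function name is hypothetical; the real schedule may differ.

```python
from datetime import datetime, time, timedelta


def next_daily_refresh(changed_at: datetime, refresh_at: time = time(10, 0)) -> datetime:
    """Return the first daily refresh that happens strictly after the change."""
    candidate = datetime.combine(changed_at.date(), refresh_at)
    if candidate <= changed_at:
        # Today's refresh has already run, so the change shows up tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    # Hypothetical example: a deletion run in the afternoon.
    afternoon_run = datetime(2023, 9, 5, 15, 30)
    print(next_daily_refresh(afternoon_run))
```

Under these assumptions, a deletion run in the afternoon would not appear in the flattened tables until the following morning, which matches the behavior described above.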
@brotzmanmj Yes. But what about the information on HIPAA, refusal, withdrawal, etc. for this Connect_ID? Should these also be deleted from the participants table? "SELECT d_471168198,d_736251808,d_436680969,d_480305327,d_564964481,d_795827569,d_544150384,d_371067537,d_454205108,d_454445267,d_919254129, d_412000022,d_558435199,d_262613359,d_821247024, d_914594314, d_747006172, d_659990606, d_299274441.d_299274441,d_919699172,d_141450621,d_576083042,d_431428747,d_121430614,d_523768810,d_639172801,d_175732191,d_150818546,d_624030581,d_285488731,d_596510649,d_866089092,d_990579614,d_131458944,d_372303208,d_777719027,d_620696506, d_352891568,d_958588520,d_875010152,d_404289911,d_637147033,d_734828170,d_715390138,d_538619788,d_153713899, d_613641698,d_407743866,d_831041022,d_269050420,d_359404406,d_119449326,d_304438543,d_912301837,d_130371375.d_266600170.d_731498909, d_130371375.d_303552867.d_731498909, d_130371375.d_496823485.d_731498909, d_130371375.d_650465111.d_731498909, d_130371375.d_266600170.d_787567527, d_130371375.d_266600170.d_222373868, d_130371375.d_303552867.d_222373868, d_130371375.d_496823485.d_222373868, d_130371375.d_650465111.d_222373868, d_130371375.d_266600170.d_648936790, d_130371375.d_303552867.d_648936790, d_130371375.d_496823485.d_648936790, d_130371375.d_650465111.d_648936790, d_130371375.d_266600170.d_297462035, d_130371375.d_303552867.d_297462035, d_130371375.d_496823485.d_297462035, d_130371375.d_650465111.d_297462035, d_130371375.d_266600170.d_648228701, d_130371375.d_303552867.d_648228701, d_130371375.d_496823485.d_648228701, d_130371375.d_650465111.d_648228701, d_130371375.d_266600170.d_438636757, d_130371375.d_303552867.d_438636757, d_130371375.d_496823485.d_438636757, d_130371375.d_650465111.d_438636757,d_765336427,d_479278368,d_826240317,d_693626233,d_104278817,d_744604255, d_268665918,d_592227431,d_399159511,d_231676651,d_996038075,d_506826178,d_524352591.d_524352591, d_524352591.d_902332801,
d_524352591.d_902332801,d_299274441.d_457532784,d_773707518,d_577794331,d_883668444 FROM nih-nci-dceg-connect-dev.Connect.participants WHERE Connect_ID= 236168927", right?
@jhflorey can you comment on that? are these the stub record variables?
hi everyone - Michelle and I checked out the biospecimen dashboard for connect_id: 2361618927 and everything looked correct - we could see the Box information and under the participation search only the stub variables and a red x under the status
@jhflorey @kmazzilli @brotzmanmj @Davinkjohnson Thank you very much @Davinkjohnson. I had a typo in the Connect_ID which caused such a big confusion. All the data on this Connect_ID have been correctly removed from the participant table. I will double check them tomorrow in the BQ tables again.
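A simple guard could catch this kind of typo before a query is run. The sketch below assumes that a Connect_ID is always exactly 10 digits, which is inferred only from the IDs quoted in this thread (3231286166, 3994600604, 2361618927, 1475895409) and is not a documented rule; the function name is hypothetical.

```python
import re

# Assumption: Connect_IDs are exactly 10 digits, based on the IDs in this thread.
CONNECT_ID_PATTERN = re.compile(r"\d{10}")


def is_plausible_connect_id(value: str) -> bool:
    """Return True if the value looks like a 10-digit Connect_ID."""
    return CONNECT_ID_PATTERN.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    for candidate in ("2361618927", "236168927"):
        print(candidate, is_plausible_connect_id(candidate))
```

The mistyped ID used in the queries above, 236168927, has only 9 digits, so a length check of this kind would have flagged it.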
Reviewed the stub record in stage for connectID 1475895409 and found a few remaining issues. After the stub record was pushed, it did the same thing in the SMDB, where I can't view the Participant Summary page for this participant anymore. I'm not sure if that is intentional or not, but I don't see it in the SOP. Looking at this record in the SMDB and the CIDs in the array Jessica sent me, here are the few remaining issues I found: