kcho opened this issue 3 years ago
Hi @kcho and @sbouix , let's continue the discussion here.
By Kevin (edited by Tashrif):
I found a “Data Entry Trigger” function in REDCap. Whenever a record is modified or updated, it sends a POST signal with a bunch of information to a dedicated server. If the major problem in pulling all the data on a daily basis is REDCap server overloading, do you think implementing the “Data Entry Trigger” and connecting it to lochness would be a solution (or overkill)?
Suggested workflow:
This would solve the REDCap server problem and we would be able to keep all of the up-to-date REDCap data in lochness.
Okay, here is my modified workflow:
- Data Entry Trigger emits a signal
- a watchdog hosted on https://predict.bwh.harvard.edu/ (TBD) catches the signal

The last three steps could be done by a cron-like bot.
To add to the agenda: the ability to detect tags for particular variables.
Thanks for this @tashrifbillah. Could you set up a URL under https://predict.bwh.harvard.edu/, so it can catch the POST signal from the REDCap Data Entry Trigger, please? Or if we have any other publicly open ports among the PNL servers, please let me know. I'll test getting the signal.
The only 2 externally facing servers I know of are hcpep-xnat and our web server. Predict is behind the firewall.
Hi Kevin, do you know of a tutorial that I can go through to learn to upload a file to REDCap? I need to be able to upload, trigger, and listen independently to be able to set up such a thing. Also, where did you get the screenshot? If writing is hard, MS Teams call works for me.
Is this the function I need?
> Hi Kevin, do you know of a tutorial that I can go through to learn to upload a file to REDCap? I need to be able to upload, trigger, and listen independently to be able to set up such a thing.
I have not uploaded a file before, but I would suggest looking at the API Playground and trying the Import File API method. The API doc is here: https://redcap.partners.org/redcap/api/help
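Not from the thread, but a minimal sketch of what an Import File call might look like against the REDCap API, assuming a project API token, an example file-upload field name (`consent_form`), and a record ID; the API doc above is the authoritative reference for the exact parameters on a given REDCap version:

```python
import requests

REDCAP_API_URL = "https://redcap.partners.org/redcap/api/"
API_TOKEN = "XXXX"            # project-specific token (placeholder)

def import_file(record_id, field_name, filepath):
    """Upload a file into a REDCap file-upload field for one record."""
    data = {
        "token": API_TOKEN,
        "content": "file",     # file import/export endpoint
        "action": "import",
        "record": record_id,
        "field": field_name,   # must be a file-upload type field
        "returnFormat": "json",
    }
    with open(filepath, "rb") as f:
        resp = requests.post(REDCAP_API_URL, data=data, files={"file": f})
    resp.raise_for_status()
    return resp

# hypothetical usage:
# import_file("100111111", "consent_form", "consent.pdf")
```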
> Also, where did you get the screenshot?
The screenshot is from REDCap: "Project Setup" -> "Enable optional modules and customizations".
Quickly tested to see if REDCap sends the signal to an open server. Data like the following are sent to the server:

redcap_url=https%3A%2F%2Fredcap.partners.org%2Fredcap%2F&project_url=https%3A%2F%2Fredcap.partners.org%2Fredcap%2Fredcap_v10.0.30%2Findex.php%3Fpid%3D26709&project_id=26709&username=kc244&record=100111111&instrument=adverse_events_ae&adverse_events_ae_complete=0

I think it can act as a very useful logging system. I'll bring this up in our next meeting, so we can discuss how we can include this.
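As an aside (not part of the original comment), the payload above is plain URL-encoded form data, so decoding it in Python is a one-liner; a quick sketch with a shortened version of the payload:

```python
from urllib.parse import parse_qs

payload = ("redcap_url=https%3A%2F%2Fredcap.partners.org%2Fredcap%2F"
           "&project_id=26709&username=kc244&record=100111111"
           "&instrument=adverse_events_ae&adverse_events_ae_complete=0")

# parse_qs returns {key: [values]}; take the first value of each key
fields = {k: v[0] for k, v in parse_qs(payload).items()}
print(fields["record"], fields["instrument"])   # 100111111 adverse_events_ae
```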
> 2. extensive work is required on the logbook to select and extract the data from the json dump to visualize in DPDash
> how many fields are there?

The HCP-EP survey I am working with has 915 fields in each of the six instruments, a.k.a. surveys.

> will the fields be changed at any point of the study?

The fields are the same across the six instruments, so they should be consistent across the study.
@sbouix @tashrifbillah I thought about the architecture below based on what we discussed yesterday about the REDCap data pulling. I think there were two main problems discussed yesterday: one is PII and the other is server overloading. Below is my suggestion; please let me know what you think. I'll start working on it soon.
REDCap pulling architecture

Data pull part
1. `lochness.redcap` pulls all data from the REDCap server to `PROTECTED/survey/raw/ABCD01.json`

PII part
2. Save json - data free from PII: `lochness.redcap` (or `predict_pii.redcap` or `logbook.redcap`) reads `PROTECTED/survey/raw/ABCD01.json`, removes all PII fields, and saves the result to `GENERAL/survey/raw/ABCD01.json`
3. Save another json - data with the PIIs replaced with pseudo-random strings: `lochness.redcap` (or `predict_pii.redcap` or `logbook.redcap`) reads `PROTECTED/survey/raw/ABCD01.json`, replaces the PII values, and saves it in `PROTECTED/survey/processed/ABCD01.json` and `GENERAL/survey/processed/ABCD01.json`

Data Entry Trigger part
4. Before pulling any data from REDCap, `lochness.redcap` checks for files under `PROTECTED/survey/raw`:
   - if `ABCD01.json` already exists, check the POST-signal database built from the REDCap Data Entry Trigger:
     - if `ABCD01` is in the db, execute the download
     - if `ABCD01` is not in the db, skip the download
   - repeat the PII part above

lochness to lochness transfer
5. In the lochness - lochness transfer, a change of `ABCD01.json` should be detected by sha1 / hash / other methods to only pull the updated data.
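Not part of the original post: a rough sketch of what the PII step (points 2 and 3) could look like, assuming a hand-maintained list of PII field names (the field names below are made up):

```python
import json
import secrets
from pathlib import Path

# hypothetical PII field list; in practice this could come from a table/CSV
PII_FIELDS = ["name", "dob", "phone_number", "address"]

def mask_pii(protected_json, general_dir, drop=True):
    """Write a copy of a PROTECTED survey json into GENERAL with PII handled.

    drop=True  -> remove the PII fields entirely (point 2)
    drop=False -> replace PII values with pseudo-random strings (point 3)
    """
    records = json.loads(Path(protected_json).read_text())
    for record in records:                      # REDCap exports a list of dicts
        for field in PII_FIELDS:
            if field not in record:
                continue
            if drop:
                record.pop(field)
            else:
                record[field] = secrets.token_hex(8)   # pseudo-random string
    out = Path(general_dir) / Path(protected_json).name
    out.write_text(json.dumps(records, indent=2))
    return out

# e.g. mask_pii("PROTECTED/survey/raw/ABCD01.json", "GENERAL/survey/raw", drop=True)
```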
What is the distinction between points 2 and 3 under the PII part?
> What is the distinction between points 2 and 3 under the PII part?
Sorry - I edited it a bit.
Point 2 is for saving a json in GENERAL - data that has no PII.
Point 3 is for saving a json in GENERAL - data that has the PII fields replaced with pseudo-random strings.
Let's concentrate on REDCap server overloading first.
The PII masking is more complex: some variables can be deleted (e.g. name), others replaced by another variable (e.g. birthdate -> age in years). I am not sure we should have two copies of pretty much the same thing (raw vs processed). Also, because I would like to import the anonymized data into MGB REDCap, we should figure out how that will be affected by (2) vs (3). Finally, we may be better off having a table with a list of PII variables as input rather than trying to extract the tag from REDCap.
For the lochness-to-lochness transfer, I also think datalad might be useful. Something to discuss with Chris and Mathias on Friday.
Hi @kcho , did you try making a workstation listen to REDCap signal yet? If you haven't, I can try that for my entertainment out of DPDash crisscross ;)
> Hi @kcho , did you try making a workstation listen to REDCap signal yet? If you haven't, I can try that for my entertainment out of DPDash crisscross ;)
I haven't yet tried it on the workstation, but I've drafted a command-line tool and a module for listening to the POST signal from the REDCap server in `lochness.redcap`:
https://github.com/PREDICT-DPACC/lochness/blob/devel/kcho/redcap_new_arch/scripts/listen_to_redcap.py
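For readers without access to the branch, a minimal, self-contained sketch of the idea (not the actual `listen_to_redcap.py`), assuming the Data Entry Trigger URL points at a small standard-library HTTP server that appends each POST to a CSV:

```python
import csv
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

DET_DB = "data_entry_trigger_db.csv"   # hypothetical DET-DB path

class DETHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode()
        fields = {k: v[0] for k, v in parse_qs(body).items()}
        # append one row per Data Entry Trigger POST
        with open(DET_DB, "a", newline="") as f:
            csv.writer(f).writerow([
                time.time(),
                fields.get("project_id", ""),
                fields.get("username", ""),
                fields.get("record", ""),
                fields.get("instrument", ""),
            ])
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), DETHandler).serve_forever()
```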
> Let's concentrate on REDCap server overloading first.
The model shown below has been uploaded to the `devel/kcho/redcap_new_arch` branch:
https://github.com/PREDICT-DPACC/lochness/compare/master...PREDICT-DPACC:devel/kcho/redcap_new_arch
To do
- Data Entry Trigger (DET) capture server going down

Data Entry Trigger
- `listen_to_redcap.py`: a live server that captures and saves all the POST signals received from the REDCap Data Entry Trigger
| timestamp | project_id | redcap_username | record | instrument |
| --- | --- | --- | --- | --- |
| 1617823322.701979 | 26709 | kc244 | subject0002 | inclusionexclusion |
| 1617823322.711633 | 26709 | kc244 | subject0001 | inclusionexclusion |
- the DB above is entered into `config.yml`
- `lochness.redcap` checks for any updates in the Data Entry Trigger database before executing the datapull
  - `lochness.redcap.get_data_entry_trigger_df`: loads the DET database
  - `lochness.redcap.check_if_modified`: compares `st_mtime` of the already saved jsons vs the DET database for any recent updates

In `check DET-DB recent update`, do you plan to compare checksum like mediaflux does? Here are nipype ways of computing checksum:
> In `check DET-DB recent update`, do you plan to compare checksum like mediaflux does? Here are nipype ways of computing checksum:
Since the Data Entry Trigger Database (DET-DB) is a CSV file containing all the REDCap field updates and the timestamp of each POST signal, I compare the last modified date of the already existing json file vs the last update captured in the DET-DB for each subject (if this subject exists in the DET-DB).
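To make the comparison concrete, an illustrative sketch (not the code in the branch) of that check, assuming the DET-DB CSV has the columns shown in the table above and no header row; a checksum-based variant, as asked above, would hash the re-downloaded json instead of looking at mtime:

```python
import os
import pandas as pd

def check_if_modified(subject, json_path, det_db_csv):
    """Return True if the DET-DB has an update newer than the saved json."""
    if not os.path.isfile(json_path):
        return True                      # nothing saved yet -> pull
    det_db = pd.read_csv(
        det_db_csv,
        names=["timestamp", "project_id", "redcap_username",
               "record", "instrument"])   # headerless DET-DB assumed
    subject_rows = det_db[det_db["record"] == subject]
    if subject_rows.empty:
        return False                     # no DET entry for this subject -> skip
    return subject_rows["timestamp"].max() > os.path.getmtime(json_path)
```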
Hi @kcho , is it expecting an empty csv file?
> Hi @kcho , is it expecting an empty csv file?
It's expecting the path of the DET-DB csv file. If the csv already exists, the live capture server will append new information to the existing csv file.
Currently, how is it being programmed? `listen_to_redcap.py` running `sync.py --source redcap`, sort of?
Currently, the two Python scripts have to be executed separately. I just realized it could be useful to design it the way your comment suggests.
> `listen_to_redcap.py` running `sync.py --source redcap`, sort of?

Any downside to doing this? Programmatically, how would you spin out `sync.py` running continuously while also continuously running `listen_to_redcap.py` from a single execution? The `multiprocess` module?
> The `multiprocess` module?
It should be a chained process: trigger comes first and then pull. We shall discuss more during our Monday brainstorming session.
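A possible shape for that chaining, purely illustrative and using the standard `multiprocessing` module rather than anything in lochness: the listener process puts each triggered record on a queue, and a second process reacts by running the pull (the `sync.py --source redcap` call is taken from the comment above):

```python
import subprocess
from multiprocessing import Process, Queue

def run_listener(queue):
    """Run the DET capture server; call queue.put(record) for every POST.

    The HTTP handling itself is sketched in the earlier listener example;
    here it is reduced to a placeholder.
    """
    ...

def run_puller(queue):
    """Chained step: whenever a record is triggered, pull from REDCap."""
    for record in iter(queue.get, None):        # None is a stop sentinel
        subprocess.run(["python", "sync.py", "--source", "redcap"],
                       check=False)

if __name__ == "__main__":
    q = Queue()
    listener = Process(target=run_listener, args=(q,))
    puller = Process(target=run_puller, args=(q,))
    listener.start()
    puller.start()
    listener.join()
    q.put(None)            # stop the puller once the listener exits
    puller.join()
```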
By the way, do we have access to @sbouix 's presentation on what data reside in what platforms? I am trying to understand which platforms should trigger data entry signals. I understand for PRoNET it would be REDCap. What would that be for PRESCIENT?
The primary database system for PRESCIENT will be RPMS (Research Project Management System). It is custom built by the Orygen team and doesn't have the extensive documentation or API functionalities of REDCap. We're working to get access to their IT infrastructure to set up a development environment and start developing the Lochness RPMS module.
- `lochness.redcap` is pulling all available data from REDCap to a json file
- When `lochness.redcap.sync` is re-executed, `lochness` pulls the whole data again and then compares the existing json before overwriting.

Problems
1. Daily pull of the data for all subjects may put too much load on the REDCap server
2. Extensive work is required on the logbook to select and extract the data from the json dump to visualize in DPDash

Solutions
- `lochness.redcap` to pull only specific fields? (tagged in REDCap?)
- changed json -> redownload all files
- unchanged json -> skip
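On the "pull only specific fields" idea: the REDCap export-records API accepts a `fields` filter, so a restricted pull could look roughly like this (the field names are placeholders, and the token/URL are the project's own):

```python
import requests

REDCAP_API_URL = "https://redcap.partners.org/redcap/api/"
API_TOKEN = "XXXX"                      # project API token (placeholder)

def pull_fields(record_id, fields):
    """Export only the listed fields for one record, as json."""
    data = {
        "token": API_TOKEN,
        "content": "record",
        "format": "json",
        "records[0]": record_id,
    }
    # one fields[i] entry per requested variable
    for i, field in enumerate(fields):
        data[f"fields[{i}]"] = field
    resp = requests.post(REDCAP_API_URL, data=data)
    resp.raise_for_status()
    return resp.json()

# e.g. pull_fields("100111111", ["record_id", "adverse_events_ae_complete"])
```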