Based on a discussion with @djarecka on 08/23/2024, we want to improve our (some format)2reproschema converter by remodularizing it. In the current converter:
I made separate classes to handle item, activity, and protocol.
I tried to use as few hard-coded column names as possible in each function. For example, in our previous redcap2reproschema we have SCHEMA_MAP, which maps REDCap column names to reproschema variables. This time I reversed the key-value pairs and made a CSV_TO_REPROSCHEMA_MAP where the keys are reproschema variables and the values are input CSV column names. This way, classes and functions always use the keys, and the map can be customized with different values (input CSV column names).
Currently we use a CSV file as input for this converter and mostly run it from the command line. We should improve its usability as a Python module, which should allow users to pass a DataFrame as input and customize dictionaries such as CSV_TO_REPROSCHEMA_MAP, VALUE_TYPE_MAP, INPUT_TYPE_MAP, and ADDITIONAL_NOTES_LIST.
I removed csv.DictReader and put the lovely pandas there instead.
I made this converter based on the LORIS format, which is a sort of simplified version of the general REDCap format we used to deal with. Their files are missing some important information (I'll email them soon), but at the same time we can think about how to generalize the converter so it handles both simple and complex cases.
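To make the reversed-map idea and the DataFrame input concrete, here is a minimal sketch, not the converter's actual code. Only the name CSV_TO_REPROSCHEMA_MAP comes from the notes above; the keys, the column names, and the helpers `row_to_item` and `convert` are hypothetical and chosen just for illustration.

```python
import pandas as pd

# Keys are reproschema variables (fixed inside the converter);
# values are input column names (hypothetical defaults, meant to be customized).
CSV_TO_REPROSCHEMA_MAP = {
    "item_name": "name",
    "question": "question",
    "inputType": "field_type",
}


def row_to_item(row, column_map=CSV_TO_REPROSCHEMA_MAP):
    """Build an item dict by looking columns up through the map only."""
    # Columns absent from the input are skipped rather than raising an error.
    return {key: row[col] for key, col in column_map.items() if col in row.index}


def convert(df, column_map=CSV_TO_REPROSCHEMA_MAP):
    """Accept a DataFrame directly, so the converter also works as a module."""
    return [row_to_item(row, column_map) for _, row in df.iterrows()]


# A differently named input only needs a different map, not different code.
df = pd.DataFrame(
    {"var": ["age"], "label": ["How old are you?"], "type": ["integer"]}
)
custom_map = {"item_name": "var", "question": "label", "inputType": "type"}
items = convert(df, custom_map)
```

Because the functions only ever touch the keys, supporting another input format in this sketch comes down to passing a different `column_map`.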
TODOs (these came up while converting the LORIS format):
[ ] For maxValue and minValue, can we use other variables' answers as those values? (This comes from a date that should be later than a certain date but earlier than today.)
[ ] Some variables end with "_en", which indicates English, and some end with "_es", which indicates Spanish. I haven't specified anything for them yet.
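For the language-suffix TODO, one possible starting point is to split the suffix off the variable name before deciding how to store it. This is only a sketch under assumptions: the mapping LANGUAGE_SUFFIXES and the helper `split_language` are hypothetical, and how the language tag should end up in the reproschema output is still undecided.

```python
# Hypothetical suffix-to-language table; only "_en" and "_es" appear in the TODO.
LANGUAGE_SUFFIXES = {"_en": "en", "_es": "es"}


def split_language(variable_name):
    """Return (base_name, language), or (variable_name, None) if no suffix matches."""
    for suffix, language in LANGUAGE_SUFFIXES.items():
        if variable_name.endswith(suffix):
            return variable_name[: -len(suffix)], language
    return variable_name, None


base, language = split_language("consent_es")
```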