ReproNim / reproschema-py

Apache License 2.0

WIP: (re)modularize converter (to_reproschema) #75

Open yibeichan opened 3 months ago

yibeichan commented 3 months ago

Based on a discussion with @djarecka on 08/23/2024, we want to improve our (some format)2reproschema converter by remodularizing it. In the current converter:

  1. I made separate classes to handle item, activity, and protocol
  2. I tried to use as few hard-coded column names as possible in each function. For example, our previous redcap2reproschema has a SCHEMA_MAP that maps REDCap column names to reproschema variables. This time, I reversed the key-value pairs and made a CSV_TO_REPROSCHEMA_MAP whose keys are reproschema variables and whose values are input csv column names. This way, classes and functions always refer to the keys, and the map can be customized with different values (input csv column names)
  3. Currently the converter takes a csv file as input and is mostly used from the command line. We should improve its use as a Python module, which should allow users to pass a dataframe as input and customize dictionaries such as CSV_TO_REPROSCHEMA_MAP, VALUE_TYPE_MAP, INPUT_TYPE_MAP, and ADDITIONAL_NOTES_LIST
  4. I replaced csv.DictReader with the lovely pandas
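
To make point 2 concrete, here is a minimal sketch of the reversed-map idea. It is not the code in this PR: the key names, the column names, and the helpers `resolve_column_map` and `extract_item` are all hypothetical, chosen only to show how functions can refer to fixed keys while the values are swapped per input format. Rows are plain dicts here to keep the sketch dependency-free; with pandas they could come from `df.to_dict(orient="records")`.

```python
# Hypothetical default map: keys are reproschema-side names (fixed),
# values are input csv column names (customizable per source format).
# None of these names are taken from the PR; they are placeholders.
DEFAULT_CSV_TO_REPROSCHEMA_MAP = {
    "item_name": "Variable / Field Name",
    "question": "Field Label",
    "input_type": "Field Type",
    "activity_name": "Form Name",
}


def resolve_column_map(overrides=None):
    """Merge user overrides into the default map, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_CSV_TO_REPROSCHEMA_MAP)
    if unknown:
        raise KeyError(f"Unknown reproschema keys: {sorted(unknown)}")
    return {**DEFAULT_CSV_TO_REPROSCHEMA_MAP, **overrides}


def extract_item(row, column_map):
    """Return one row re-keyed by reproschema-side names.

    Columns that are absent from the row are skipped rather than guessed.
    """
    return {key: row[col] for key, col in column_map.items() if col in row}


# A made-up input format with different column names than the defaults.
custom_map = resolve_column_map(
    {
        "item_name": "name",
        "question": "description",
        "input_type": "type",
        "activity_name": "form",
    }
)
custom_row = {
    "name": "age",
    "description": "How old are you?",
    "type": "numeric",
    "form": "demographics",
}
item = extract_item(custom_row, custom_map)
```

Because `extract_item` only ever looks up keys such as `item_name`, supporting a new input format would mean passing a different map rather than editing the function.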

I made this converter based on the LORIS format, which is a sort of simplified version of the general REDCap version we used to deal with. They are missing some important information (I'll email them soon). At the same time, we can think about how to generalize the converter so it handles both simple and complex cases.

TODOs (popping up when converting the LORIS format):

yibeichan commented 3 weeks ago

@yibeichan will add tests and examples. @djarecka will try to change the old redcap2reproschema to use the new class yibei created here.