AMP-SCZ / utility

Storehouse for all utility scripts
Apache License 2.0

[ENH] Roadmap to combining nda-transform scripts #66

Open tashrifbillah opened 1 year ago

tashrifbillah commented 1 year ago

Case-1: Definitions that can be used as they are: (Variables to share are obtained from _template.csv)

  1. assist01
  2. sri01
  3. ampscz_nsipr01
  4. cgis01
  5. rs01
  6. dim01
  7. iec01
  8. socdem01
  9. ampscz_hcgfb01
  10. ampscz_lapes01
  11. scidvapd01
  12. ampscz_iqa01
  13. dailyd01

The most generic script for this purpose is currently assist01.py.
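A minimal sketch of the Case-1 idea, not the code in assist01.py: read the variable names from a template and keep only those columns of a record. It assumes the template's first row holds the structure name and version and its second row holds the variable names. The function names, the sample template and the sample record are all hypothetical.

```python
import csv
import io


def variables_to_share(template_text, header_row=1):
    """Return the variable names listed in a template.

    Assumption: row 0 holds the structure name and version,
    and row `header_row` holds the variable names.
    """
    rows = list(csv.reader(io.StringIO(template_text)))
    return [name for name in rows[header_row] if name]


def select_shared(record, variables):
    """Keep only the template variables; absent ones are left blank."""
    return {name: record.get(name, "") for name in variables}


# Hypothetical template and record, for illustration only
template = "assist,01\nsubjectkey,interview_date,var1\n"
record = {"subjectkey": "X", "var1": "3", "unshared": "9"}
shared = select_shared(record, variables_to_share(template))
```

Any column that is not named in the template (here `unshared`) is dropped, so a definition that fits Case-1 needs no per-form logic.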


Case-2: Definitions where AMP-SCZ variables exist in the aliases and the aliases can be split by a single prefix: (Variables to share are obtained from _definitions.csv)

  1. calgry01
  2. bprs01
  3. oasis01
  4. tbi01
  5. wasi201
  6. wais_iv_part101 and wisc_v01 have the same variables to share
  7. ampscz_pss01, remember the 77/88 custom codes
  8. cssrs01
  9. clinlabtestsp201
  10. vitas01

NOTE: tbi01 is programmed to use the ElementName column from _definitions.csv to obtain the variables to share. The following note on bprs01 is old and can be ignored as of 6/22/2023.

One script for the above purpose now is bprs01.py. We can use this script and make a YAML file where all the variables of interest are indexed by prefix:

chrcdss:
  - var1,var2,var3,...
chrcssrs:
  - var1,var2,var3,...
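As a rough sketch of how one generic script could consume such a file: once parsed, the YAML becomes a mapping from prefix to variable names. The dictionary below is a hypothetical stand-in for that parsed file, and the helper names and sample values are assumptions, not code from bprs01.py.

```python
# Hypothetical stand-in for the parsed YAML file: prefix -> variables of interest
VARIABLES_BY_PREFIX = {
    "chrcdss": ["var1", "var2", "var3"],
    "chrcssrs": ["var1", "var2"],
}


def variables_for(prefix, table=VARIABLES_BY_PREFIX):
    """Look up the variables registered for one prefix."""
    if prefix not in table:
        raise KeyError(f"no variables registered for prefix {prefix!r}")
    return list(table[prefix])


def pick(record, prefix, table=VARIABLES_BY_PREFIX):
    """Keep only the variables of interest that are present in the record."""
    return {name: record[name] for name in variables_for(prefix, table) if name in record}


# Illustrative record; the names and values are made up
picked = pick({"var1": "1", "var3": "0", "other": "x"}, "chrcdss")
```

With the variables kept in a data file, adding another Case-2 definition means adding a key rather than writing another script.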

Case-3: Definitions where AMP-SCZ variables exist in the aliases and the aliases cannot be split by a single prefix:

  1. pds01

Once we replace the first column in the definition with the aliases, this case reduces to Case-2.


Case-4: Fully custom definition:

  1. ndar_subject01
  2. sofas_baseline (existence of one variable chrsofas_session_type makes it custom)
  3. sofas_followup (the _fu string needs to be added to five variables), coded with the sofas_baseline script
  4. gfs01 (this form is a nightmare because two prefixes are used in it: chrgfr for scores and chrgfrs for interview_date, missing and missing spec)
    • but we dealt with it by providing chrgfrs as a prefix while hard-coding chrgfr inside the script
  5. pss01 (resembles tbi01 though)
  6. figs01 (resembles tbi01 though)
  7. ampscz_psychs01
  8. pmod01: combination of chrpreiq, chrpas
  9. ampscz_pps01 (resembles assist01 though, different because some values are pulled from Nora's CSVs)
  10. scidcls01 (totally csv based)
  11. medhi01 (figs01 like, multi record form)

Use a unique script for each of these.


rawdata program:

  1. image03-->ampscz_sp_sensors01-->ampscz_sp_survey01 (these three are softlinked)
  2. actirec01-->eeg_sub_files01

tashrifbillah commented 1 year ago

This block caused a problem for ampscz_lapes01: https://github.com/AMP-SCZ/utility/blob/d5537ce5d5d58532bca3884f65929290305f3239/nda-transform/ampscz_lapes01.py#L92-L93

The input data are floats rather than integers. The fault is not in the block itself, which is designed for integers. I believe we designed it that way because some variables have options such as W.1, M.1, etc., and the block extracts the trailing integer from such options.
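To illustrate the failure mode, here is a hedged sketch (not the linked block, whose contents are not reproduced here) of a helper that tolerates both kinds of input. A naive "take the trailing digits" rule would turn a float such as 3.0 into 0, so whole-number floats are handled first. The function name `trailing_int` is hypothetical.

```python
import re


def trailing_int(value):
    """Extract the integer code from inputs like 3, 3.0, '3.0', 'W.1' or 'M.1'."""
    text = str(value).strip()
    try:
        number = float(text)
    except ValueError:
        number = None
    if number is not None:
        # Numeric input: accept whole numbers only, so 3.0 becomes 3, not 0
        if number.is_integer():
            return int(number)
        raise ValueError(f"not a whole number: {value!r}")
    # Non-numeric option such as 'W.1': take the digits at the end
    match = re.search(r"(\d+)$", text)
    if match is None:
        raise ValueError(f"no trailing integer in {value!r}")
    return int(match.group(1))
```

Raising on non-integer floats such as 2.5 is a design choice in this sketch; silently returning 5 would hide bad input.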

tashrifbillah commented 1 year ago

A different observation:

chrfigs_fam
chrfigs_numsib
chrfigs_numchild

These variables do not have the chrfigs_fam_ string in them.