LearningToTalk / L2TDatabase

Helper functions for working with our lab's MySQL database
GNU General Public License v2.0

reconcile research ids in db and in filenames #35

Closed: tjmahr closed this issue 8 years ago

tjmahr commented 8 years ago

@patrickreidy found some discrepancies between the ChildStudy.LongResearchID field and the long IDs used in some filenames. He provided a spreadsheet of the discrepancies:

| Participant | Study | StimlogID | DatabaseID | FieldDiscrepancy | Notes |
|---|---|---|---|---|---|
| 029L | TimePoint1 | 029L34MS2 | 029L38MS2 | Age | Age is 34 for: MINP, MP, NWR, RWR, RWL, & VerbalFluency. |
| 651L | TimePoint1 | 651L36MS2 | 651L35MS2 | Age | Age is 36 for: EVT, MP, RWR, RWL, & VerbalFluency. |
| 129L | TimePoint2 | 129L45FS3 | 129L45FS4 | Session/cohort | Cohort is 3 for: EVT, MINP, MP, NWR, RWR, RWL, SAILS, & Verbal Fluency. Participants with Age=45, Cohort=3: 080, 104, 125. Participants with Age=45, Cohort=4: 065, 075. |
| 130L | TimePoint2 | 130L45FS3 | 130L45FS4 | Session/cohort | Cohort is 3 for: EVT, MINP, MP, NWR, RWR, RWL, & Verbal Fluency. Participants with Age=45, Cohort=3: 080, 104, 125. Participants with Age=45, Cohort=4: 065, 075. |
| 628L | TimePoint2 | 628L49FS4 | 628L49MS4 | Gender | Gender is F for TimePoint1 & TimePoint2 in the database. |
| 608L | TimePoint3 | 608L64FS6 | 608L64MS6 | Gender | Gender is F for TimePoint1 & TimePoint2 in the database. |
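The StimlogID and DatabaseID columns above share one layout, so the FieldDiscrepancy column can be recomputed mechanically. The following is a minimal Python sketch that assumes each ID has this structure:

- a three-digit number plus `L`
- a two-digit age in months
- a gender letter (`F`/`M`)
- one further letter, whose meaning is not stated in this thread (the sketch calls it `other`)
- a one-digit cohort

The regex and the function names `parse_long_id` and `differing_fields` are hypothetical and are not part of L2TDatabase.

```python
import re

# Hypothetical layout inferred from the IDs in the table, e.g. "029L34MS2":
# 3-digit number + "L", 2-digit age (months), gender, one unspecified letter, cohort digit.
LONG_ID = re.compile(
    r"^(?P<short_id>\d{3}L)(?P<age>\d{2})(?P<gender>[FM])(?P<other>[A-Z])(?P<cohort>\d)$"
)


def parse_long_id(long_id: str) -> dict:
    """Split a long research ID into named fields; raise ValueError if it doesn't match."""
    match = LONG_ID.match(long_id)
    if match is None:
        raise ValueError(f"unrecognized long ID: {long_id!r}")
    return match.groupdict()


def differing_fields(stimlog_id: str, database_id: str) -> list:
    """Return the names of the fields whose values differ between two long IDs."""
    a, b = parse_long_id(stimlog_id), parse_long_id(database_id)
    return [field for field in a if a[field] != b[field]]


# (StimlogID, DatabaseID) pairs copied from the spreadsheet.
rows = [
    ("029L34MS2", "029L38MS2"),
    ("651L36MS2", "651L35MS2"),
    ("129L45FS3", "129L45FS4"),
    ("130L45FS3", "130L45FS4"),
    ("628L49FS4", "628L49MS4"),
    ("608L64FS6", "608L64MS6"),
]
for stimlog_id, database_id in rows:
    print(stimlog_id, database_id, differing_fields(stimlog_id, database_id))
```

Under these assumptions, the output matches the FieldDiscrepancy column row by row: age, age, cohort, cohort, gender, gender.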

The purpose of the LongResearchID field is to document a piece of data (the ID we assigned to a participant) and to make it easier to link a ChildStudy record to raw data files which use that piece of data in the filename.

The database IDs were created by concatenating columns of hand-entered data from Excel spreadsheets. The hand-entered values and the research ID used on test day were (I think) not created at the same time or by the same person, so it's not surprising that there are a handful of discrepancies across ~600 child-study pairings.
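Building an ID from separately entered columns is plain string concatenation, so a single mistyped cell silently produces a different ID. The sketch below uses hypothetical column names (`short_id`, `age_months`, `gender`, `other`, `cohort`) in place of the real spreadsheet headers, which this thread doesn't give. It also assumes the field layout inferred from the table, not a documented format.

```python
def build_long_id(short_id: str, age_months: int, gender: str, other: str, cohort: int) -> str:
    """Concatenate hand-entered fields into a long research ID (hypothetical column names)."""
    # Zero-pad the age to two digits so the ID keeps a fixed width.
    return f"{short_id}{age_months:02d}{gender}{other}{cohort}"


# The same child with a different value in the cohort cell yields a different ID,
# as with the 129L row in the table.
print(build_long_id("129L", 45, "F", "S", 3))  # 129L45FS3
print(build_long_id("129L", 45, "F", "S", 4))  # 129L45FS4
```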

tjmahr commented 8 years ago

029L is special. I wrote a ReadMe file about this child in their raw data folder:

029L was seen for Visit A in April. The child did not return until four months later, so we decided to start all over with this child. I decided to archive the April visit data in a zip file so that it is archived but not readily available. The email history regarding this child is given below. --TJM, 8/30/2013

The child had 34 in their filenames because that was their age at the original, archived visit, but 38 in the AgeAtvA_1-2 column.

Not every task was redone, so, e.g., the child's EVT is at 34 months but the PPVT is at 38 months. This case was part of our motivation to include the date of every task in the database and compute age on a task-by-task basis on the fly.
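Computing age per task only requires a birthdate and the date of each task. Here is a minimal sketch that assumes age is counted in completed months, meaning a month counts only once its day-of-month anniversary is reached. That convention is an assumption, and the database may compute age differently. The birthdate and task dates are hypothetical, chosen to mirror the 34- versus 38-month split described above.

```python
from datetime import date


def age_in_months(birth_date: date, task_date: date) -> int:
    """Completed months between birth and the task date (floored, not rounded)."""
    months = (task_date.year - birth_date.year) * 12 + (task_date.month - birth_date.month)
    # Don't count the current month until its day-of-month anniversary is reached.
    if task_date.day < birth_date.day:
        months -= 1
    return months


birth = date(2010, 6, 15)  # hypothetical birthdate
tasks = {"EVT": date(2013, 4, 20), "PPVT": date(2013, 8, 20)}  # hypothetical task dates
for task, when in tasks.items():
    print(task, age_in_months(birth, when))
# EVT 34
# PPVT 38
```

Storing dates rather than a single precomputed age means each task gets its own age, so a child tested across two visits no longer has to share one number.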

I created issue #36 to pinpoint more instances like this one...
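Once task dates are in the database, one way to surface cases like 029L is to flag child-study records whose task dates are spread unusually far apart. Here is a minimal sketch with two hypothetical elements: the 60-day threshold and the input shape, a dict mapping an ID to that record's task dates. It is not necessarily the approach taken in #36.

```python
from datetime import date

# Hypothetical threshold: task dates more than 60 days apart within one child-study record.
MAX_SPAN_DAYS = 60


def flag_long_gaps(task_dates: dict, max_span_days: int = MAX_SPAN_DAYS) -> list:
    """Return the IDs whose task dates span more than max_span_days."""
    flagged = []
    for research_id, dates in task_dates.items():
        span = (max(dates) - min(dates)).days
        if span > max_span_days:
            flagged.append(research_id)
    return flagged


# Hypothetical records: one with a four-month gap between visits, one tested within two weeks.
records = {
    "child_a": [date(2013, 4, 20), date(2013, 8, 20)],
    "child_b": [date(2013, 4, 1), date(2013, 4, 15)],
}
print(flag_long_gaps(records))  # ['child_a']
```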

tjmahr commented 8 years ago

Fixed all discrepancies. In each case, the filename/stimlog version of the ID was chosen and the database record was updated.
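The update itself amounts to one parameterized `UPDATE` per row, with the stimlog ID overwriting the database ID. The sketch below is not the script that was actually run. For a self-contained example, it makes two assumptions: an in-memory SQLite database stands in for the lab's MySQL database, and `ChildStudy` is reduced to its `LongResearchID` column.

```python
import sqlite3

# StimlogID (kept) -> DatabaseID (replaced), copied from the spreadsheet above.
corrections = {
    "029L34MS2": "029L38MS2",
    "651L36MS2": "651L35MS2",
    "129L45FS3": "129L45FS4",
    "130L45FS3": "130L45FS4",
    "628L49FS4": "628L49MS4",
    "608L64FS6": "608L64MS6",
}

# In-memory stand-in for the real database; the real ChildStudy table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ChildStudy (LongResearchID TEXT)")
conn.executemany(
    "INSERT INTO ChildStudy VALUES (?)", [(db_id,) for db_id in corrections.values()]
)

# Overwrite each database ID with the filename/stimlog version in a single transaction.
with conn:
    conn.executemany(
        "UPDATE ChildStudy SET LongResearchID = ? WHERE LongResearchID = ?",
        list(corrections.items()),
    )

ids = sorted(row[0] for row in conn.execute("SELECT LongResearchID FROM ChildStudy"))
print(ids)
```

Using `?` placeholders instead of string formatting keeps the IDs from being interpreted as SQL. MySQL drivers use a different placeholder style.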


Updated the original spreadsheet sources for 129L, 130L, 608L, 628L, and 651L, where the discrepancies originated. Did not update the source for 029L because of this child's gap between visits.