reconcile research ids in db and in filenames

tjmahr commented 8 years ago

@patrickreidy found some discrepancies between the ChildStudy.LongResearchID field and the long IDs used in some filenames. He provided a spreadsheet of the discrepancies:

| Participant | Study | StimlogID | DatabaseID | FieldDiscrepancy | Notes |
|---|---|---|---|---|---|
| 029L | TimePoint1 | 029L34MS2 | 029L38MS2 | Age | Age is 34 for: MINP, MP, NWR, RWR, RWL, & VerbalFluency. |
| 651L | TimePoint1 | 651L36MS2 | 651L35MS2 | Age | Age is 36 for: EVT, MP, RWR, RWL, & VerbalFluency. |
| 129L | TimePoint2 | 129L45FS3 | 129L45FS4 | Session/cohort | Cohort is 3 for: EVT, MINP, MP, NWR, RWR, RWL, SAILS, & Verbal Fluency. Participants with Age=45, Cohort=3: 080, 104, 125. Participants with Age=45, Cohort=4: 065, 075. |
| 130L | TimePoint2 | 130L45FS3 | 130L45FS4 | Session/cohort | Cohort is 3 for: EVT, MINP, MP, NWR, RWR, RWL, & Verbal Fluency. Participants with Age=45, Cohort=3: 080, 104, 125. Participants with Age=45, Cohort=4: 065, 075. |
| 628L | TimePoint2 | 628L49FS4 | 628L49MS4 | Gender | Gender is F for TimePoint1 & TimePoint2 in the database. |
| 608L | TimePoint3 | 608L64FS6 | 608L64MS6 | Gender | Gender is F for TimePoint1 & TimePoint2 in the database. |

The purpose of the LongResearchID field is to document a piece of data (the ID we assigned to a participant) and to make it easier to link a ChildStudy record to raw data files which use that piece of data in the filename.

The database IDs were created by concatenating columns of hand-entered data from Excel spreadsheets. I think the hand-entered values and the research ID used on test day were not created at the same time or by the same person, so it's not surprising that there are a handful of discrepancies across ~600 child-study pairings.
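
For illustration only, here is a minimal Python sketch of how a filename ID and a database ID could be compared field by field. It assumes every long ID follows the pattern seen in the spreadsheet rows above: three digits plus a letter for the participant, a two-digit age in months, a gender letter, the letter S, and a one-digit session/cohort number. The regular expression and the names `parse_long_id` and `differing_fields` are hypothetical and are not part of L2TDatabase.

```python
import re
from typing import Dict, List

# Assumed layout, inferred from IDs such as "029L34MS2":
# participant (029L), age in months (34), gender (M), literal "S", cohort (2).
ID_PATTERN = re.compile(
    r"^(?P<participant>\d{3}[A-Z])"
    r"(?P<age>\d{2})"
    r"(?P<gender>[MF])"
    r"S(?P<cohort>\d)$"
)


def parse_long_id(long_id: str) -> Dict[str, str]:
    """Split a long research ID into its assumed component fields."""
    match = ID_PATTERN.match(long_id)
    if match is None:
        raise ValueError(f"ID does not match the assumed pattern: {long_id!r}")
    return match.groupdict()


def differing_fields(stimlog_id: str, database_id: str) -> List[str]:
    """Return the names of the fields on which the two IDs disagree."""
    stimlog = parse_long_id(stimlog_id)
    database = parse_long_id(database_id)
    return [name for name in stimlog if stimlog[name] != database[name]]


# Example with three ID pairs from the spreadsheet.
for stimlog_id, database_id in [
    ("029L34MS2", "029L38MS2"),
    ("129L45FS3", "129L45FS4"),
    ("628L49FS4", "628L49MS4"),
]:
    print(stimlog_id, database_id, differing_fields(stimlog_id, database_id))
```

Run over all child-study pairings, a check like this would flag which component (age, gender, or cohort) disagrees, which is the information recorded in the FieldDiscrepancy column.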

[x] 029L
[x] 651L
[x] 129L
[x] 130L
[x] 628L
[x] 608L
[x] Double check: "Participants with Age=45, Cohort=3: 080, 104, 125. Participants with Age=45, Cohort=4: 065, 075."

tjmahr commented 8 years ago

029L is special. I wrote a ReadMe file about them in their raw data folder:

029L was seen for Visit A in April. The child did not return until four months later, so we decided to start all over with this child. I decided to archive the April visit data in a zip file so that it is archived but not readily available. The email history regarding this child is given below. --TJM, 8/30/2013

The child had 34 in their filenames because that was the age at the original, archived visit, but a 38 in the AgeAtvA_1-2 column.

Not every task was redone, so e.g. the child's EVT is at 34 months but the PPVT is at 38 months. This case was part of our motivation to include the date of every task in the database and compute age on a task-by-task basis on the fly.
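
As a rough sketch of what computing age per task could look like, the Python function below counts completed months between a birth date and a task date. The rounding convention (completed months), the function name `age_in_months`, and the dates in the example are all assumptions made for illustration; they are not taken from the database or from this child's records.

```python
from datetime import date


def age_in_months(birth_date: date, task_date: date) -> int:
    """Return the number of completed months between birth and the task date."""
    if task_date < birth_date:
        raise ValueError("task_date is earlier than birth_date")
    months = (task_date.year - birth_date.year) * 12
    months += task_date.month - birth_date.month
    # The current month only counts once the day of birth has been reached.
    if task_date.day < birth_date.day:
        months -= 1
    return months


# Hypothetical dates: two tasks given four months apart to the same child.
birth = date(2010, 6, 15)
early_task = date(2013, 4, 20)
late_task = date(2013, 8, 30)
print(age_in_months(birth, early_task), age_in_months(birth, late_task))
```

With per-task dates stored, each score can carry its own age, so a single age baked into the ID matters less for analysis.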

[x] change 029L's research ID in the db

I created issue #36 to pinpoint more instances like this one...

tjmahr commented 8 years ago

Fixed all discrepancies. In each case, the filename/stimlog version of the ID was chosen and the database record was updated.

Updated the original spreadsheet sources for 129L, 130L, 608L, 628L, and 651L, where the discrepancies originated. Did not update the source for 029L because of this child's gap between visits.

LearningToTalk / L2TDatabase

reconcile research ids in db and in filenames #35