berkeley-stat159 / project-iota

Need help understanding condition text files #53

Closed lizeyuyuz closed 8 years ago

lizeyuyuz commented 8 years ago

Hi Matthew @matthew-brett ,

Our group needs some help understanding how to use the condition files in our dataset. Our dataset comes from https://openfmri.org/dataset/ds000115. We know that the experimental design consists of two task blocks, yet each block also contains event trials, so we are not sure how to use convolution to create our design matrix. Thank you in advance for your help and time!!

This is the condition keys: condition_key.txt

And below are the condition files: cond001.txt cond002.txt cond003.txt cond004.txt cond005.txt cond006.txt

matthew-brett commented 8 years ago

Yes, I see the problem.

Looking at sub001, there are 7 condition files.


import numpy as np

# Load all seven condition files for sub001.
conds = []
for i in range(1, 8):
    cond_fname = 'cond%03d.txt' % i
    conds.append(np.loadtxt(cond_fname))

Here are the lengths:

In [35]: [len(c) for c in conds]
Out[35]: [2, 42, 42, 2, 2, 42, 0]

cond001.txt looks like this:

25.000000       2.5     1
177.500000      2.5     1

Here's cond004.txt:

132.500000      2.5     1
285.000000      2.5     1

I think these must be the start and end cues for the blocks, because cond005.txt looks like this:

25.000000       110.000000      1
177.500000      110.000000      1

Notice the long block lengths - 110 seconds. So, I think these are the blocks in the design. These all seem to match the key file.
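One way to check this reading is to compare when each block ends with when the matching cond004.txt cue ends. A minimal sketch, with the rows typed in by hand from the listings above rather than loaded from the files:

```python
# Rows are (onset, duration) in seconds, copied from the listings above.
blocks = [(25.0, 110.0), (177.5, 110.0)]   # cond005.txt
end_cues = [(132.5, 2.5), (285.0, 2.5)]    # cond004.txt

block_ends = [onset + duration for onset, duration in blocks]
cue_ends = [onset + duration for onset, duration in end_cues]

print(block_ends)  # [135.0, 287.5]
print(cue_ends)    # [135.0, 287.5]
```

Each 2.5 second end cue finishes at the moment its 110 second block finishes, and the cond001.txt onsets (25 and 177.5) coincide with the block onsets, which fits the start cue / end cue interpretation.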

cond002.txt starts like this:

27.500000       0.655357        1
30.000000       0.655357        1
37.500000       0.655357        1
42.500000       0.655357        1

All the files with 42 onsets, cond002.txt, cond003.txt, cond006.txt, have the same onsets:

In [46]: longer_onsets = [cond[:, 0] for cond in conds if len(cond) == 42]

In [47]: np.diff(longer_onsets, axis=0)
Out[47]: 
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.]])

The difference is in the last column, the amplitude. For cond002 the amplitude is always 1, for the others the amplitudes are different, but sum to around 0:

In [49]: [sum(cond[:, 2]) for cond in conds if len(cond) == 42]
Out[49]: [42.0, 1.19999999990128e-05, 5.9999999999227338e-06]

cond003 appears to only have two amplitudes, but cond006 has many:

In [54]: [np.unique(cond[:, 2]) for cond in conds if len(cond) == 42]
Out[54]: 
[array([ 1.]),
 array([-0.285714,  0.714286]),
 array([-0.239357, -0.215357, -0.194357, -0.154357, -0.131357, -0.122357,
        -0.116357, -0.104357, -0.100357, -0.079357, -0.063357, -0.060357,
        -0.055357, -0.050357, -0.038357, -0.035357, -0.027357, -0.025357,
        -0.002357,  0.001643,  0.003643,  0.005643,  0.015643,  0.021643,
         0.022643,  0.056643,  0.064643,  0.068643,  0.104643,  0.109643,
         0.120643,  0.123643,  0.125643,  0.136643,  0.173643,  0.208643,
         0.217643,  0.229643,  0.274643])]

It seems then that they are classifying the trials (listed in cond002.txt) in two different ways, one binary, the other more continuous. So I guess cond003.txt could be the binary target / non-target coding, where the -0.28 value is presumably 'non-target' (there are more of them) and 0.71 is 'target'. They put the values in like this so that the output regressor will be high for targets and low for non-targets. The regressor is therefore contrasting targets and non-targets.
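The two cond003 amplitudes are what you would get by mean-centering a 0/1 target indicator. The sketch below assumes 12 targets among the 42 trials; that count is inferred from the amplitudes themselves and is not stated anywhere in the condition files.

```python
n_trials = 42
n_targets = 12  # assumed count, inferred from the amplitudes

# 1 for a target trial, 0 for a non-target trial (order does not matter here).
indicator = [1.0] * n_targets + [0.0] * (n_trials - n_targets)

# Subtract the mean so the amplitudes sum to zero.
mean = sum(indicator) / n_trials
centered = [value - mean for value in indicator]

print(sorted({round(value, 6) for value in centered}))  # [-0.285714, 0.714286]
```

With amplitudes rounded to six decimals as in the file, 12 * 0.714286 + 30 * (-0.285714) comes to about 1.2e-05, the same small residual as the cond003 sum shown above.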

The cond006 file is a bit more difficult to guess at. It looks like there are 39 distinct values here encoding some linear difference between the trials. Do you have any idea what that might be?

Do you have some not-empty cond007 files? Do they give a clue what they are for?

matthew-brett commented 8 years ago

Ah - I see that cond007 for task2 is:

95.000000   0.840190    1
270.000000  0.840190    1

and for task003:

45.000000   0.957683    1
57.500000   0.957683    1
117.500000  0.957683    1
217.500000  0.957683    1
270.000000  0.957683    1

So I guess that task1 is 0-back, task2 is 1-back and task3 is 2-back, and the cond007 file classifies the events as errors. So, for task2:

In [2]: conds = []

In [3]: for i in range(1, 8):
   ...:         cond_fname = 'cond%03d.txt' % i
   ...:         conds.append(np.loadtxt(cond_fname))
   ...:     

In [4]: [len(c) for c in conds]
Out[4]: [2, 40, 40, 2, 2, 40, 2]

The 2 error trials (cond007) aren't included in the other trial vectors. So, the remaining mystery is what the cond006 file is. Are the clues in the paper? Could it relate to something like reaction time?

Jay4869 commented 8 years ago

@lizeyuyuz @matthew-brett Basically, cond001.txt only contains the starting time for two test blocks

25.000000       2.5     1
177.500000      2.5     1

cond004.txt contains the ending time for the two test blocks

132.500000      2.5     1
285.000000      2.5     1

In cond005.txt, I can tell the length of the two test blocks is 110 seconds. cond002.txt tells us about each trial and its amplitude (0.655357), which is a binary test, so do we combine cond002.txt, cond001.txt and cond004.txt to produce our binary test condition?

27.500000       0.655357        1
30.000000       0.655357        1
37.500000       0.655357        1
42.500000       0.655357        1

If I got some points wrong, please let me know.

matthew-brett commented 8 years ago

The second column is the duration, the third column is the amplitude, so the amplitude is 1 for cond002, and the duration is 0.655 seconds.
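As a small illustration of the column layout, here is a sketch that reads the first four cond002.txt rows quoted in this thread from an in-memory string (so it runs without the data files) and splits them into named columns:

```python
import io

import numpy as np

# First rows of cond002.txt as quoted in this thread.
text = """\
27.500000       0.655357        1
30.000000       0.655357        1
37.500000       0.655357        1
42.500000       0.655357        1
"""

# unpack=True returns one array per column: onset, duration, amplitude.
onsets, durations, amplitudes = np.loadtxt(io.StringIO(text), unpack=True)

print(onsets)      # onset times in seconds
print(durations)   # 0.655357 seconds for every trial
print(amplitudes)  # 1 for every trial
```

The same call works on a file name in place of the `io.StringIO` object.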

I'm guessing the authors used regressors for each of the condition files in their design, so there would have been 7 regressors per task, with each being the neural response given by these onsets and amplitudes, convolved with the HRF. For example, there would be a separate regressor for the start signal, the block (lasting 110 seconds), and the end signal, and also for each of the event-related (length 42, 40, etc.) regressors.

Did you get anywhere working out what cond006 was?

Jay4869 commented 8 years ago

@matthew-brett We have not considered cond006 and cond007 yet. As you said, we have to convert the 7 cond files to neural responses (like what you did in the day 25 lecture?). Once we have all the neural responses, we put them into a linear model to calculate the betas, which give the relationship between the response and the blood measure.

Jay4869 commented 8 years ago

@matthew-brett One more question: cond1 and cond4 are the start and end times of the test blocks, so how can I convert them to a neural response? For cond5, I tried to convert it using the method you taught earlier. The plot looks good, but I am not totally sure that what I am doing is right. (day13: http://www.jarrodmillman.com/rcsds/lectures/convolution_background.html)

cond5:

25.000000       110.000000      1
177.500000      110.000000      1

matthew-brett commented 8 years ago

cond1 and cond4 are the start signal (instruction) and the end signal (instruction). So, these are also events, and they also have durations, so they can be put into the model in the same way.

Yes, you can convert all these condition files to their neural predictions in the way I showed you in class, and then you can convolve these using the stuff you linked to.
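For reference, here is a rough sketch of that pipeline for cond001 and cond005, treating the 2.5 second cues exactly like the 110 second blocks. Several things here are assumptions and not taken from the dataset: the TR of 2.5 seconds, the 120 volumes, the 0.1 second fine grid, the helper names, and the simple double-gamma HRF shape, which may differ from the one used in class.

```python
import math

import numpy as np

TR = 2.5      # assumed repetition time in seconds
N_VOLS = 120  # assumed number of volumes in the run
DT = 0.1      # time step of the fine grid in seconds


def gamma_pdf(t, shape):
    """Gamma density with unit scale, evaluated at times t >= 0."""
    return t ** (shape - 1) * np.exp(-t) / math.gamma(shape)


def hrf(t):
    """Assumed double-gamma HRF shape: an early peak minus a late undershoot."""
    return gamma_pdf(t, 6) - 0.35 * gamma_pdf(t, 12)


def neural_prediction(events, n_steps):
    """Boxcar time course on the fine grid from (onset, duration, amplitude) rows."""
    neural = np.zeros(n_steps)
    for onset, duration, amplitude in events:
        start = int(round(onset / DT))
        stop = int(round((onset + duration) / DT))
        neural[start:stop] += amplitude
    return neural


def regressor(events):
    """Convolve the neural prediction with the HRF, then sample once per TR."""
    n_steps = int(round(N_VOLS * TR / DT))
    neural = neural_prediction(events, n_steps)
    hrf_values = hrf(np.arange(0, 30, DT))
    # Scale by DT so the result approximates the continuous convolution.
    convolved = np.convolve(neural, hrf_values)[:n_steps] * DT
    return convolved[::int(round(TR / DT))]


cond001 = [(25.0, 2.5, 1), (177.5, 2.5, 1)]      # start cues
cond005 = [(25.0, 110.0, 1), (177.5, 110.0, 1)]  # task blocks

design = np.column_stack([regressor(cond001), regressor(cond005)])
print(design.shape)  # (120, 2)
```

Working on a fine grid before sampling at the TR matters most for the short trials (0.655 seconds), which would otherwise be rounded to zero or one whole volume. Here each regressor is sampled at the start of each volume, which is also an assumption.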

Jay4869 commented 8 years ago

@matthew-brett Thank you for your explanation! I got it

Jay4869 commented 8 years ago

@lizeyuyuz I understand most of the cond files, but I still need help with cond006, so I created a separate issue for cond006 and assigned it to you. I am closing this issue for now.