Cannot reproduce r^2 result on paper

dyhan316 commented 6 months ago

I am trying to replicate the Figure 1 encoder performance plot from the paper but am having difficulty.

I followed the tutorial Jupyter notebook (using the 33rd layer of the OPT-30B model) and tried to reproduce the results for subject S3. I was able to reproduce the Figure 2 results (voxel-wise r values). However, I was not able to reproduce the "Encoding Performance (Avg r^2)" values of Figure 1: I got values of around 0.02, not the 0.03 that Figure 1 reports.

Below is what I got by using the voxel-wise r value (corrs_unnorm) to get r^2 (|r|*r), averaged over each trial. The values are different from the values in Figure 1.
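For concreteness, the signed r^2 mentioned here (|r|*r) can be sketched as follows. This is only an illustration: the array values are made up, and `signed_r2` is a hypothetical helper name, not a function from the repository.

```python
import numpy as np

def signed_r2(r):
    """Square each correlation while keeping its sign: |r| * r."""
    r = np.asarray(r, dtype=float)
    return np.abs(r) * r

# Made-up voxel-wise correlations, standing in for corrs_unnorm.
corrs_unnorm = np.array([0.4, -0.2, 0.1, 0.0])

# Average the signed r^2 over voxels.
avg_r2 = signed_r2(corrs_unnorm).mean()
print(avg_r2)
```

Keeping the sign means that voxels with negative correlations pull the average down instead of being counted as good predictions.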

(Below is Figure 1, for reference)

Could you please explain how I can reproduce the results in the paper? My current assumptions are that:

1. the values reported in Figure 1 are for the first wheretheressmoke test response data, not an average over all ten sessions;
2. the values are obtained by averaging across subjects and voxels.

Are these assumptions true? Also, how can I recover the values reported in the paper?

Thank you in advance!

RAntonello commented 6 months ago

Hi @dyhan316, we test using the average over all test sessions to increase SNR in our test data. We use three test stories (wheretheressmoke, onapproachtopluto, and fromboyhoodtofatherhood), averaging all test repeats that are available for each of these (10, 5, and 5, respectively). We mention this in Section 2.2 of the paper, where we say "These test responses were averaged across repetitions." We also trim an additional 40 TRs of test data from the beginning of each story to account for the onset artifact, which we mention in the appendix of the paper. Please let me know if you continue to have trouble reproducing anything and I will be more than happy to provide additional guidance.
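A minimal sketch of the evaluation described above (average the repeats, drop the first 40 TRs, then correlate per voxel) might look like the following. The function names and array shapes are assumptions made for illustration, and the sketch assumes predictions and responses are already aligned in time before trimming; it is not the repository's code.

```python
import numpy as np

TRIM_TRS = 40  # extra TRs dropped from the start of each test story

def average_repeats(repeats):
    """Average a (n_repeats, n_TRs, n_voxels) array over repeats."""
    return np.asarray(repeats, dtype=float).mean(axis=0)

def voxelwise_corr(pred, resp):
    """Pearson r between matching columns of two (n_TRs, n_voxels) arrays.

    Assumes no column is constant (otherwise the z-score divides by zero).
    """
    pz = (pred - pred.mean(axis=0)) / pred.std(axis=0)
    rz = (resp - resp.mean(axis=0)) / resp.std(axis=0)
    return (pz * rz).mean(axis=0)

def score_story(pred, repeats, trim=TRIM_TRS):
    """Correlate predictions with the repeat-averaged response after trimming."""
    mean_resp = average_repeats(repeats)
    return voxelwise_corr(pred[trim:], mean_resp[trim:])
```

For example, `score_story(pred, repeats)` with `pred` of shape `(n_TRs, n_voxels)` and `repeats` of shape `(n_repeats, n_TRs, n_voxels)` returns one r value per voxel.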

dyhan316 commented 6 months ago

@RAntonello Thank you for the response! I have just a few more questions if you don't mind!

1. It appears that two of the three test stories are unavailable at the link where I got the response data (only the wheretheressmoke responses seem to be available).
2. Also, I just want to make sure: was the r^2 value obtained by taking the correlation between the prediction and the mean test response, rather than the mean of the correlations across trials (i.e. correlation with the mean, not mean of the correlations)?

Thank you again!!

RAntonello commented 6 months ago

You should be able to find the pre-averaged test stories in the full_responses folder of the Box. The separate trials are only used for the noise ceiling analysis in Figure 3. You are correct that it was obtained as the correlation between the prediction and mean test response and not the mean correlation across trials.
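The difference between the two quantities can be illustrated with synthetic data. The sketch below uses a single simulated voxel with made-up signal and noise levels (all assumptions, not real responses) and an idealized prediction equal to the underlying signal.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(0)
n_repeats, n_trs = 10, 200

# One simulated voxel: a shared signal plus independent noise per repeat.
signal = rng.standard_normal(n_trs)
repeats = signal + 2.0 * rng.standard_normal((n_repeats, n_trs))
pred = signal  # idealized prediction, for illustration only

# Correlation with the repeat-averaged response (what the reply describes).
corr_with_mean = pearson_r(pred, repeats.mean(axis=0))

# Mean of the per-repeat correlations (the alternative asked about).
mean_of_corrs = np.mean([pearson_r(pred, trial) for trial in repeats])

print(corr_with_mean, mean_of_corrs)
```

Averaging the repeats first reduces the noise in the target, so the correlation with the mean response comes out higher than the mean of the single-trial correlations in this simulation.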

dyhan316 commented 6 months ago

@RAntonello Thank you again for your quick response :)

I see! That clears things up a bit. I have one additional question : which stories exactly were used for training/testing?

I may be missing something, but Section 2.2 of the paper states that "... each subject listened to roughly 95 different stories, ....".

However, when I load the full_responses file of, say, subject S3, the number of stories is 101.

Could you specify which of the 101 stories are used for training? (I first thought that the remaining 98 stories (101 - 3) would be used, but it seems that this is not the case?)

For reference, these are the 101 stories loaded from the full_responses folder!

Thank you!

dict_keys(['itsabox', 'odetostepfather', 'inamoment', 'afearstrippedbare', 'findingmyownrescuer', 'hangtime', 'ifthishaircouldtalk', 'goingthelibertyway', 'golfclubbing', 'thetriangleshirtwaistconnection', 'igrewupinthewestborobaptistchurch', 'tetris', 'becomingindian', 'canplanetearthfeedtenbillionpeoplepart1', 'thetiniestbouquet', 'swimmingwithastronauts', 'lifereimagined', 'forgettingfear', 'stumblinginthedark', 'backsideofthestorm', 'food', 'theclosetthatateeverything', 'escapingfromadirediagnosis', 'notontheusualtour', 'exorcism', 'adventuresinsayingyes', 'thefreedomridersandme', 'cocoonoflove', 'waitingtogo', 'thepostmanalwayscalls', 'googlingstrangersandkentuckybluegrass', 'mayorofthefreaks', 'learninghumanityfromdogs', 'shoppinginchina', 'souls', 'cautioneating', 'comingofageondeathrow', 'breakingupintheageofgoogle', 'gpsformylostidentity', 'marryamanwholoveshismother', 'eyespy', 'treasureisland', 'thesurprisingthingilearnedsailingsoloaroundtheworld', 'theadvancedbeginner', 'goldiethegoldfish', 'life', 'thumbsup', 'seedpotatoesofleningrad', 'theshower', 'adollshouse', 'canplanetearthfeedtenbillionpeoplepart2', 'sloth', 'howtodraw', 'quietfire', 'metsmagic', 'penpal', 'thecurse', 'canadageeseandddp', 'thatthingonmyarm', 'buck', 'thesecrettomarriage', 'wildwomenanddancingqueens', 'againstthewind', 'indianapolis', 'alternateithicatom', 'bluehope', 'kiksuya', 'afatherscover', 'haveyoumethimyet', 'firetestforlove', 'catfishingstrangerstofindmyself', 'christmas1940', 'tildeath', 'lifeanddeathontheoregontrail', 'vixenandtheussr', 'undertheinfluence', 'beneaththemushroomcloud', 'jugglingandjesus', 'superheroesjustforeachother', 'sweetaspie', 'naked', 'singlewomanseekingmanwich', 'avatar', 'whenmothersbullyback', 'myfathershands', 'reachingoutbetweenthebars', 'theinterview', 'stagefright', 'legacy', 'canplanetearthfeedtenbillionpeoplepart3', 'listo', 'gangstersandcookies', 'birthofanation', 'mybackseatviewofagreatromance', 'lawsthatchokecreativity', 'threemonths', 
'whyimustspeakoutaboutclimatechange', 'leavingbaghdad', 'wheretheressmoke', 'onapproachtopluto', 'fromboyhoodtofatherhood'])

RAntonello commented 6 months ago

Yes, we removed a small number of stories for incidental reasons from the training set on a per-subject basis. Here are the lists for each subject. UTS02 and UTS03 have the same lists.

UTS01_train_list = ['itsabox', 'odetostepfather', 'inamoment',  'hangtime', 'ifthishaircouldtalk', 'goingthelibertyway', 'golfclubbing', 'thetriangleshirtwaistconnection', 'igrewupinthewestborobaptistchurch', 'tetris', 'becomingindian', 'canplanetearthfeedtenbillionpeoplepart1', 'thetiniestbouquet', 'swimmingwithastronauts', 'lifereimagined', 'forgettingfear', 'stumblinginthedark', 'backsideofthestorm', 'food', 'theclosetthatateeverything', 'notontheusualtour', 'exorcism', 'adventuresinsayingyes', 'thefreedomridersandme', 'cocoonoflove', 'waitingtogo', 'thepostmanalwayscalls', 'googlingstrangersandkentuckybluegrass', 'mayorofthefreaks', 'learninghumanityfromdogs', 'shoppinginchina', 'souls', 'cautioneating', 'comingofageondeathrow', 'breakingupintheageofgoogle', 'gpsformylostidentity', 'eyespy', 'treasureisland', 'thesurprisingthingilearnedsailingsoloaroundtheworld', 'theadvancedbeginner', 'goldiethegoldfish', 'life', 'thumbsup', 'seedpotatoesofleningrad', 'theshower', 'adollshouse', 'canplanetearthfeedtenbillionpeoplepart2', 'sloth', 'howtodraw', 'quietfire', 'metsmagic', 'penpal', 'thecurse', 'canadageeseandddp', 'thatthingonmyarm', 'buck', 'wildwomenanddancingqueens', 'againstthewind', 'indianapolis', 'alternateithicatom', 'bluehope', 'kiksuya', 'afatherscover', 'haveyoumethimyet', 'firetestforlove', 'catfishingstrangerstofindmyself', 'christmas1940', 'tildeath', 'lifeanddeathontheoregontrail', 'vixenandtheussr', 'undertheinfluence', 'beneaththemushroomcloud', 'jugglingandjesus', 'superheroesjustforeachother', 'sweetaspie', 'naked', 'singlewomanseekingmanwich', 'avatar', 'whenmothersbullyback', 'myfathershands', 'reachingoutbetweenthebars', 'theinterview', 'stagefright', 'legacy', 'canplanetearthfeedtenbillionpeoplepart3', 'listo', 'gangstersandcookies', 'birthofanation', 'mybackseatviewofagreatromance', 'lawsthatchokecreativity', 'threemonths', 'whyimustspeakoutaboutclimatechange', 'leavingbaghdad']

UTS_01_test_list = ['wheretheressmoke', 'onapproachtopluto', 'fromboyhoodtofatherhood']

UTS02_03_train_list = ['itsabox', 'odetostepfather', 'inamoment', 'afearstrippedbare', 'findingmyownrescuer', 'hangtime', 'ifthishaircouldtalk', 'goingthelibertyway', 'golfclubbing', 'thetriangleshirtwaistconnection', 'igrewupinthewestborobaptistchurch', 'tetris', 'becomingindian', 'canplanetearthfeedtenbillionpeoplepart1', 'thetiniestbouquet', 'swimmingwithastronauts', 'lifereimagined', 'forgettingfear', 'stumblinginthedark', 'backsideofthestorm', 'food', 'theclosetthatateeverything', 'escapingfromadirediagnosis', 'notontheusualtour', 'exorcism', 'adventuresinsayingyes', 'thefreedomridersandme', 'cocoonoflove', 'waitingtogo', 'thepostmanalwayscalls', 'googlingstrangersandkentuckybluegrass', 'mayorofthefreaks', 'learninghumanityfromdogs', 'shoppinginchina', 'souls', 'cautioneating', 'comingofageondeathrow', 'breakingupintheageofgoogle', 'gpsformylostidentity', 'marryamanwholoveshismother', 'eyespy', 'treasureisland', 'thesurprisingthingilearnedsailingsoloaroundtheworld', 'theadvancedbeginner', 'goldiethegoldfish', 'life', 'thumbsup', 'seedpotatoesofleningrad', 'theshower', 'adollshouse', 'canplanetearthfeedtenbillionpeoplepart2', 'sloth', 'howtodraw', 'quietfire', 'metsmagic', 'penpal', 'thecurse', 'canadageeseandddp', 'thatthingonmyarm', 'buck', 'thesecrettomarriage', 'wildwomenanddancingqueens', 'againstthewind', 'indianapolis', 'alternateithicatom', 'bluehope', 'kiksuya', 'afatherscover', 'haveyoumethimyet', 'firetestforlove', 'catfishingstrangerstofindmyself', 'christmas1940', 'tildeath', 'lifeanddeathontheoregontrail', 'vixenandtheussr', 'undertheinfluence', 'beneaththemushroomcloud', 'jugglingandjesus', 'superheroesjustforeachother', 'sweetaspie', 'naked', 'singlewomanseekingmanwich', 'avatar', 'whenmothersbullyback', 'myfathershands', 'reachingoutbetweenthebars', 'theinterview', 'stagefright', 'legacy', 'canplanetearthfeedtenbillionpeoplepart3', 'listo', 'gangstersandcookies', 'birthofanation', 'mybackseatviewofagreatromance', 'lawsthatchokecreativity', 
'threemonths', 'whyimustspeakoutaboutclimatechange', 'leavingbaghdad'][:int(sys.argv[3])]
UTS_02_03_test_list = ['wheretheressmoke', 'onapproachtopluto', 'fromboyhoodtofatherhood']

dyhan316 commented 6 months ago

Thank you! Just one more question: in UTS02_03_train_list, there is a list slice [:int(sys.argv[3])] at the end. What was the sys.argv[3] value for it?

RAntonello commented 6 months ago

Ah sorry, I copied it from the "number of stories" analysis; just use the full list.
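As a small sketch of what "use the full list" means in practice: a slice bound of `None` keeps every element, so the same code can serve both the full run and a "number of stories" style analysis. The helper names and the short toy lists below are hypothetical; only the story names themselves come from this thread.

```python
def select_training_stories(train_list, n_stories=None):
    """Return the first n_stories entries; None keeps the whole list."""
    return list(train_list[:n_stories])

def unused_stories(all_stories, train_list, test_list):
    """Stories present in the responses but in neither split."""
    used = set(train_list) | set(test_list)
    return [s for s in all_stories if s not in used]

# Toy, truncated lists (not the real per-subject lists).
all_stories = ['itsabox', 'afearstrippedbare', 'hangtime', 'wheretheressmoke']
train_list = ['itsabox', 'hangtime']
test_list = ['wheretheressmoke']

full_train = select_training_stories(train_list)
left_out = unused_stories(all_stories, train_list, test_list)
print(full_train, left_out)
```

Running `unused_stories` on the keys of a subject's full_responses file together with that subject's lists would show which stories were left out of both splits.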

dyhan316 commented 6 months ago

Thank you :)

dyhan316 commented 5 months ago

It seems that the TR and text grid files are missing for three stories: ['canplanetearthfeedtenbillionpeoplepart3', 'canplanetearthfeedtenbillionpeoplepart2', 'canplanetearthfeedtenbillionpeoplepart1']

The TR and text grid files are needed for these stories, as they are used during training (as you stated previously).

(I got the TR and text grid files from: https://utexas.app.box.com/v/EncodingModelScalingLaws/folder/230420528915)

RAntonello commented 5 months ago

Please see the added wordseqs.jbl file which contains, among other stories, all the stories from the training and test sets preloaded as DataSequences.

dyhan316 commented 4 months ago

Thank you :)

dyhan316 commented 4 months ago

@RAntonello

Sorry to bother you again. I have a few more questions about reproduction.

1. It appears that in order to compute cc_norm and cc_max for a given story, the response data for all of that story's trials is needed (in other words, the average of the trials for a given test story is not enough). However, in the Box, only the "wheretheressmoke" test story has response data for all trials.
2. Are the voxel-wise correlation coefficients plotted below cc_norm values for the "wheretheressmoke" test story only?
3. When calculating the Encoding Performance (r^2) (Fig. 1c, f), did you (a) average the mean r^2 over the three test stories, or (b) concatenate the predicted responses and the average responses of the three stories and then compute r^2 over the concatenation?

Thank you in advance for your response!
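To make the last question concrete, the two aggregation options (averaging per-story scores versus scoring the concatenated stories) can be sketched as below. Which one the paper used is not stated in this thread; the function names and array shapes are assumptions for illustration only.

```python
import numpy as np

def voxelwise_corr(pred, resp):
    """Pearson r between matching columns of two (n_TRs, n_voxels) arrays."""
    pz = (pred - pred.mean(axis=0)) / pred.std(axis=0)
    rz = (resp - resp.mean(axis=0)) / resp.std(axis=0)
    return (pz * rz).mean(axis=0)

def signed_r2(r):
    """Square each correlation while keeping its sign: |r| * r."""
    return np.abs(r) * r

def per_story_average(preds, resps):
    """Score each story separately, then average the scores over stories."""
    scores = [signed_r2(voxelwise_corr(p, y)).mean()
              for p, y in zip(preds, resps)]
    return float(np.mean(scores))

def concatenated(preds, resps):
    """Concatenate all stories along time, then score once."""
    r = voxelwise_corr(np.concatenate(preds, axis=0),
                       np.concatenate(resps, axis=0))
    return float(signed_r2(r).mean())
```

Here `preds` and `resps` are lists with one `(n_TRs, n_voxels)` array per test story. The two options generally give different numbers, because concatenation weights stories by their length and also rewards getting between-story differences in mean and variance right.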

HuthLab / encoding-model-scaling-laws

Cannot reproduce r^2 result on paper #1