webdataset configuration

XuZhang2 commented 8 months ago

Thanks for your wonderful project. I’ve noticed substantial differences in the data-processing code between MindEye1 and MindEye2, particularly in the WebDataset pipeline, and I’d like to understand the key distinctions between the two versions. For example, the following snippet comes from mindeyev1:

.rename(images="jpg;png", voxels=voxels_key, trial="trial.npy", coco="coco73k.npy", reps="num_uniques.npy")
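In a WebDataset pipeline, each sample arrives as a dict keyed by file extension (for example "png" or "trial.npy"), and `.rename()` maps those keys to friendlier names. A value such as "jpg;png" lists alternatives, and the first one present in the sample is used. The pure-Python sketch below mimics that behavior for illustration only; it is not the webdataset library's code. The `rename_sample` helper and the sample dict are hypothetical, and `voxels=voxels_key` is left out because `voxels_key` is defined elsewhere in the MindEye1 code.

```python
# Illustrative mimic of WebDataset-style key renaming; NOT the webdataset
# library's implementation. rename_sample and the sample are hypothetical.

def rename_sample(sample, keep=True, **mapping):
    """Return a copy of `sample` with keys renamed according to `mapping`.

    Each value in `mapping` is a ";"-separated list of candidate source keys;
    the first candidate present in `sample` is used.
    """
    result = {}
    consumed = set()
    for new_key, candidates in mapping.items():
        options = candidates.split(";")
        for old_key in options:
            if old_key in sample:
                result[new_key] = sample[old_key]
                break
        else:
            raise KeyError(f"none of {candidates!r} found in sample")
        consumed.update(options)
    if keep:
        # This sketch carries over keys that were not renamed (e.g. "__key__").
        for k, v in sample.items():
            if k not in consumed:
                result[k] = v
    return result


# A hypothetical decoded sample from a MindEye1-style tar shard.
sample = {
    "__key__": "sample000001",
    "png": "<image>",
    "trial.npy": 17,
    "coco73k.npy": 2950,
    "num_uniques.npy": 3,
}
renamed = rename_sample(sample, images="jpg;png", trial="trial.npy",
                        coco="coco73k.npy", reps="num_uniques.npy")
print(sorted(renamed))  # ['__key__', 'coco', 'images', 'reps', 'trial']
```

Because "jpg" is absent here, `images` falls back to the "png" entry.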

while this snippet comes from mindeyev2:

.rename(behav="behav.npy", past_behav="past_behav.npy", future_behav="future_behav.npy", olds_behav="olds_behav.npy")\
                    .to_tuple(*["behav", "past_behav", "future_behav", "olds_behav"])
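The `.to_tuple(...)` step then turns each renamed dict into a plain tuple whose order follows the listed names, so the MindEye2 loader yields four arrays per sample. A minimal pure-Python mimic for illustration; `to_tuple_sample` and the placeholder values are hypothetical, not webdataset's own code.

```python
# Illustrative mimic of WebDataset's .to_tuple() step; not the library's code.
# to_tuple_sample and the placeholder sample contents are hypothetical.

def to_tuple_sample(sample, *keys):
    """Return the values stored under `keys`, in the order given."""
    return tuple(sample[k] for k in keys)


# A hypothetical MindEye2-style sample after the .rename() step.
sample = {
    "__key__": "sample000001",
    "behav": "<behav array>",
    "past_behav": "<past_behav array>",
    "future_behav": "<future_behav array>",
    "olds_behav": "<olds_behav array>",
}
fields = ["behav", "past_behav", "future_behav", "olds_behav"]
behav, past_behav, future_behav, olds_behav = to_tuple_sample(sample, *fields)
print(behav)  # <behav array>
```

Keys not named in the call (such as `__key__`) are simply dropped from the tuple.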

Can you tell me why there's such a difference?

PaulScotti commented 6 months ago

Yes, we remade the webdatasets for MindEye2 to allow greater flexibility in the loading process.

I'll update the README soon with more detail in preparation for the camera-ready paper releasing next week, but briefly, the contents of behav.npy follow this structure:

behavior = {
    "cocoidx": int(behav.iloc[jj]['73KID'])-1, #0 (73KID is 1-based; stored 0-based)
    "subject": subject, #1
    "session": int(behav.iloc[jj]['SESSION']), #2
    "run": int(behav.iloc[jj]['RUN']), #3
    "trial": int(behav.iloc[jj]['TRIAL']), #4
    "global_trial": int(i * (tar + 1)), #5
    "time": int(behav.iloc[jj]['TIME']), #6
    "isold": int(behav.iloc[jj]['ISOLD']), #7
    "iscorrect": iscorrect, #8
    "rt": rt, #9 (0 = no RT)
    "changemind": changemind, #10
    "isoldcurrent": isoldcurrent, #11
    "iscorrectcurrent": iscorrectcurrent, #12
    "total1": total1, #13
    "total2": total2, #14
    "button": button, #15
    "shared1000": is_shared1000, #16
}
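The `#0` to `#16` comments give each field's position, so a loaded behav.npy entry can be read by index rather than by name. The sketch below assumes, for illustration only, that each behav entry is a flat sequence of 17 numbers in the order listed above; if the stored array has extra leading dimensions, the same positions would apply to its last axis. `BEHAV_FIELDS`, `BEHAV_INDEX`, `decode_behav` and the example row are hypothetical, and the row values are placeholders rather than real data.

```python
# Field order copied from the behav.npy structure above (#0 ... #16).
# ASSUMPTION: each behav entry is a flat sequence of 17 numbers in this order.
BEHAV_FIELDS = [
    "cocoidx", "subject", "session", "run", "trial", "global_trial",
    "time", "isold", "iscorrect", "rt", "changemind", "isoldcurrent",
    "iscorrectcurrent", "total1", "total2", "button", "shared1000",
]
# Position of each field, e.g. BEHAV_INDEX["global_trial"] == 5.
BEHAV_INDEX = {name: i for i, name in enumerate(BEHAV_FIELDS)}


def decode_behav(row):
    """Map a 17-value behav row back to named fields (hypothetical helper)."""
    if len(row) != len(BEHAV_FIELDS):
        raise ValueError(f"expected {len(BEHAV_FIELDS)} values, got {len(row)}")
    return dict(zip(BEHAV_FIELDS, row))


# Hypothetical example row; placeholder values only.
row = [2950, 1, 1, 1, 1, 0, 1000, 0, 1, 850, 0, 0, 1, 1, 0, 1, 0]
info = decode_behav(row)
print(info["cocoidx"], info["global_trial"])  # 2950 0
print(row[BEHAV_INDEX["rt"]])                 # 850
```

Keeping the name-to-index table in one place avoids scattering magic numbers such as `5` or `9` through the training code.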

PaulScotti commented 6 months ago

Updated the README; let me know if there are still problems.

MedARC-AI / MindEyeV2

webdataset configuration #23