improvements to dataloader.py

wkvong commented 3 years ago

Some ways to improve the dataloader code:

In the __init__ function, I would add code that builds a CSV file of the properties you need: walk the image directories, record (image_filename, shape_category, texture_category, shape_instance, texture_instance) for each image, and save the result to file. Then you can read this CSV directly in the initialisation step, and __getitem__ can index the values for each item. I would also filter out any trials where the shape and texture category labels are the same before saving the CSV file, since those aren't cue-conflict images.
Also, rather than using os.listdir and filtering out the .DS_Store files, I often prefer the glob library, e.g. glob.glob('stimuli-shape/style-transfer/*/*.png'), which makes it easier to filter by file type and returns each match with its directory path included, not just the filename.
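
A minimal sketch of the two suggestions above (glob plus a one-off CSV index). The helper names parse_stimulus and build_index are hypothetical, and the filename scheme <shape_instance>-<texture_instance>.png (for example cat1-dog2.png) is an assumption about how the stimuli are named, so the parsing would need adjusting if the real files differ.

```python
import csv
import glob
import os
import re

# Column names taken from the suggestion above.
FIELDS = ["image_filename", "shape_category", "texture_category",
          "shape_instance", "texture_instance"]


def parse_stimulus(path):
    """Split a path into its properties.

    ASSUMPTION: files are named <shape_instance>-<texture_instance>.png,
    where each instance is a category name followed by digits.
    Returns None for files that do not follow that scheme.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    if "-" not in stem:
        return None
    shape_instance, texture_instance = stem.split("-", 1)
    return {
        "image_filename": path,
        # Strip the trailing digits to get the category from the instance.
        "shape_category": re.sub(r"\d+$", "", shape_instance),
        "texture_category": re.sub(r"\d+$", "", texture_instance),
        "shape_instance": shape_instance,
        "texture_instance": texture_instance,
    }


def build_index(pattern, csv_path):
    """Write a CSV index of cue-conflict images matching a glob pattern."""
    rows = []
    # The *.png pattern skips .DS_Store and other non-image files.
    for path in sorted(glob.glob(pattern)):
        row = parse_stimulus(path)
        if row is None:
            continue
        # Same shape and texture category means no cue conflict: drop it.
        if row["shape_category"] == row["texture_category"]:
            continue
        rows.append(row)
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Example call (the output filename is illustrative):
# build_index("stimuli-shape/style-transfer/*/*.png", "stimuli_index.csv")
```

Building the index once means the directory walk and filename parsing happen a single time, and later runs only need to read the CSV.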

wkvong commented 3 years ago

One more thing: I'd also return the image first (rather than last) from __getitem__, since that is the part that gets passed into the model and it's the thing you actually want the dataloader for.
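
A rough sketch of what an image-first __getitem__ could look like when the items come from a CSV index with image_filename, shape_category and texture_category columns. The class name CueConflictDataset and the load_image parameter are hypothetical. To keep the example dependency-free it is a plain Python class that reads raw bytes by default; in the actual project it would presumably subclass torch.utils.data.Dataset and load the image with PIL plus a transform.

```python
import csv
from pathlib import Path


class CueConflictDataset:
    """Dataset-style class backed by a CSV index (hypothetical sketch)."""

    def __init__(self, csv_path, load_image=None):
        # Read the whole index once at initialisation.
        with open(csv_path, newline="") as f:
            self.rows = list(csv.DictReader(f))
        # ASSUMPTION: default loader returns raw bytes; swap in an
        # image-decoding function (e.g. PIL + transforms) in real use.
        self.load_image = load_image or (lambda p: Path(p).read_bytes())

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        image = self.load_image(row["image_filename"])
        # Image first, labels after, so callers can unpack it as the
        # model input without remembering its position.
        return image, row["shape_category"], row["texture_category"]
```

With this ordering a training loop can unpack each item as image, shape, texture and pass the first element straight to the model.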

alexatartaglini commented 3 years ago

I incorporated all of these suggestions! I'm currently saving the relevant properties for the images in a JSON file (which is on the repo now)

alexatartaglini / developmental-shape-bias

improvements to dataloader.py #5