davidbau / dissect

Code for the Proceedings of the National Academy of Sciences 2020 article, "Understanding the Role of Individual Units in a Deep Neural Network"

Why should this `batch` be a list? #10

Closed: a-maiti closed this issue 3 years ago

a-maiti commented 3 years ago

Hi, thank you for the awesome repo, it's very easy to use! I want to understand why the batch is expected to be a list in the following line. When I inspect it, this batch is a list of 2 items, the first of which is the image tensor, but I couldn't work out what the second item is. Also, this dataloader is used in 4 places: everywhere else it returns just the images, but here it returns the images plus this unknown second item. I couldn't pinpoint how this happens or what exactly is different about this dataloader. Could you please point me to the relevant lines of code? It would help me get unblocked in a research project. https://github.com/davidbau/dissect/blob/9421eaa8672fd051088de6c0225a385064070935/netdissect/tally.py#L115

Thanks!

davidbau commented 3 years ago

What the PyTorch DataLoader returns depends on the underlying dataset you give it. If each item from the dataset is a list of N things, then each batch from the data loader will be a list of N batched things. A very typical case (e.g., torchvision's ImageFolder) is for the dataset to return 2 items per sample, where the first is the image and the second is an integer classification label, but it can be other things, of course.
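To make the collation behavior concrete, here is a minimal sketch using two hypothetical toy datasets (`ToyPairDataset` and `ToyImageDataset` are illustrative names, not part of this repo). The first returns `(image, label)` pairs like ImageFolder does; the second returns only the image. With PyTorch's default collation, the first loader yields a 2-item list per batch, while the second yields a single stacked tensor.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyPairDataset(Dataset):
    """Hypothetical dataset whose items are (image, label) pairs, like ImageFolder."""

    def __init__(self, n=8):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        image = torch.full((3, 4, 4), float(i))  # fake 3x4x4 "image"
        label = i % 2                            # fake integer class label
        return image, label


class ToyImageDataset(ToyPairDataset):
    """Same fake data, but each item is only the image tensor."""

    def __getitem__(self, i):
        image, _ = super().__getitem__(i)
        return image


pair_batch = next(iter(DataLoader(ToyPairDataset(), batch_size=4)))
image_batch = next(iter(DataLoader(ToyImageDataset(), batch_size=4)))

print(type(pair_batch), len(pair_batch))  # <class 'list'> 2
print(pair_batch[0].shape)                # torch.Size([4, 3, 4, 4])
print(pair_batch[1])                      # tensor([0, 1, 0, 1])
print(image_batch.shape)                  # torch.Size([4, 3, 4, 4])
```

Note that the per-item tuple becomes a list at the batch level: the default collate function transposes the samples and stacks each field separately.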

If you're using the ParallelImageFolders dataset object in this package, it can be configured in its constructor with "classification=True" to return classification labels; if this is left False, each item will just be the image. [Other details: the same dataset class can also be asked via "identification=True" to return the ordinal image number (i.e., i for the ith image in the dataset), or it can be configured with multiple parallel folders of image data so that each data item is a list with one image from each folder.]
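The effect of such flags on the item structure can be sketched with a hypothetical stand-in class. `ToyFolderDataset` below is not the real ParallelImageFolders: it only mimics the described `classification` and `identification` options, and the field order (image, then label, then index) is an assumption of this sketch.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyFolderDataset(Dataset):
    """Hypothetical stand-in mimicking the described flags; not ParallelImageFolders."""

    def __init__(self, n=6, classification=False, identification=False):
        self.n = n
        self.classification = classification
        self.identification = identification

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        image = torch.zeros(3, 8, 8)  # placeholder image tensor
        item = [image]
        if self.classification:
            item.append(i % 3)        # placeholder class label
        if self.identification:
            item.append(i)            # ordinal index of the image
        # With no extra fields, return the bare image instead of a 1-item list.
        return item[0] if len(item) == 1 else tuple(item)


plain = ToyFolderDataset()
labeled = ToyFolderDataset(classification=True)
both = ToyFolderDataset(classification=True, identification=True)

print(type(plain[0]))   # <class 'torch.Tensor'>
print(len(labeled[0]))  # 2
print(len(both[0]))     # 3
```

Wrapping any of these in a DataLoader gives batches with the same number of fields as each item, so only the flag settings determine whether a loop receives a bare tensor or a list.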

Is it possible that the 4 places you're looking at use different dataset instances? You can see what a dataset returns by inspecting, e.g., dataset[0], or by looking at the fields inside the dataset to see how it was configured.
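A quick way to do that inspection is sketched below. The `describe` helper is hypothetical (written for this example), and a `TensorDataset` of fake images and labels stands in for your own dataset. It prints the nested structure of one item and of one batch, and `vars()` lists the attributes stored on the dataset instance.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def describe(x):
    """Hypothetical helper: summarize the nested structure of an item or batch."""
    if isinstance(x, torch.Tensor):
        return f"Tensor{tuple(x.shape)}"
    if isinstance(x, (list, tuple)):
        inner = ", ".join(describe(v) for v in x)
        return f"{type(x).__name__}[{inner}]"
    return type(x).__name__


# Stand-in dataset: TensorDataset yields (image, label) tuples.
images = torch.zeros(10, 3, 8, 8)
labels = torch.arange(10)
dataset = TensorDataset(images, labels)

print(describe(dataset[0]))  # tuple[Tensor(3, 8, 8), Tensor()]
batch = next(iter(DataLoader(dataset, batch_size=4)))
print(describe(batch))       # list[Tensor(4, 3, 8, 8), Tensor(4,)]
print(list(vars(dataset)))   # attribute names stored on this dataset instance
```

Running the same two `describe` calls on each of the datasets behind your 4 loaders should show which one yields the extra item.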

(In reply to Abhishek Maiti's question of Wed, May 19, 2021, 2:14 AM.)

a-maiti commented 3 years ago

I am using a custom dataset. Thank you! My data loader was returning just the image, not the label. It is fixed now :)