Lyken17 / Efficient-PyTorch

My best practice of training large dataset using PyTorch.

add attributes like imgs and classes in ImageFolder() #15

Closed: flyingmrwang closed this issue 4 years ago

flyingmrwang commented 4 years ago

This looks really helpful for improving I/O for PyTorch datasets. However, I noticed that the object it returns is not in quite the same format as ImageFolder() when I simply swap it into a training script. Do you plan to make attributes like the following available on ImageFolderLMDB?

    classes (list): List of the class names.
    class_to_idx (dict): Dict with items (class_name, class_index).
    imgs (list): List of (image path, class_index) tuples

https://pytorch.org/docs/stable/_modules/torchvision/datasets/folder.html#ImageFolder
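For readers unfamiliar with these three attributes, the sketch below shows how they relate to one another. It is a minimal pure-Python illustration, not code from torchvision or from this repository: the helper name `build_folder_attributes` and the example paths are hypothetical, and the input is assumed to be a list of (image path, class name) pairs. Unlike ImageFolder, it keeps the input order rather than re-sorting the samples.

```python
from typing import Dict, List, Tuple


def build_folder_attributes(
    labeled_paths: List[Tuple[str, str]],
) -> Tuple[List[str], Dict[str, int], List[Tuple[str, int]]]:
    """Derive ImageFolder-style attributes from (path, class_name) pairs.

    Hypothetical helper for illustration only.
    """
    # Class names are sorted so that indices are assigned deterministically.
    classes = sorted({name for _, name in labeled_paths})
    # Map each class name to its integer index.
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    # Pair every image path with the index of its class (input order kept).
    imgs = [(path, class_to_idx[name]) for path, name in labeled_paths]
    return classes, class_to_idx, imgs


# Made-up example paths, assumed to follow a <root>/<class>/<file> layout.
example = [
    ("train/dog/001.jpg", "dog"),
    ("train/cat/001.jpg", "cat"),
    ("train/dog/002.jpg", "dog"),
]
classes, class_to_idx, imgs = build_folder_attributes(example)
```

An LMDB-backed dataset could, in principle, store lists like these alongside the encoded images when the database is created, so that they are available without iterating over every record.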

Lyken17 commented 4 years ago

You mean add documentation? It is a good suggestion, but I currently do not have the bandwidth to do it since I am traveling. It would be great if you could submit a PR.

flyingmrwang commented 4 years ago

No, I am just replacing my ImageFolder with ImageFolderLMDB. My upcoming DataLoader requires a sampler, which analyzes the distribution of classes. The original ImageFolder exposes the whole list of samples in addition to the iterator, while yours only provides the iterator, so I am wondering whether it would be possible to add these attributes. I have already found a lazy way to fit it into my pipeline, so this is just a suggestion for improvement if you have time in the future.
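To make the sampler use case concrete, here is a rough sketch of the kind of class-distribution analysis that needs the full label list up front. It is not the workaround mentioned above (which is not shown in the thread), and the function names are hypothetical. It assumes the integer labels have already been collected into a plain list, for example by reading the second element of each `imgs` entry or by one pass over the LMDB dataset.

```python
from collections import Counter
from typing import Dict, List, Sequence


def class_counts(targets: Sequence[int]) -> Dict[int, int]:
    """Count how many samples belong to each class index."""
    return dict(Counter(targets))


def inverse_frequency_weights(targets: Sequence[int]) -> List[float]:
    """Give each sample a weight of 1 / (size of its class).

    Rare classes receive larger per-sample weights, so a weighted
    sampler draws them about as often as common classes.
    """
    counts = Counter(targets)
    return [1.0 / counts[t] for t in targets]


# Made-up labels: three samples of class 1 and one sample of class 0.
targets = [1, 0, 1, 1]
counts = class_counts(targets)
weights = inverse_frequency_weights(targets)
```

Weights like these could then be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights))`. Without an `imgs`-style list on the dataset, gathering `targets` means decoding every record once, which is the cost the requested attributes would avoid.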