[Feature] Can we load ImageNet data using LMDB?

Westlake-AI / openmixup

CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark

https://openmixup.readthedocs.io

Apache License 2.0


[Feature] Can we load ImageNet data using LMDB? #26

Closed wang-tf closed 2 years ago

wang-tf commented 2 years ago

Describe the feature

The current ImageNet data loader only supports the pillow and cv2 backends for loading images. For some reason, I need to load the dataset from LMDB, which is supported by mmcls. https://github.com/Westlake-AI/openmixup/blob/8966870a05b85ea940a02c4646693ec101ab0575/openmixup/datasets/data_sources/image_list.py#L49-L51

Lupin1998 commented 2 years ago

Hi @wang-tf, thanks for your suggestion. Since I haven't used LMDB to load image datasets, I'd like to have some existing implementations for reference. However, I can't find LMDB dataset support in MMClassification. Could you please point me to the source file and its branch in the MMLab project? You can contact me directly via e-mail or WeChat (Lupin_1998). I look forward to discussing this with you.

wang-tf commented 2 years ago

In mmcls, a dataset can load its data through file_client_args, which is supported by mmcv:

https://github.com/open-mmlab/mmcv/blob/e417035f5d473b9f85d15ba01267d48d7f30e71e/mmcv/fileio/file_client.py#L790.

So I think LMDB can be used in mmcls.
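
To make the mechanism concrete, here is a minimal sketch of how a file_client_args dict can select a storage backend by name and forward the remaining keys to it. This is not mmcv's code: DictBackend, DiskBackend, BACKENDS and build_file_client are hypothetical names, and the in-memory dict only stands in for a real LMDB store.

```python
from typing import Dict


class DiskBackend:
    """Reads raw bytes from the local file system."""

    def get(self, filepath: str) -> bytes:
        with open(filepath, 'rb') as f:
            return f.read()


class DictBackend:
    """In-memory stand-in for a key-value store such as LMDB (hypothetical)."""

    def __init__(self, db: Dict[str, bytes]):
        self._db = db

    def get(self, filepath: str) -> bytes:
        # The file path is used as the lookup key instead of being opened.
        try:
            return self._db[filepath]
        except KeyError:
            raise FileNotFoundError(filepath) from None


# Hypothetical registry mapping a backend name to its class.
BACKENDS = {'disk': DiskBackend, 'dict': DictBackend}


def build_file_client(file_client_args: dict):
    """Pick a backend by name and pass the remaining keys to its constructor."""
    args = dict(file_client_args)  # copy so the caller's config is not mutated
    backend = args.pop('backend')
    if backend not in BACKENDS:
        raise ValueError(f'unknown backend: {backend!r}')
    return BACKENDS[backend](**args)


# The data loader only ever calls client.get(path) and then decodes the bytes,
# so swapping the storage backend needs no change in the loading code.
client = build_file_client(
    dict(backend='dict', db={'train/cat.jpg': b'\xff\xd8fake-jpeg-bytes'}))
raw = client.get('train/cat.jpg')
```

The point of the design is that the dataset code stays the same for every backend; only the config dict changes.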

Lupin1998 commented 2 years ago

Thanks very much for your hint. According to the implementation in MMClassification, I will support the same file_client functions as the CustomDataset in the next commit.

Lupin1998 commented 2 years ago

Hi @wang-tf. In the latest commit, we have updated ImageList in image_list.py, which loads images from list files, to follow MMClassification. An LMDB dataset can be used by modifying file_client_args as follows.

# dataset settings
data_source_cfg = dict(
    type='ImageNet',
    file_client_args=dict(backend='lmdb'),
)
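
One caveat, as far as I can tell: mmcv's LmdbBackend takes a required db_path argument, and the extra keys in file_client_args are forwarded to the backend. A fuller config would therefore probably look like the sketch below. The path is a placeholder, and whether openmixup passes db_path through unchanged is an assumption that has not been tested here.

```python
# Placeholder path (assumption): point this at your own LMDB directory.
LMDB_PATH = 'data/imagenet/train.lmdb'

# dataset settings
data_source_cfg = dict(
    type='ImageNet',
    file_client_args=dict(
        backend='lmdb',
        # Assumption: forwarded as a keyword argument to mmcv's LmdbBackend.
        db_path=LMDB_PATH,
    ),
)
```

Note that the image paths in the list file then act as keys into the LMDB database rather than as paths on disk.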

I haven't tested it with LMDB datasets, but it should work the same way as in MMClassification. Please contact me if any bugs occur. I will close this issue if there are no further questions.