CreamyLong / stable-diffusion

Speechless at the original stable-diffusion
https://github.com/CompVis/stable-diffusion/tree/main

Is there a usable ImageNet dataset? The code also doesn't look quite right to me #12

Closed gg22mm closed 4 months ago

gg22mm commented 4 months ago

The ImageNet dataset is too large. Is there any test data? The code also doesn't look quite right to me. This is the command I ran:

    python main.py --base configs/autoencoder/autoencoder_kl_8x8x64.yaml --train True


The code says: self.synsets = [p.split("/")[0] for p in self.relpaths], but the image files I see are named like this: ILSVRC2012_val_00000001.JPEG
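To make the mismatch concrete, here is a minimal sketch of what that list comprehension returns for the two path layouts. The helper name `synsets_from_relpaths` is made up for illustration; only the `split("/")[0]` expression comes from the code quoted above.

```python
def synsets_from_relpaths(relpaths):
    # Same expression as in the quoted code: take the first path component.
    return [p.split("/")[0] for p in relpaths]

# Layout with one folder per synset: the first component is the synset ID.
nested = ["n01440764/ILSVRC2012_val_00000293.JPEG"]
# Flat layout with no folder: split("/") returns the whole filename.
flat = ["ILSVRC2012_val_00000001.JPEG"]

nested_synsets = synsets_from_relpaths(nested)
flat_synsets = synsets_from_relpaths(flat)

print(nested_synsets)  # ['n01440764']
print(flat_synsets)    # ['ILSVRC2012_val_00000001.JPEG']
```

So with flat filenames the "synset" ends up being the filename itself, which suggests the loader expects the images to be sorted into per-synset folders first.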

Is there a small, simple dataset I can test with?

What do these label fields mean? Can they be hard-coded?

    labels = {
        "relpath": np.array(self.relpaths),  # image to number
        "synsets": np.array([1]),  # ?
        "class_label": np.array([2]),  # ?
        "human_label": np.array([3]),  # ?
    }

gg22mm commented 4 months ago

I found that the format looks like this:

    {
        'relpath': array(['n01440764/ILSVRC2012_val_00000293.JPEG',
           'n01440764/ILSVRC2012_val_00002138.JPEG',
           'n01440764/ILSVRC2012_val_00003014.JPEG', ...,
           'n15075141/ILSVRC2012_val_00046353.JPEG',
           'n15075141/ILSVRC2012_val_00047144.JPEG',
           'n15075141/ILSVRC2012_val_00049174.JPEG'], dtype='<U38'),

        'synsets': array(['n01440764', 'n01440764', 'n01440764', ..., 'n15075141',
           'n15075141', 'n15075141'], dtype='<U9'),

        'class_label': array([  0,   0,   0, ..., 999, 999, 999]),

        'human_label': array(['tench, Tinca tinca', 'tench, Tinca tinca', 'tench, Tinca tinca',
           ..., 'toilet tissue, toilet paper, bathroom tissue',
           'toilet tissue, toilet paper, bathroom tissue',
           'toilet tissue, toilet paper, bathroom tissue'], dtype='<U121')
    }
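Going by the dump above, the four fields appear to be parallel per-image arrays, so hard-coding single values would probably not line up with the images. Below is a rough sketch of how such a dictionary could be derived from the relative paths. It is not the repository's loader: `build_labels` and `synset_to_human` are hypothetical names, the class index is assumed to be the position of the synset in sorted order, and plain lists are used instead of `np.array` to keep the example dependency-free.

```python
def build_labels(relpaths, synset_to_human):
    # Synset ID is the folder name, i.e. the first path component.
    synsets = [p.split("/")[0] for p in relpaths]
    # Assumption: class index = position of the synset in sorted order.
    class_index = {s: i for i, s in enumerate(sorted(set(synsets)))}
    return {
        "relpath": list(relpaths),
        "synsets": synsets,
        "class_label": [class_index[s] for s in synsets],
        "human_label": [synset_to_human[s] for s in synsets],
    }

# Two entries copied from the dump above; a real mapping would cover every synset.
synset_to_human = {
    "n01440764": "tench, Tinca tinca",
    "n15075141": "toilet tissue, toilet paper, bathroom tissue",
}
relpaths = [
    "n01440764/ILSVRC2012_val_00000293.JPEG",
    "n15075141/ILSVRC2012_val_00046353.JPEG",
]
labels = build_labels(relpaths, synset_to_human)
print(labels["class_label"])  # [0, 1]
```

With only two synsets present the indices are 0 and 1; the 0 to 999 range in the dump comes from having all 1000 classes. Each list can be wrapped in `np.array(...)` to match the format shown above.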

Also, here is the dataset I have already prepared: https://www.kaggle.com/datasets/weililong/sd-imagenet-val