Train quality model using MS1M dataset

xellDart commented 2 years ago

Hi, I am trying to train the quality model using:

MS1M-ArcFace (85K ids/5.8M images) [5,7] from https://github.com/deepinsight/insightface/tree/master/recognition/_datasets_

However, all of its training data is packed in a .rec file. I can't tell whether this dataset can be used with the quality model. I searched for a way to decode the file, but could not get it to work.

def get_dataloader(
    root_dir: str,
    local_rank: int,
    batch_size: int,
    dali = False) -> Iterable:
    if dali and root_dir != "synthetic":
        rec = os.path.join(root_dir, 'train.rec')
        idx = os.path.join(root_dir, 'train.idx')
        return dali_data_iter(
            batch_size=batch_size, rec_file=rec,
            idx_file=idx, num_threads=2, local_rank=local_rank)
    else:
        if root_dir == "synthetic":
            train_set = SyntheticDataset()
        else:
            train_set = MXFaceDataset(root_dir=root_dir, local_rank=local_rank)
        train_sampler = torch.utils.data.distributed.DistributedSampler(train_set, shuffle=True)
        train_loader = DataLoaderX(
            local_rank=local_rank,
            dataset=train_set,
            batch_size=batch_size,
            sampler=train_sampler,
            num_workers=2,
            pin_memory=True,
            drop_last=True,
        )
        return train_loader

I already have train.rec and train.idx.

wjxzju commented 2 years ago

You can decode train.rec and train.idx directly into JPEGs and a text file of labels. Here is a code snippet you can refer to:

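```
import numbers

import mxnet as mx

imgrec = mx.recordio.MXIndexedRecordIO(args.idx_path, args.bin_path, 'r')
# Record 0 is a header; its label[0] is used as the upper bound of the image indices.
s = imgrec.read_idx(0)
header, _ = mx.recordio.unpack(s)
print(header.label)
imgidx = list(range(1, int(header.label[0])))
for i in imgidx:
    img_info = imgrec.read_idx(i)
    # img is the raw encoded image payload of this record.
    header, img = mx.recordio.unpack(img_info)
    label = header.label
    # The label may be stored as an array; in that case take its first entry.
    if not isinstance(label, numbers.Number):
        label = label[0]
    label_int = int(label)
    # you can save img data and label_int here
```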
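The "save img data and label_int" step could look roughly like the sketch below. It is only an illustration: `export_records`, `iter_rec`, the per-identity folder layout, and the `<path> <label>` list format are assumptions made here, not something defined by TFace or insightface, so the list format in particular may need adapting to whatever `img_list` expects. The sketch also assumes each record payload is already an encoded JPEG, which should be checked against the dataset.

```python
import os
from typing import Iterable, Iterator, Tuple


def export_records(records: Iterable[Tuple[int, bytes]], out_dir: str,
                   list_name: str = "img_list.txt") -> int:
    """Write (label, encoded_image_bytes) pairs to disk and index them in a list file."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    with open(os.path.join(out_dir, list_name), "w") as list_file:
        for count, (label, img_bytes) in enumerate(records, start=1):
            # Hypothetical layout: one folder per identity label.
            id_dir = os.path.join(out_dir, str(label))
            os.makedirs(id_dir, exist_ok=True)
            img_path = os.path.join(id_dir, f"{count}.jpg")
            with open(img_path, "wb") as img_file:
                img_file.write(img_bytes)
            # Hypothetical list format: "<path> <label>" per line.
            list_file.write(f"{img_path} {label}\n")
    return count


def iter_rec(idx_path: str, rec_path: str) -> Iterator[Tuple[int, bytes]]:
    """Yield (label, payload) pairs from an indexed RecordIO file."""
    import numbers

    import mxnet as mx  # imported lazily so export_records works without mxnet

    imgrec = mx.recordio.MXIndexedRecordIO(idx_path, rec_path, "r")
    header, _ = mx.recordio.unpack(imgrec.read_idx(0))
    for i in range(1, int(header.label[0])):
        header, img_bytes = mx.recordio.unpack(imgrec.read_idx(i))
        label = header.label
        if not isinstance(label, numbers.Number):
            label = label[0]
        yield int(label), img_bytes
```

With both pieces in place, a call such as `export_records(iter_rec("train.idx", "train.rec"), "ms1m_jpg")` would write the images and the list file; the output folder name is again just a placeholder.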

xellDart commented 2 years ago

@wjxzju sorry for reopening, but I am trying to extract the quality features for training the model using

python3 generate_pseudo_labels/extract_embedding/extract_feats.py

with this config

class Config:
    # dataset
    data_root = ''
    img_list = 'DATA.labelpath'
    eval_model = 'generate_pseudo_labels/extract_embedding/model/SDD_FIQA_checkpoints_r50.pth'
    outfile = '../feats_npy/Embedding_Features.npy'
    # data preprocess
    transform = T.Compose([
        T.Resize((112, 112)),
        T.ToTensor(),
        T.Normalize(mean=[0.5,0.5,0.5], std=[0.5,0.5,0.5]),
    ])
    # network settings
    backbone = 'R_50'               # [MFN, R_50]
    device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
    multi_GPUs = [0]
    embedding_size = 512
    batch_size = 512
    pin_memory = True
    num_workers = 6
config = Config()

I'm using an RTX 3090 and 32 GB of RAM, with embedding_size = 512 and batch_size = 512.

Everything runs fine, but at the final step I get:


Number of samples: 5822653
Sample_num = 5822653
100%|████████████████████████████████████▉| 11372/11373 [49:48<00:00,  3.81it/s]

Traceback (most recent call last):
  File "/home/miguel/Documentos/TFace/quality/generate_pseudo_labels/extract_embedding/extract_feats.py", line 96, in <module>
    try: feats[start_idx:end_idx, :] = embeddings
ValueError: could not broadcast input array from shape (189,) into shape (189,512)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/miguel/Documentos/TFace/quality/generate_pseudo_labels/extract_embedding/extract_feats.py", line 98, in <module>
    except: feats[start_idx:, :] = embeddings                              
ValueError: could not broadcast input array from shape (189,) into shape (189,512)

What is the correct value for batch_size and embedding_size for my hardware configuration?

Thanks!
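For reference, the numbers in the log line up with the final, smaller batch: 5822653 samples at a batch size of 512 do not divide evenly. A quick check, using only values taken from the log and the config above:

```python
import math

num_samples = 5822653  # "Number of samples" reported in the log
batch_size = 512       # batch_size from the Config above

num_batches = math.ceil(num_samples / batch_size)
last_batch = num_samples - (num_batches - 1) * batch_size

print(num_batches)  # 11373, matching the progress bar total
print(last_batch)   # 189, matching the 189 in the error message
```

So the failure happens on the last batch, the only one that is not full. This does not by itself explain why the array being assigned has shape (189,) rather than (189, 512).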

oufuzhao commented 2 years ago

Hi, @xellDart

According to your log, the error seems to be caused by the shape of embeddings not matching feats[start_idx:end_idx, :], since feats was initialized as np.zeros([5822653, 512]). Can you check whether the shape of embeddings is equal to [189, 512] at the last stage? If not, you can reshape it to [189, 512].

Thank you.
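One way to run the shape check suggested above is to validate each batch before writing it into the preallocated array. The sketch below is a minimal illustration, not the code in extract_feats.py: `store_batch` is a hypothetical helper, and a tiny array stands in for np.zeros([5822653, 512]).

```python
import numpy as np


def store_batch(feats: np.ndarray, start_idx: int, embeddings) -> int:
    """Copy one batch into feats, failing early with a readable message on a shape mismatch."""
    embeddings = np.asarray(embeddings)
    expected_dim = feats.shape[1]
    if embeddings.ndim != 2 or embeddings.shape[1] != expected_dim:
        raise ValueError(
            f"batch starting at row {start_idx} has shape {embeddings.shape}, "
            f"expected (n, {expected_dim})"
        )
    end_idx = start_idx + embeddings.shape[0]
    if end_idx > feats.shape[0]:
        raise ValueError(f"batch ends at row {end_idx}, but feats has {feats.shape[0]} rows")
    feats[start_idx:end_idx, :] = embeddings
    return end_idx


# Small stand-in for the real buffer: 5 samples, embedding size 4.
feats = np.zeros([5, 4])
next_row = store_batch(feats, 0, np.ones((3, 4)))         # a full batch
next_row = store_batch(feats, next_row, np.ones((2, 4)))  # the final, smaller batch

# Size of the real buffer, given that np.zeros defaults to float64 (8 bytes per value).
full_nbytes = 5822653 * 512 * np.dtype(np.float64).itemsize
print(full_nbytes / 2**30)  # size in GiB
```

If the last batch really arrives as a 1-D array, the helper raises a message that names the offending shape instead of a broadcast error, which makes it easier to see where the second dimension was lost. The buffer size is also worth comparing against the 32 GB of RAM mentioned earlier.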

Tencent / TFace

Train quality model using MS1M dataset #51