chen1234520 closed 3 years ago
Hi @chen1234520, I am trying to train on a dataset with the same config as yours, but I cannot figure out the data format required for training. Currently my data is inside D:/face.evoLVe.PyTorch/data/dataV1/, and the dataV1 directory is laid out as follows:

dataV1/
    id1/
        1.jpg
        ...
    id2/
        1.jpg
        ...
    ...

The data is already aligned and resized to 112 using the align script provided in the repo. When I run train.py, I get a file-not-found error. I have seen that a lot of people hit the same issue and cannot work out the correct data format.
It would help a lot of people if you could explain how to get the dataset into the correct format for training. Any help would be much appreciated, thank you.
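My guess is that there is a problem with your data path. I suggest checking whether the training data path produced by the DATA_ROOT parameter in config.py and the dataset_train parameter in train.py matches the actual location of your training data.
Also, if the loss does not decrease or becomes NaN, load a pretrained model or lower the batch size and the initial learning rate.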
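To make the path check above concrete, here is a minimal sketch that confirms a directory exists and counts the images in each identity folder. It assumes the one-folder-per-identity layout described in the question; the helper name summarize_image_folder and the list of image extensions are my own and are not part of the repo.

```python
import os

# Hypothetical list of extensions to count; adjust to match your data.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def summarize_image_folder(root):
    """Return {identity_folder: image_count} for a root laid out as root/<identity>/<image>."""
    if not os.path.isdir(root):
        raise FileNotFoundError(f"training directory does not exist: {root}")
    summary = {}
    for entry in sorted(os.listdir(root)):
        class_dir = os.path.join(root, entry)
        if not os.path.isdir(class_dir):
            continue  # skip stray files that sit next to the identity folders
        summary[entry] = sum(
            1
            for name in os.listdir(class_dir)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS
        )
    return summary
```

Call it on the exact path that train.py ends up using for the training set and compare the number of keys with the "Number of Training Classes" line that train.py prints. A FileNotFoundError here, or an unexpected class count, points at the path rather than at the images.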
Hi @chen1234520, thanks for the response.
How do I generate the meta and sizes files?
My DATA_ROOT = 'D:/face.evoLVe.PyTorch/data/dataV1'
Actual data:
D:/face.evoLVe.PyTorch/data/dataV1/Id1/1.jpg, 2.jpg, ...
D:/face.evoLVe.PyTorch/data/dataV1/Id2/1.jpg, 2.jpg, ...
I don't have any files other than .jpg files inside the dataV1 directory.
Here's the exact output when I run train.py:
Overall Configurations: {'SEED': 1337, 'DATA_ROOT': 'D:/face.evoLVe.PyTorch/data/dataV1', 'MODEL_ROOT': './model', 'LOG_ROOT': './log', 'BACKBONE_RESUME_ROOT': './model/weights/backbone_ir50_asia.pth', 'HEAD_RESUME_ROOT': './', 'BACKBONE_NAME': 'IR_50', 'HEAD_NAME': 'ArcFace', 'LOSS_NAME': 'Focal', 'INPUT_SIZE': [112, 112], 'RGB_MEAN': [0.5, 0.5, 0.5], 'RGB_STD': [0.5, 0.5, 0.5], 'EMBEDDING_SIZE': 512, 'BATCH_SIZE': 512, 'DROP_LAST': True, 'LR': 0.1, 'NUM_EPOCH': 125, 'WEIGHT_DECAY': 0.0005, 'MOMENTUM': 0.9, 'STAGES': [35, 65, 95], 'DEVICE': device(type='cpu'), 'MULTI_GPU': True, 'GPU_ID': [0, 1], 'PIN_MEMORY': True, 'NUM_WORKERS': 0}
Number of Training Classes: 5749
Traceback (most recent call last):
  File "train.py", line 84, in <module>
    lfw, cfp_ff, cfp_fp, agedb, calfw, cplfw, vgg2_fp, lfw_issame, cfp_ff_issame, cfp_fp_issame, agedb_issame, calfw_issame, cplfw_issame, vgg2_fp_issame = get_val_data(DATA_ROOT)
  File "D:\srikarRnD\dev\face.evoLVe.PyTorch\util\utils.py", line 63, in get_val_data
    lfw, lfw_issame = get_val_pair(data_path, 'lfw')
  File "D:\srikarRnD\dev\face.evoLVe.PyTorch\util\utils.py", line 56, in get_val_pair
    carray = bcolz.carray(rootdir = os.path.join(path, name), mode = 'r')
  File "bcolz/carray_ext.pyx", line 1067, in bcolz.carray_ext.carray.__cinit__
  File "bcolz/carray_ext.pyx", line 1369, in bcolz.carray_ext.carray._read_meta
FileNotFoundError: [Errno 2] No such file or directory: 'D:/face.evoLVe.PyTorch/data/dataV1\lfw\meta\sizes'
This is not a training data error. You have no validation data. The author uses many validation sets by default, so you need to modify the validation settings if you don't have all of them.
Here is my example; I hope it helps. I only use LFW and CPLFW.
In train.py:

    lfw, cplfw, lfw_issame, cplfw_issame = get_val_data(DATA_ROOT)

In util/utils.py:

    def get_val_data(data_path):
        lfw, lfw_issame = get_val_pair(data_path, 'lfw_align_112/lfw')
        # cfp_fp, cfp_fp_issame = get_val_pair(data_path, 'cfp_fp')
        # agedb_30, agedb_30_issame = get_val_pair(data_path, 'agedb_30')
        # calfw, calfw_issame = get_val_pair(data_path, 'calfw')
        # cplfw, cplfw_issame = get_val_pair(data_path, 'cplfw')
        cplfw, cplfw_issame = get_val_pair(data_path, 'cplfw_align_112/cplfw')
        # vgg2_fp, vgg2_fp_issame = get_val_pair(data_path, 'vgg2_fp')
        # The return order must match the unpacking in train.py.
        return lfw, cplfw, lfw_issame, cplfw_issame
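Before editing get_val_data, it may help to see which validation sets are actually present. The traceback above shows bcolz looking for a meta/sizes file inside each validation set directory, so the sketch below simply checks for that file. The helper name available_val_sets is my own, not something the repo provides, and a set that passes this check may still be missing other files the loader needs.

```python
import os


def available_val_sets(data_root, names):
    """Return the names whose directory under data_root contains meta/sizes.

    This mirrors only the file named in the FileNotFoundError above.
    """
    found = []
    for name in names:
        sizes_file = os.path.join(data_root, name, "meta", "sizes")
        if os.path.isfile(sizes_file):
            found.append(name)
    return found
```

For example, available_val_sets(DATA_ROOT, ['lfw', 'cfp_fp', 'agedb_30', 'calfw', 'cplfw', 'vgg2_fp']) returns only the sets you have on disk; the others are the ones to comment out in get_val_data and in the evaluation code in train.py.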
Hi @chen1234520, how did you resolve this?
I hope you can help me! This is my training log.
configurations = { 1: dict( SEED = 1337, # random seed for reproduce results
), }
Epoch 59/125 Batch 112176/237750 Training Loss 19.4058 (19.3695) Training Prec@1 0.000 (0.000) Training Prec@5 0.000 (0.000)
============================================================
Epoch 59/125 Batch 112195/237750 Training Loss 19.2279 (19.3684) Training Prec@1 0.000 (0.000) Training Prec@5 0.000 (0.000)
============================================================
Epoch 59/125 Batch 112214/237750 Training Loss 19.6988 (19.3685) Training Prec@1 0.000 (0.000) Training Prec@5 0.000 (0.000)
============================================================
Epoch: 59/125 Training Loss 19.4309 (19.3681) Training Prec@1 0.000 (0.000) Training Prec@5 0.000 (0.000)
100%|██████████| 1902/1902 [21:11<00:00, 1.50it/s]
32%|███▏ | 616/1902 [06:52<
============================================================
Perform Evaluation on LFW, CFP_FF, CFP_FP, AgeDB, CALFW, CPLFW and VGG2_FP, and Save Checkpoints...
Epoch 59/125, Evaluation: LFW Acc: 0.974, CPLFW Acc: 0.8041666666666666