Oneflow-Inc / CoModels

Model adaptation progress #4

Closed ShawnXuan closed 11 months ago

ShawnXuan commented 1 year ago
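TODOs: remaining tasks as of 11.06

| Domain | Task | Base model | Supported via | Owner | Status | Variants |
| --- | --- | --- | --- | --- | --- | --- |
| cv | classification | PVT | flowvision | Zhou | Done | 4 |
| cv | classification | PoolFormer | flowvision | Zhou | Done | 5 |
| cv | classification | ConvNeXt | flowvision | Zhou | Done | 18 |
| cv | classification | LeViT | flowvision | ke | In progress (low inference result) | 5 |
| cv | classification | RegionViT | flowvision | ke | Done | 8 |
| cv | classification | VAN | flowvision | ke | Done | 4 |
| cv | classification | MobileViT | flowvision | Zhou | In progress (low inference result) | 3 |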

| Domain | Task | Base model | Supported via | Owner | Status | Variants |
| --- | --- | --- | --- | --- | --- | --- |
| cv | classification | AlexNet | flowvision | ke | Done | |
| cv | classification | SqueezeNet | flowvision | ke | Done | |
| cv | classification | SqueezeNet 1.1 | flowvision | ke | Done | |
| cv | classification | VGG-11 | flowvision | ke | Done | |
| cv | classification | VGG-11-BN | flowvision | ke | Done | |
| cv | classification | VGG-13 | flowvision | ke | Done | |
| cv | classification | VGG-13-BN | flowvision | ke | Done | |
| cv | classification | VGG-16 | flowvision | ke | Done | |
| cv | classification | VGG-16-BN | flowvision | ke | Done | |
| cv | classification | VGG-19 | flowvision | ke | Done | |
| cv | classification | VGG-19-BN | flowvision | ke | Done | |
| cv | classification | GoogLeNet | flowvision | zhang | Done | |
| cv | classification | Inception_V3 | flowvision | zhang | Done | |
| cv | classification | ResNet-18 | flowvision | ke | Done | |
| cv | classification | ResNet-34 | flowvision | ke | Done | |
| cv | classification | ResNet-50 | flowvision | ke | Done | |
| cv | classification | ResNet-101 | flowvision | ke | Done | |
| cv | classification | ResNet-152 | flowvision | ke | Done | |
| cv | classification | ResNeXt-50 32x4d | flowvision | ke | Done | |
| cv | classification | ResNeXt-101 32x8d | flowvision | ke | Done | |
| cv | classification | ResNeSt-50 | flowvision | zhang | Done | |
| cv | classification | ResNeSt-101 | flowvision | zhang | Done | |
| cv | classification | ResNeSt-200 | flowvision | zhang | Done | |
| cv | classification | ResNeSt-269 | flowvision | zhang | Done | |
| cv | classification | SE-ResNet101 | flowvision | zhang | Done | |
| cv | classification | SE-ResNet152 | flowvision | zhang | Done | |
| cv | classification | SE-ResNet50 | flowvision | zhang | Done | |
| cv | classification | SE-ResNeXt101-32x4d | flowvision | zhang | Done | |
| cv | classification | SE-ResNeXt50-32x4d | flowvision | zhang | Done | |
| cv | classification | SENet-154 | flowvision | zhang | Done | |
| cv | classification | DenseNet-121 | flowvision | cui | Done | |
| cv | classification | DenseNet-161 | flowvision | cui | Done | |
| cv | classification | DenseNet-169 | flowvision | cui | Done | |
| cv | classification | DenseNet-201 | flowvision | cui | Done | |
| cv | classification | ShuffleNet_V2 x0.5 | flowvision | cui | Done | |
| cv | classification | ShuffleNet_V2 x1.0 | flowvision | cui | Done | |
| cv | classification | ShuffleNet_V2 x1.5 | flowvision | cui | Done | |
| cv | classification | ShuffleNet_V2 x2.0 | flowvision | cui | Done | |
| cv | classification | MobileNet_V2 | flowvision | cui | Done | |
| cv | classification | MobileNet_V3 small | flowvision | cui | Done | |
| cv | classification | MobileNet_V3 large | flowvision | cui | Done | |
| cv | classification | MNASNet x0.5 | flowvision | cui | Done | |
| cv | classification | MNASNet x0.75 | flowvision | cui | Done | |
| cv | classification | MNASNet x1.0 | flowvision | cui | Done | |
| cv | classification | MNASNet x1.3 | flowvision | cui | Done | |
| cv | classification | GhostNet | flowvision | ke | Done | |
| cv | classification | EfficientNet | flowvision | ke | Done | 8 |
| cv | classification | RegNet | flowvision | ke | Done | 15 |
| cv | classification | ReXNet | flowvision | ke | Done | 10 |
| cv | classification | ViT | flowvision | ke | Done | 31 |
| cv | classification | DeiT | flowvision | ke | Done | 22 |
| cv | classification | PVT | flowvision | ke | Done | 4 |
| cv | classification | CrossFormer-T | flowvision | zhang | Done | |
| cv | classification | CrossFormer-S | flowvision | zhang | Done | |
| cv | classification | CrossFormer-B | flowvision | zhang | Done | |
| cv | classification | CrossFormer-L | flowvision | zhang | Done | |
| cv | classification | PoolFormer-S12 | flowvision | zhang | In progress | |
| cv | classification | PoolFormer-S24 | flowvision | zhang | In progress | |
| cv | classification | PoolFormer-S36 | flowvision | zhang | In progress | |
| cv | classification | PoolFormer-M36 | flowvision | zhang | In progress | |
| cv | classification | PoolFormer-M48 | flowvision | zhang | In progress | |
| cv | classification | Mlp_Mixer | flowvision | ke | Done | 10 |
| cv | classification | gMLP | flowvision | ke | Done | 2 |
| cv | classification | ConvMixer | flowvision | ke | Done | 2 |
| cv | classification | ConvNeXt | flowvision | ke | In progress (low inference result) | 18 |
| cv | classification | LeViT | flowvision | ke | In progress (low inference result) | 5 |
| cv | classification | RegionViT | flowvision | ke | Done | 8 |
| cv | classification | VAN | flowvision | ke | Done | 4 |
| cv | classification | MobileViT | flowvision | li | In progress (low inference result) | 3 |
| cv | classification | CaiT | flowvision | li | Done | 1 6 |
| cv | classification | DLA | flowvision | li | Done | 1 10 |
| cv | classification | GENet | flowvision | li | Done | 1 3 |
| cv | classification | HRNet | flowvision | li | Done | 1 9 |
| cv | classification | FAN | flowvision | li | Done | 1 12 |
| cv | Semantic Segmentation | fcn_resnet101_coco | flowvision | zhou | Done | |
| cv | Semantic Segmentation | fcn_resnet50_coco | flowvision | zhou | Done | |
| cv | Semantic Segmentation | deeplabv3_mobilenet_v3_large_coco | flowvision | zhou | Done | |
| cv | Semantic Segmentation | deeplabv3_resnet101_coco | flowvision | zhou | Done | |
| cv | Semantic Segmentation | deeplabv3_resnet50_coco | flowvision | zhou | Done | |
| cv | Semantic Segmentation | lraspp_mobilenet_v3_large_coco | flowvision | zhou | Done | |
| cv | Object Detection | fasterrcnn_mobilenet_v3_large_320_fpn | flowvision | zhou | Done | |
| cv | Object Detection | fasterrcnn_mobilenet_v3_large_fpn | flowvision | zhou | Done | |
| cv | Object Detection | fasterrcnn_resnet50_fpn | flowvision | zhou | Done | |
| cv | Object Detection | maskrcnn_resnet50_fpn | flowvision | zhou | Done | |
| cv | Object Detection | retinanet_resnet50_fpn | flowvision | zhou | Done | |
| cv | Object Detection | ssd300_vgg16 | flowvision | zhou | Done | |
| cv | Object Detection | ssdlite320_mobilenet_v3_large | flowvision | zhou | Done | |
| cv | Object Detection | fcos_resnet50_fpn | flowvision | zhou | Done | |
| cv | Neural Style Transfer | style_transfer.fast_neural_style | flowvision | zhou | Done | |
| cv | Face Recognition | iresnet50 | flowvision | zhou | Done | |
| cv | Face Recognition | iresnet101 | flowvision | zhou | Done | |
| cv | | VisionTransformer | libai | li | Done | |
| nlp | | SwinTransformer | libai | li | Done | |
| nlp | | SwinTransformerV2 | libai | li | Done | |
| nlp | | ResMLP | libai | li | Done | |
| nlp | | BERT | libai | li | Done | |
| nlp | | RoBERTa | libai | li | Done | |
| nlp | | T5 | libai | li | Done | |
| nlp | | GPT-2 | libai | li | Done | |
| nlp | text_classfication | Transformer | CoModels | maolin | Done | |
| nlp | odd_numbers | Transformer | CoModels | maolin | Done | |
| science | Equation inversion-Lorenz system | PINNs | CoModels | zhang | | |
| science | Fluid simulation-ldc | PINNs | CoModels | zhang | | |

zkyseu commented 1 year ago

ssd300_vgg16

1. Machine: A100@192.168.40.21, 40 GB GPU memory.

2. Dataset: MS COCO 2017, 80 classes. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: 557a4c3.

How to run:

```
# run training code
cd cv/detection
bash ssd300_vgg16/train.sh
```

```
# run inference code
cd cv/detection
bash ssd300_vgg16/infer.sh
```

6. Training script

```
python train.py \
    --data-path /dataset/mscoco_2017/ \
    --dataset coco \
    --model ssd300_vgg16 \
    --batch-size 32 \
    --lr 0.002 \
    --weight-decay 0.0005
```

7. Training log

Training hyperparameters:

```
(aspect_ratio_group_factor=3, batch_size=32, data_augmentation='hflip', data_path='/home/kunyangzhou/project/dataset/coco/', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=26, evaluation=None, load='', lr=0.002, lr_gamma=0.1, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], model='ssd300_vgg16', momentum=0.9, num_classes=91, output_dir='.', pretrained=True, print_freq=20, resume='', start_epoch=0, test_only=True, trainable_backbone_layers=None, weight_decay=0.0005, workers=4, world_size=1)
```

Training loss:

![W B Chart 2023_10_12 10_19_25](https://github.com/Oneflow-Inc/CoModels/assets/118790294/8942cec3-0c9c-46ec-aa37-3f86495cf169)

8. Test script

```
python train.py \
    --data-path /dataset/mscoco_2017/ \
    --dataset coco \
    --model ssd300_vgg16 \
    --batch-size 32 \
    --lr 0.002 \
    --weight-decay 0.0005 \
    --pretrained \
    --test-only
```

9. Test results

Test hyperparameters:

```
(aspect_ratio_group_factor=3, batch_size=32, data_augmentation='hflip', data_path='/home/kunyangzhou/project/dataset/coco/', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=26, evaluation=None, load='', lr=0.002, lr_gamma=0.1, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], model='ssd300_vgg16', momentum=0.9, num_classes=91, output_dir='.', pretrained=True, print_freq=20, resume='', start_epoch=0, test_only=True, trainable_backbone_layers=None, weight_decay=0.0005, workers=4, world_size=1)
```

Test results:

```
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.251
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.415
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.262
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.055
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.268
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.435
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.239
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.344
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.365
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.088
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.406
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.602
```
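The COCO-style summary above is easier to compare across models once it is parsed into numbers. The following is a minimal sketch, not part of CoModels: `SUMMARY_RE` and `parse_coco_summary` are hypothetical names, and the regular expression only assumes the line layout shown in this thread.

```python
import re

# Matches one "Average Precision/Recall" line of the summary printed above.
SUMMARY_RE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\)\s+@\[\s*IoU=([\d.:]+)\s*\|"
    r"\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)\s*\]\s*=\s*(-?[\d.]+)"
)


def parse_coco_summary(text):
    """Return {(metric, iou, area, max_dets): value} for each summary line."""
    results = {}
    for metric, iou, area, max_dets, value in SUMMARY_RE.findall(text):
        results[(metric, iou, area, int(max_dets))] = float(value)
    return results


# Three lines copied from the ssd300_vgg16 test results.
sample = """
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.251
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.415
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.602
"""
parsed = parse_coco_summary(sample)
print(parsed[("AP", "0.50:0.95", "all", 100)])
```

Keying on the full `(metric, iou, area, max_dets)` tuple keeps the three AR rows that differ only in `maxDets` distinct.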
zkyseu commented 1 year ago

fasterrcnn_resnet50_fpn

1. Machine: A100@192.168.40.21, 40 GB GPU memory.

2. Dataset: MS COCO 2017, 80 classes. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: 33ad083.

How to run:

```
# run training code
cd cv/detection
bash fasterrcnn_resnet50_fpn/train.sh
```

```
# run inference code
cd cv/detection
bash fasterrcnn_resnet50_fpn/infer.sh
```

6. Training script

```
python train.py \
    --data-path /data/dataset/coco \
    --dataset coco \
    --model fasterrcnn_resnet50_fpn \
    --batch-size 16 \
    --lr 0.02 \
    --epochs 12 \
    --lr-steps 8 11
```

7. Training log

Training hyperparameters:

```
aspect_ratio_group_factor=3, batch_size=4, data_augmentation='hflip', data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=26, evaluation=None, load='', lr=0.02, lr_gamma=0.1, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], model='fasterrcnn_resnet50_fpn', momentum=0.9, num_classes=81, output_dir='.', pretrained=True, print_freq=20, resume='', start_epoch=0, test_only=True, trainable_backbone_layers=None, weight_decay=0.0001, workers=4, world_size=1
```

Training loss:

![W B Chart 2023_10_13 09_44_52](https://github.com/Oneflow-Inc/CoModels/assets/118790294/0381687d-9f24-45f1-86cb-79c9add463f6)

8. Test script

```
python train.py \
    --data-path /data/dataset/coco \
    --dataset coco \
    --model fasterrcnn_resnet50_fpn \
    --batch-size 4 \
    --pretrained \
    --test-only
```

9. Test results

Test hyperparameters:

```
aspect_ratio_group_factor=3, batch_size=4, data_augmentation='hflip', data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=26, evaluation=None, load='', lr=0.02, lr_gamma=0.1, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], model='fasterrcnn_resnet50_fpn', momentum=0.9, num_classes=91, output_dir='.', pretrained=True, print_freq=20, resume='', start_epoch=0, test_only=True, trainable_backbone_layers=None, weight_decay=0.0001, workers=4, world_size=1
```

Test results:

```
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.366
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.577
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.396
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.211
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.401
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.474
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.308
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.493
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.518
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.329
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.555
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.648
```
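The training command above combines `--lr 0.02`, `--epochs 12` and `--lr-steps 8 11`, and the dumps report `lr_scheduler='multisteplr'` with `lr_gamma=0.1`. As a rough illustration of what such a schedule produces, here is a sketch that assumes the scheduler follows the usual multi-step rule (multiply the rate by gamma at each milestone, stepping once per epoch). That behaviour is an assumption about `train.py`, and `multistep_lr` is a hypothetical helper.

```python
def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate at a 0-based epoch under multi-step decay."""
    # Count how many milestones have already been reached.
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


# Values taken from the fasterrcnn_resnet50_fpn training command.
schedule = [multistep_lr(0.02, [8, 11], 0.1, e) for e in range(12)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.4f}")
```

Under these assumptions the rate stays at 0.02 for epochs 0 to 7, drops tenfold at epoch 8, and tenfold again for the final epoch.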
zkyseu commented 1 year ago

fasterrcnn_mobilenet_v3_large_fpn

1. Machine: A100@192.168.40.21, 40 GB GPU memory.

2. Dataset: MS COCO 2017, 80 classes. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: 07632ed.

How to run:

```
# run training code
cd cv/detection
bash fasterrcnn_mobilenet_v3_large_fpn/train.sh
```

```
# run inference code
cd cv/detection
bash fasterrcnn_mobilenet_v3_large_fpn/infer.sh
```

6. Training script

```
python train.py \
    --data-path /dataset/mscoco_2017/ \
    --dataset coco \
    --model fasterrcnn_mobilenet_v3_large_fpn \
    --batch-size 4 \
    --lr 0.005 \
    --epochs 12 \
    --lr-steps 8 11
```

7. Training log

Training hyperparameters:

```
'dataset': 'coco', 'model': 'fasterrcnn_mobilenet_v3_large_fpn', 'device': 'cuda', 'batch_size': 16, 'epochs': 12, 'workers': 4, 'lr': 0.01, 'momentum': 0.9, 'weight_decay': 0.0001, 'lr_scheduler': 'multisteplr', 'lr_step_size': 8, 'lr_steps': [8, 11], 'lr_gamma': 0.1, 'print_freq': 20, 'output_dir': '.', 'resume': '', 'load': '', 'num_classes': 81, 'start_epoch': 0, 'aspect_ratio_group_factor': 3, 'trainable_backbone_layers': None, 'data_augmentation': 'hflip', 'evaluation': None, 'test_only': False, 'pretrained': False, 'world_size': 1, 'dist_url': 'env://', 'distributed': False
```

Training loss:

![W B Chart 2023_10_16 09_54_04](https://github.com/Oneflow-Inc/CoModels/assets/118790294/71fd6333-60fc-47e9-9773-30d917b2a4e2)

8. Test script

```
python train.py \
    --data-path /data/dataset/coco \
    --dataset coco \
    --model fasterrcnn_mobilenet_v3_large_fpn \
    --batch-size 4 \
    --pretrained \
    --test-only
```

9. Test results

Test hyperparameters:

```
'dataset': 'coco', 'model': 'fasterrcnn_mobilenet_v3_large_fpn', 'device': 'cuda', 'batch_size': 16, 'epochs': 12, 'workers': 4, 'lr': 0.01, 'momentum': 0.9, 'weight_decay': 0.0001, 'lr_scheduler': 'multisteplr', 'lr_step_size': 8, 'lr_steps': [8, 11], 'lr_gamma': 0.1, 'print_freq': 20, 'output_dir': '.', 'resume': '', 'load': '', 'num_classes': 91, 'start_epoch': 0, 'aspect_ratio_group_factor': 3, 'trainable_backbone_layers': None, 'data_augmentation': 'hflip', 'evaluation': None, 'test_only': False, 'pretrained': False, 'world_size': 1, 'dist_url': 'env://', 'distributed': False
```

Test results:

```
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.328
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.525
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.343
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.127
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.363
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.501
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.287
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.426
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.444
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.196
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.499
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.648
```
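The commands in this thread all pass the same small set of flags to `train.py`. The sketch below shows how such a command line could be parsed with `argparse`. It is a hypothetical reconstruction covering only the flags that appear here, not the actual CoModels `train.py`, and the defaults are guesses chosen to match values seen in the hyperparameter dumps.

```python
import argparse


def build_parser():
    """Hypothetical parser for the flags used in this thread (not the real train.py)."""
    p = argparse.ArgumentParser(description="Detection training/evaluation (sketch)")
    p.add_argument("--data-path", default="/data/dataset/coco")
    p.add_argument("--dataset", default="coco")
    p.add_argument("--model", default="fasterrcnn_resnet50_fpn")
    p.add_argument("--batch-size", type=int, default=4)
    p.add_argument("--epochs", type=int, default=26)        # assumed default
    p.add_argument("--workers", type=int, default=4)        # assumed default
    p.add_argument("--lr", type=float, default=0.02)        # assumed default
    p.add_argument("--weight-decay", type=float, default=1e-4)
    p.add_argument("--lr-steps", type=int, nargs="+", default=[16, 22])
    p.add_argument("--pretrained", action="store_true")
    p.add_argument("--test-only", action="store_true")
    return p


# Same flags as the evaluation command above.
args = build_parser().parse_args([
    "--data-path", "/data/dataset/coco",
    "--dataset", "coco",
    "--model", "fasterrcnn_mobilenet_v3_large_fpn",
    "--batch-size", "4",
    "--pretrained",
    "--test-only",
])
print(vars(args))
```

Printing `vars(args)` at startup, as the dumps in this thread appear to do, makes it easy to confirm which values a run actually used when the command line and the logged hyperparameters disagree.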
zkyseu commented 1 year ago

fasterrcnn_mobilenet_v3_large_320_fpn

1. Machine: A100@192.168.40.21, 40 GB GPU memory.

2. Dataset: MS COCO 2017, 80 classes. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: aef35a4.

How to run:

```
# run training code
cd cv/detection
bash fasterrcnn_mobilenet_v3_large_320_fpn/train.sh
```

```
# run inference code
cd cv/detection
bash fasterrcnn_mobilenet_v3_large_320_fpn/infer.sh
```

6. Training script

```
python train.py \
    --data-path /dataset/mscoco_2017/ \
    --dataset coco \
    --model fasterrcnn_mobilenet_v3_large_320_fpn \
    --batch-size 32 \
    --lr 0.02 \
    --epochs 12 \
    --lr-steps 8 11
```

7. Training log

Training hyperparameters:

```
'dataset': 'coco', 'model': 'fasterrcnn_mobilenet_v3_large_320_fpn', 'device': 'cuda', 'batch_size': 32, 'epochs': 12, 'workers': 4, 'lr': 0.02, 'momentum': 0.9, 'weight_decay': 0.0001, 'lr_scheduler': 'multisteplr', 'lr_step_size': 8, 'lr_steps': [8, 11], 'lr_gamma': 0.1, 'print_freq': 20, 'output_dir': '.', 'resume': '', 'load': '', 'num_classes': 91, 'start_epoch': 0, 'aspect_ratio_group_factor': 3, 'trainable_backbone_layers': None, 'data_augmentation': 'hflip', 'evaluation': None, 'test_only': False, 'pretrained': False, 'world_size': 1, 'dist_url': 'env://', 'distributed': False
```

Training loss:

![image](https://github.com/Oneflow-Inc/CoModels/assets/118790294/9dc82d6c-37fd-4789-a6f2-b0fa4b1775f7)

8. Test script

```
python train.py \
    --data-path /data/dataset/coco \
    --dataset coco \
    --model fasterrcnn_mobilenet_v3_large_320_fpn \
    --batch-size 4 \
    --pretrained \
    --test-only
```

9. Test results

Test hyperparameters:

```
'dataset': 'coco', 'model': 'fasterrcnn_mobilenet_v3_large_fpn', 'device': 'cuda', 'batch_size': 16, 'epochs': 12, 'workers': 4, 'lr': 0.01, 'momentum': 0.9, 'weight_decay': 0.0001, 'lr_scheduler': 'multisteplr', 'lr_step_size': 8, 'lr_steps': [8, 11], 'lr_gamma': 0.1, 'print_freq': 20, 'output_dir': '.', 'resume': '', 'load': '', 'num_classes': 91, 'start_epoch': 0, 'aspect_ratio_group_factor': 3, 'trainable_backbone_layers': None, 'data_augmentation': 'hflip', 'evaluation': None, 'test_only': False, 'pretrained': False, 'world_size': 1, 'dist_url': 'env://', 'distributed': False
```

Test results:

```
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.227
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.379
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.232
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.026
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.218
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.443
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.215
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.290
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.294
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.037
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.295
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.569
```
iwkkk commented 1 year ago

ResNet-50


cd CoModels/cv/classification
bash resnet50/train.sh
Hyperparameters used for training:

```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 128
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 4
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: resnet50
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/resnet50/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.001
  CLIP_GRAD: 5.0
  EPOCHS: 50
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: cosine
  MIN_LR: 1.0e-05
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: adamw
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 20
  WARMUP_LR: 1.0e-06
  WEIGHT_DECAY: 0.05
```
```
Throughput 406.443 (406.443)    Loss 1.0186 (1.0186)    Acc@1 76.074 (76.074)   Acc@5 93.164 (93.164)
INFO  * Acc@1 76.074 Acc@5 93.164
INFO Accuracy of the network on the 49 test images: 76.1%
INFO Max accuracy: 78.12%
```
cd CoModels/cv/classification
bash resnet50/infer.sh
Hyperparameters used for inference:

```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 128
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: resnet50
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/resnet50/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.000125
  CLIP_GRAD: 5.0
  EPOCHS: 300
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: cosine
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: adamw
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 20
  WARMUP_LR: 1.25e-07
  WEIGHT_DECAY: 0.05
```
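Both ResNet-50 configurations select `NAME: cosine` together with `WARMUP_EPOCHS`, `WARMUP_LR`, `BASE_LR` and `MIN_LR`. To give a feel for the shape these values imply, here is a sketch of a textbook linear-warmup plus cosine-decay curve evaluated per epoch. The exact formula and step granularity used by the classification trainer are not shown in this thread, so treat this as an assumption; `lr_at_epoch` is a hypothetical helper.

```python
import math


def lr_at_epoch(epoch, base_lr, warmup_lr, min_lr, warmup_epochs, total_epochs):
    """Linear warmup followed by cosine decay (generic form, per-epoch)."""
    if epoch < warmup_epochs:
        # Ramp linearly from warmup_lr up to base_lr.
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Values from the ResNet-50 training configuration in this comment.
cfg = dict(base_lr=0.001, warmup_lr=1.0e-06, min_lr=1.0e-05,
           warmup_epochs=20, total_epochs=50)
for epoch in (0, 10, 20, 35, 50):
    print(f"epoch {epoch:2d}: lr = {lr_at_epoch(epoch, **cfg):.6f}")
```

With `WARMUP_EPOCHS: 20` out of `EPOCHS: 50`, this curve would spend 40% of training in warmup, which may be worth keeping in mind when comparing the reported accuracy against other runs.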
zkyseu commented 1 year ago

maskrcnn_resnet50_fpn

1. Machine: A100@192.168.40.21, 40 GB GPU memory.

2. Dataset: MS COCO 2017, 80 classes. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: 44dbe1f.

How to run:

```
# run training code
cd cv/detection
bash maskrcnn_resnet50_fpn/train.sh
```

```
# run inference code
cd cv/detection
bash maskrcnn_resnet50_fpn/infer.sh
```

6. Training script

```
python train.py \
    --data-path /dataset/mscoco_2017/ \
    --dataset coco \
    --model maskrcnn_resnet50_fpn \
    --batch-size 4 \
    --lr 0.002 \
    --epochs 12 \
    --lr-steps 8 11
```

7. Training log

Training hyperparameters:

```
'dataset': 'coco', 'model': 'maskrcnn_resnet50_fpn', 'device': 'cuda', 'batch_size': 32, 'epochs': 26, 'workers': 4, 'lr': 0.02, 'momentum': 0.9, 'weight_decay': 0.0001, 'lr_scheduler': 'multisteplr', 'lr_step_size': 8, 'lr_steps': [16, 22], 'lr_gamma': 0.1, 'print_freq': 20, 'output_dir': '.', 'resume': '', 'load': '', 'num_classes': 91, 'start_epoch': 0, 'aspect_ratio_group_factor': 3, 'trainable_backbone_layers': None, 'data_augmentation': 'hflip', 'evaluation': None, 'test_only': True, 'pretrained': True, 'world_size': 1, 'dist_url': 'env://', 'distributed': False
```

Training loss:

![image](https://github.com/Oneflow-Inc/CoModels/assets/118790294/d6cf3eb6-0d15-4520-aedc-a7ee77061b4f)

8. Test script

```
python train.py \
    --data-path /data/dataset/coco \
    --dataset coco \
    --model maskrcnn_resnet50_fpn \
    --batch-size 4 \
    --pretrained \
    --test-only
```

9. Test results

Test hyperparameters:

```
'dataset': 'coco', 'model': 'maskrcnn_resnet50_fpn', 'device': 'cuda', 'batch_size': 32, 'epochs': 26, 'workers': 4, 'lr': 0.02, 'momentum': 0.9, 'weight_decay': 0.0001, 'lr_scheduler': 'multisteplr', 'lr_step_size': 8, 'lr_steps': [16, 22], 'lr_gamma': 0.1, 'print_freq': 20, 'output_dir': '.', 'resume': '', 'load': '', 'num_classes': 91, 'start_epoch': 0, 'aspect_ratio_group_factor': 3, 'trainable_backbone_layers': None, 'data_augmentation': 'hflip', 'evaluation': None, 'test_only': True, 'pretrained': True, 'world_size': 1, 'dist_url': 'env://', 'distributed': False
```

Test results:

```
IoU metric: bbox
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.378
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.592
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.411
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.216
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.414
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.494
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.315
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.496
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.520
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.326
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.558
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.666
IoU metric: segm
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.345
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.560
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.367
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.159
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.373
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.507
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.297
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.456
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.476
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.273
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.514
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.638
```
zkyseu commented 1 year ago

retinanet_resnet50_fpn

1. Machine: A100@192.168.40.21, 40 GB GPU memory.

2. Dataset: MS COCO 2017, 80 classes. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: 44dbe1f.

How to run:

```
# run training code
cd cv/detection
bash retinanet_resnet50_fpn/train.sh
```

```
# run inference code
cd cv/detection
bash retinanet_resnet50_fpn/infer.sh
```

6. Training script

```
python train.py \
    --data-path /dataset/mscoco_2017/ \
    --dataset coco \
    --model retinanet_resnet50_fpn_ \
    --batch-size 16 \
    --lr 0.01 \
    --epochs 12 \
    --lr-steps 8 11 \
    --weight-decay 0.0001
```

7. Training log

Training hyperparameters:

```
'dataset': 'coco', 'model': 'retinanet_resnet50_fpn_', 'device': 'cuda', 'batch_size': 16, 'epochs': 26, 'workers': 4, 'lr': 0.02, 'momentum': 0.9, 'weight_decay': 0.0001, 'lr_scheduler': 'multisteplr', 'lr_step_size': 8, 'lr_steps': [16, 22], 'lr_gamma': 0.1, 'print_freq': 20, 'output_dir': '.', 'resume': '', 'load': '', 'num_classes': 91, 'start_epoch': 0, 'aspect_ratio_group_factor': 3, 'trainable_backbone_layers': None, 'data_augmentation': 'hflip', 'evaluation': None, 'test_only': True, 'pretrained': True, 'world_size': 1, 'dist_url': 'env://', 'distributed': False
```

Training loss:

![image](https://github.com/Oneflow-Inc/CoModels/assets/118790294/2372fff3-5b4e-4169-9fce-5fdf3b93f81c)

8. Test script

```
python train.py \
    --data-path /data/dataset/coco \
    --dataset coco \
    --model retinanet_resnet50_fpn_ \
    --batch-size 8 \
    --pretrained \
    --test-only
```

9. Test results

Test hyperparameters:

```
'dataset': 'coco', 'model': 'retinanet_resnet50_fpn_', 'device': 'cuda', 'batch_size': 8, 'epochs': 26, 'workers': 4, 'lr': 0.02, 'momentum': 0.9, 'weight_decay': 0.0001, 'lr_scheduler': 'multisteplr', 'lr_step_size': 8, 'lr_steps': [16, 22], 'lr_gamma': 0.1, 'print_freq': 20, 'output_dir': '.', 'resume': '', 'load': '', 'num_classes': 91, 'start_epoch': 0, 'aspect_ratio_group_factor': 3, 'trainable_backbone_layers': None, 'data_augmentation': 'hflip', 'evaluation': None, 'test_only': True, 'pretrained': True, 'world_size': 1, 'dist_url': 'env://', 'distributed': False
```

Test results:

```
IoU metric: bbox
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.363
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.557
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.382
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.193
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.400
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.490
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.314
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.500
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.540
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.340
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.581
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.696
```
zkyseu commented 1 year ago

fcos_resnet50_fpn

1. Machine: A100@192.168.40.21, 40 GB GPU memory.

2. Dataset: MS COCO 2017, 80 classes. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: 44dbe1f.

How to run:

```
# run training code
cd cv/detection
bash fcos_resnet50_fpn/train.sh
```

```
# run inference code
cd cv/detection
bash fcos_resnet50_fpn/infer.sh
```

6. Training script

```
python train.py \
    --data-path /dataset/mscoco_2017/ \
    --dataset coco \
    --model fcos_resnet50_fpn_ \
    --batch-size 8 \
    --lr 0.04 \
    --epochs 12 \
    --lr-steps 8 11 \
    --workers 32
```

7. Training log

Training hyperparameters:

```
aspect_ratio_group_factor=3, batch_size=4, data_augmentation='hflip', data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=26, evaluation=None, load='', lr=0.02, lr_gamma=0.1, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], model='fcos_resnet50_fpn_', momentum=0.9, num_classes=91, output_dir='.', pretrained=True, print_freq=20, resume='', start_epoch=0, test_only=True, trainable_backbone_layers=None, weight_decay=0.0001, workers=4, world_size=1
```

Training loss:

![image](https://github.com/Oneflow-Inc/CoModels/assets/118790294/b370a43d-199f-492e-9625-3d6d86fc96bd)

8. Test script

```
python train.py \
    --data-path /data/dataset/coco \
    --dataset coco \
    --model fcos_resnet50_fpn_ \
    --batch-size 4 \
    --pretrained \
    --test-only
```

9. Test results

Test hyperparameters:

```
aspect_ratio_group_factor=3, batch_size=4, data_augmentation='hflip', data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=26, evaluation=None, load='', lr=0.02, lr_gamma=0.1, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], model='fcos_resnet50_fpn_', momentum=0.9, num_classes=91, output_dir='.', pretrained=True, print_freq=20, resume='', start_epoch=0, test_only=True, trainable_backbone_layers=None, weight_decay=0.0001, workers=4, world_size=1
```

Test results:

```
IoU metric: bbox
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.388
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.581
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.419
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.234
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.424
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.501
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.321
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.535
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.570
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.376
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.615
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.726
```
zkyseu commented 1 year ago

ssdlite320_mobilenet_v3_large

1. Machine: A100@192.168.40.21, 40 GB GPU memory.

2. Dataset: MS COCO 2017, 80 classes. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: 63313f9.

How to run:

```
# run training code
cd cv/detection
bash ssdlite320_mobilenet_v3_large/train.sh
```

```
# run inference code
cd cv/detection
bash ssdlite320_mobilenet_v3_large/infer.sh
```

6. Training script

```
python train.py \
    --data-path /dataset/mscoco_2017/ \
    --dataset coco \
    --model ssdlite320_mobilenet_v3_large \
    --batch-size 32 \
    --lr 0.002 \
    --weight-decay 0.0005
```

7. Training log

Training hyperparameters:

```
aspect_ratio_group_factor=3, batch_size=4, data_augmentation='hflip', data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=26, evaluation=None, load='', lr=0.02, lr_gamma=0.1, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], model='fcos_resnet50_fpn_', momentum=0.9, num_classes=91, output_dir='.', pretrained=True, print_freq=20, resume='', start_epoch=0, test_only=True, trainable_backbone_layers=None, weight_decay=0.0001, workers=4, world_size=1
```

Training loss:

![W B Chart 2023_10_12 10_19_25](https://github.com/Oneflow-Inc/CoModels/assets/118790294/8942cec3-0c9c-46ec-aa37-3f86495cf169)

8. Test script

```
python train.py \
    --data-path /data/dataset/coco \
    --dataset coco \
    --model ssdlite320_mobilenet_v3_large \
    --batch-size 32 \
    --lr 0.002 \
    --weight-decay 0.0005
```

9. Test results

Test hyperparameters:

```
aspect_ratio_group_factor=3, batch_size=4, data_augmentation='hflip', data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=26, evaluation=None, load='', lr=0.02, lr_gamma=0.1, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], model='fcos_resnet50_fpn_', momentum=0.9, num_classes=91, output_dir='.', pretrained=True, print_freq=20, resume='', start_epoch=0, test_only=True, trainable_backbone_layers=None, weight_decay=0.0001, workers=4, world_size=1
```

Test results:

```
IoU metric: bbox
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.213
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.343
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.221
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.011
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.202
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.444
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.208
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.307
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.334
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.042
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.345
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.643
```
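With eight detectors now reported in this thread, a side-by-side view of the headline bbox metric may help. The sketch below simply ranks the AP@[IoU=0.50:0.95, area=all, maxDets=100] values copied from the test results posted above. The dictionary keys follow the comment headings, and the variable names are illustrative only.

```python
# bbox AP@[IoU=0.50:0.95 | area=all | maxDets=100], copied from the results in this thread.
bbox_ap = {
    "ssd300_vgg16": 0.251,
    "fasterrcnn_resnet50_fpn": 0.366,
    "fasterrcnn_mobilenet_v3_large_fpn": 0.328,
    "fasterrcnn_mobilenet_v3_large_320_fpn": 0.227,
    "maskrcnn_resnet50_fpn": 0.378,
    "retinanet_resnet50_fpn": 0.363,
    "fcos_resnet50_fpn": 0.388,
    "ssdlite320_mobilenet_v3_large": 0.213,
}

# Sort from highest to lowest AP.
ranked = sorted(bbox_ap.items(), key=lambda kv: kv[1], reverse=True)
for name, ap in ranked:
    print(f"{name:40s} {ap:.3f}")
```

The numbers are not strictly comparable, since the runs use different batch sizes, learning rates and input resolutions.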
iwkkk commented 1 year ago

ResNet-18


cd CoModels/cv/classification
bash resnet18/train.sh
Training hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 4
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: resnet18
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/resnet18/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.00025
  CLIP_GRAD: 5.0
  EPOCHS: 50
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: cosine
  MIN_LR: 2.5e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: adamw
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 20
  WARMUP_LR: 2.5e-07
  WEIGHT_DECAY: 0.05
```
INFO Test: [0/196]  Throughput 345.635 (345.635)    Loss 1.3194 (1.3194)    Acc@1 71.875 (71.875)   Acc@5 87.109 (87.109)
INFO Test: [50/196] Throughput 1548.752 (1449.800)  Loss 1.4135 (1.3025)    Acc@1 68.750 (70.113)   Acc@5 89.453 (89.331)   
INFO Test: [100/196]    Throughput 1471.438 (1460.432)  Loss 1.3624 (1.3093)    Acc@1 69.922 (70.038)   Acc@5 86.719 (89.101)   
Acc@1 69.922 (69.868)   Acc@5 91.016 (89.132)
cd CoModels/cv/classification
bash resnet18/infer.sh
Inference hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 128
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: resnet18
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/resnet18/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.000125
  CLIP_GRAD: 5.0
  EPOCHS: 300
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: cosine
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: adamw
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 20
  WARMUP_LR: 1.25e-07
  WEIGHT_DECAY: 0.05
```
INFO  * Acc@1 69.533 Acc@5 89.042
INFO Accuracy of the network on the 391 test images: 69.5%
INFO throughput averaged with 30 times
INFO batch_size 128 throughput 4513.727823187475    
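Since every model in this thread reports the same kind of log lines, the final numbers can be collected with a small parser. The following is a minimal sketch, not part of CoModels: the function name `parse_log` and the regular expressions are assumptions based only on the log lines shown above.

```python
import re

# Sample lines copied from the ResNet-18 inference log above.
LOG = """\
INFO  * Acc@1 69.533 Acc@5 89.042
INFO Accuracy of the network on the 391 test images: 69.5%
INFO throughput averaged with 30 times
INFO batch_size 128 throughput 4513.727823187475
"""

# Patterns inferred from the log format above (an assumption, not a CoModels API).
ACC_RE = re.compile(r"\* Acc@1 (?P<top1>[\d.]+) Acc@5 (?P<top5>[\d.]+)")
THROUGHPUT_RE = re.compile(r"batch_size (?P<bs>\d+) throughput (?P<tp>[\d.]+)")


def parse_log(text):
    """Return the final top-1/top-5 accuracy and throughput found in a log."""
    result = {}
    acc = ACC_RE.search(text)
    if acc:
        result["top1"] = float(acc.group("top1"))
        result["top5"] = float(acc.group("top5"))
    tp = THROUGHPUT_RE.search(text)
    if tp:
        result["batch_size"] = int(tp.group("bs"))
        result["throughput"] = float(tp.group("tp"))
    return result


summary = parse_log(LOG)
print(summary)
```

Logs that lack a throughput line (for example the training logs) simply yield a dictionary without the `batch_size` and `throughput` keys.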
zkyseu commented 1 year ago

fcn_resnet50_coco

1. Machine: A100@192.168.40.21, 40 GB of GPU memory.

2. Dataset: MS COCO 2017, 80 classes. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: 63313f9.

How to run:
```
# run training code
cd cv/detection
bash fcn_resnet50_coco/train.sh
```
```
# run inference code
cd cv/detection
bash fcn_resnet50_coco/infer.sh
```
6. Training script
```
python train.py \
    -b 24 \
    --dataset coco \
    --data-path /dataset/coco \
    --model fcn_resnet50_coco \
    --aux-loss
```
7. Training log

Training hyperparameters:
```
amp=False, aux_loss=True, backend='pil', batch_size=24, data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=30, lr=0.12, lr_warmup_decay=0.01, lr_warmup_epochs=0, lr_warmup_method='linear', model='fcn_resnet50_coco', momentum=0.9, output_dir='.', pretrained=True, print_freq=10, resume='', start_epoch=0, test_only=True, use_deterministic_algorithms=False, weight_decay=0.0001, weights=None, weights_backbone=None, workers=16, world_size=1
```

Training loss: ![image](https://github.com/Oneflow-Inc/CoModels/assets/118790294/3296079f-1310-4186-8f2d-e95646b781e9)
8. Test script
```
python train.py \
    -b 24 \
    --dataset coco \
    --data-path /dataset/coco \
    --model fcn_resnet50_coco \
    --aux-loss \
    --lr 0.12 \
    --pretrained \
    --test-only
```
9. Test results

Test hyperparameters:
```
amp=False, aux_loss=True, backend='pil', batch_size=24, data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=30, lr=0.12, lr_warmup_decay=0.01, lr_warmup_epochs=0, lr_warmup_method='linear', model='fcn_resnet50_coco', momentum=0.9, output_dir='.', pretrained=True, print_freq=10, resume='', start_epoch=0, test_only=True, use_deterministic_algorithms=False, weight_decay=0.0001, weights=None, weights_backbone=None, workers=16, world_size=1
```

Test results:
```
global correct: 91.4
average row correct: ['94.0', '80.8', '73.9', '72.4', '49.9', '56.9', '81.0', '66.5', '88.6', '43.2', '75.7', '67.6', '80.9', '84.3', '81.2', '90.9', '49.0', '80.4', '61.2', '77.7', '69.0']
IoU: ['90.3', '72.6', '61.2', '61.6', '42.7', '46.9', '73.0', '53.7', '80.3', '33.6', '64.1', '33.9', '61.2', '72.0', '73.6', '80.7', '29.1', '63.5', '48.2', '71.2', '56.3']
mean IoU: 60.5
```
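As a quick sanity check on the segmentation numbers, the reported mean IoU can be recomputed from the per-class list. This sketch assumes the mean is the plain unweighted average over the 21 per-class values; the source does not state how the script aggregates them.

```python
# Per-class IoU values copied from the fcn_resnet50_coco test output (21 classes).
per_class_iou = [
    90.3, 72.6, 61.2, 61.6, 42.7, 46.9, 73.0,
    53.7, 80.3, 33.6, 64.1, 33.9, 61.2, 72.0,
    73.6, 80.7, 29.1, 63.5, 48.2, 71.2, 56.3,
]

# Assumption: mean IoU is the unweighted average over classes.
mean_iou = sum(per_class_iou) / len(per_class_iou)
print(f"mean IoU: {mean_iou:.1f}")
```

Under that assumption the result rounds to the 60.5 reported in the log, so the per-class list and the summary line agree.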
iwkkk commented 1 year ago

ResNet-34


cd CoModels/cv/classification/resnet34
bash train.sh
Training hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: resnet34
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/resnet34/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.1
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
INFO  * Acc@1 64.695 Acc@5 86.494
INFO Accuracy of the network on the 391 test images: 64.7%
INFO Max accuracy: 64.69%
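The ResNet-34 training config above uses the `step` scheduler with `BASE_LR: 0.1`, `DECAY_EPOCHS: 30`, `DECAY_RATE: 0.1` and no warm-up. The sketch below shows what a plain step schedule would give for those values. It is an illustration only: the helper `step_lr` is hypothetical, and it assumes the decay is applied once per `DECAY_EPOCHS` epochs, which may differ from how the CoModels scheduler is actually implemented.

```python
def step_lr(base_lr, decay_epochs, decay_rate, epoch):
    """Learning rate of a plain step schedule (hypothetical helper).

    The rate is multiplied by `decay_rate` once every `decay_epochs` epochs.
    Warm-up and MIN_LR clamping are deliberately ignored in this sketch.
    """
    return base_lr * decay_rate ** (epoch // decay_epochs)


# Values taken from the ResNet-34 training config above.
for epoch in (0, 29, 30, 60, 89):
    print(epoch, step_lr(0.1, 30, 0.1, epoch))
```

With 90 epochs this gives three plateaus (roughly 0.1, 0.01 and 0.001), which is the classic ImageNet SGD recipe.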
cd CoModels/cv/classification/resnet34
bash  infer.sh
Inference hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: resnet34
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/resnet34/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.1
  CLIP_GRAD: 5.0
  EPOCHS: 50
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 3.125e-07
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
INFO  * Acc@1 73.203 Acc@5 91.354
INFO Accuracy of the network on the 1563 test images: 73.2%
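The "1563 test images" and "391 test images" figures in these logs are far smaller than a full validation set. One plausible reading, which the source does not confirm, is that the logger prints the number of batches in the data loader rather than the number of images. The sketch below checks that reading, assuming the standard 50,000-image ImageNet-1k validation split; `num_batches` is a hypothetical helper.

```python
import math

# Assumption: the standard ImageNet-1k validation split with 50,000 images.
IMAGENET_VAL_IMAGES = 50_000


def num_batches(num_images, batch_size):
    """Number of batches when the last, partial batch is kept."""
    return math.ceil(num_images / batch_size)


print(num_batches(IMAGENET_VAL_IMAGES, 32))   # 1563
print(num_batches(IMAGENET_VAL_IMAGES, 128))  # 391
```

Both values match the logs: 1563 corresponds to a batch size of 32, and 391 to an effective batch size of 128.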
iwkkk commented 1 year ago

ResNet-101


cd CoModels/cv/classification/resnet101
bash train.sh
Training hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /ssd/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: resnet101
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/resnet101/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.1
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
INFO  * Acc@1 70.651 Acc@5 90.429
INFO Accuracy of the network on the 391 test images: 70.7%
INFO Max accuracy: 71.07%
cd CoModels/cv/classification/resnet101
bash  infer.sh
Inference hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /ssd/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: resnet101
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/resnet101/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.1
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 3.125e-07
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
INFO  * Acc@1 77.259 Acc@5 93.542
INFO Accuracy of the network on the 1563 test images: 77.3%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 845.537374493343
iwkkk commented 1 year ago

ResNet-152


cd CoModels/cv/classification/resnet152
bash train.sh
Training hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: resnet152
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/resnet152/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.1
  CLIP_GRAD: 5.0
  EPOCHS: 50
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
Training progress (41 epochs): ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/3327b7ed-be35-4793-aee9-08214b93a1da) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/33ea4d62-7744-4327-81fb-e3ac08c015e9)
cd CoModels/cv/classification/resnet152
bash  infer.sh
Inference hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /ssd/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: resnet101
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/resnet101/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.1
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 3.125e-07
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
INFO  * Acc@1 78.246 Acc@5 93.966
INFO Accuracy of the network on the 1563 test images: 78.2%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 283.2536238449759
iwkkk commented 1 year ago

AlexNet


cd CoModels/cv/classification/alexnet
bash train.sh
Training hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /ssd/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: alexnet
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/alexnet/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.01
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
Training progress (90 epochs): ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/4ab9b599-fcb5-46c1-a585-e639b8f4d712) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/0cbb8c92-96e4-4f3c-b960-ef2f0cb8a556)
cd CoModels/cv/classification/alexnet
bash  infer.sh
Inference hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: alexnet
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/alexnet/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.01
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 3.125e-07
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
INFO  * Acc@1 56.159 Acc@5 78.891
INFO Accuracy of the network on the 1563 test images: 56.2%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 6894.08522470448
iwkkk commented 1 year ago

VGG-11


cd CoModels/cv/classification/vgg11
bash train.sh
Training hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: vgg11
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/vgg11/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.01
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
Training progress (22 epochs): ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/034e5ed8-9a4d-43e5-bef6-0b73a2f8586d) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/bcd93c9b-0661-4f00-9f6a-3b6a2a78e3a0)
cd CoModels/cv/classification/vgg11
bash  infer.sh
Inference hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 128
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: vgg11
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/vgg11/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.01
  CLIP_GRAD: 5.0
  EPOCHS: 300
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
INFO  * Acc@1 68.717 Acc@5 88.564
INFO Accuracy of the network on the 391 test images: 68.7%
INFO throughput averaged with 30 times
INFO batch_size 128 throughput 1423.000632773497
iwkkk commented 1 year ago

VGG-11-BN


cd CoModels/cv/classification/vgg11_bn
bash train.sh
Training hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: vgg11_bn
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/vgg11_bn/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.1
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
Training progress (13 epochs): ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/02bfc5a2-96f9-4b62-9bc6-f7978787930b) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/233c4978-6cfb-46d6-bd5b-68d274bf9ab7)
cd CoModels/cv/classification/vgg11_bn
bash  infer.sh
Inference hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: vgg11_bn
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/vgg11_bn/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.1
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 3.125e-07
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
INFO  * Acc@1 70.265 Acc@5 89.710
INFO Accuracy of the network on the 1563 test images: 70.3%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 618.8654709457523
iwkkk commented 1 year ago

VGG-13


cd CoModels/cv/classification/vgg13
bash train.sh
Training hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: vgg13
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/vgg13/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.01
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
Training progress (26 epochs): ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/24822780-9d0d-4245-b42a-2a972efaffac) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/49e7716c-fcaf-4124-98d1-ad5227513a54)
cd CoModels/cv/classification/vgg13
bash  infer.sh
Inference hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: vgg13
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/vgg13/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.01
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 3.125e-07
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
INFO  * Acc@1 69.604 Acc@5 89.233
INFO Accuracy of the network on the 1563 test images: 69.6%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 441.14096897959365
iwkkk commented 1 year ago

VGG-13-BN


cd CoModels/cv/classification/vgg13_bn
bash train.sh
Training hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: vgg13_bn
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/vgg13_bn/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.1
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
Training progress (12 epochs): ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/a0dd2784-ddcd-4136-80b7-a647f9b55c4b) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/bae784fa-40c5-44e8-8cb2-a6ad7bce29cc)
cd CoModels/cv/classification/vgg13_bn
bash  infer.sh
Inference hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: vgg13_bn
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/vgg13_bn/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.1
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 3.125e-07
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
INFO  * Acc@1 71.397 Acc@5 90.317
INFO Accuracy of the network on the 1563 test images: 71.4%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 396.0371458561849
iwkkk commented 1 year ago

VGG-16


cd CoModels/cv/classification/vgg16
bash train.sh
Training hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: vgg16
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/vgg16/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.01
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
Training progress (14 epochs): ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/480c88bd-39fc-43f0-ae38-1e84c3d03adf) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/a3790a43-53ab-46ee-ab42-c0627505ff9f)
cd CoModels/cv/classification/vgg16
bash  infer.sh
Inference hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: vgg16
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/vgg16/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.01
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 3.125e-07
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
INFO  * Acc@1 71.385 Acc@5 90.325
INFO Accuracy of the network on the 1563 test images: 71.4%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 352.50112210291934
iwkkk commented 1 year ago

VGG-19


cd CoModels/cv/classification/vgg19
bash train.sh
Training hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: vgg19
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/vgg19/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.01
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
Training progress (11 epochs): ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/de3a9063-9a8a-4f72-bb9d-647032d7f7b6) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/3e41418c-0ecf-415c-b39d-e3793df2344c)
cd CoModels/cv/classification/vgg19
bash  infer.sh
Inference hyperparameters:
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: vgg19
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/vgg19/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.01
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 3.125e-07
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
INFO  * Acc@1 72.238 Acc@5 90.764
INFO Accuracy of the network on the 1563 test images: 72.2%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 293.97019728664696
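
A note on reading the log above: the figure 1563 in "on the 1563 test images" matches the number of batches of size 32 needed to cover 50,000 images, which is the size of the ImageNet-1k validation set. The sketch below checks that arithmetic. It assumes the evaluation used the full 50,000-image validation split and that the logged number is the data loader length; neither is confirmed in the thread, and `num_batches` is a hypothetical helper.

```python
import math


def num_batches(num_images, batch_size):
    """Number of batches needed to cover num_images (the last batch may be partial)."""
    return math.ceil(num_images / batch_size)


if __name__ == "__main__":
    # Assumed: 50,000 validation images, BATCH_SIZE 32 from the posted config.
    print(num_batches(50_000, 32))
```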
Drlifei commented 1 year ago

T5

1. Machine: of27@192.168.40.27

2. Dataset: /data/dataset/bert_data

3. oneflow version commit: [57f632741ab0e9ee81c5e2d49098e292dcd7e705]

4. libai

5. Training

```
# run training code
cd nlp/libai/T5
bash train.sh
```

```
# code
export ONEFLOW_FUSE_OPTIMIZER_UPDATE_CAST=true
python3 -m oneflow.distributed.launch \
    --nproc_per_node 1 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr 127.0.0.1 \
    --master_port 12345 \
    train_net.py \
    --config-file T5/t5_pretrain_config.py
```

6. Inference

```
# run Inference code
cd nlp/libai/T5
bash infer.sh
```

```
# code
export ONEFLOW_FUSE_OPTIMIZER_UPDATE_CAST=true
python3 -m oneflow.distributed.launch \
    --nproc_per_node 1 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr 127.0.0.1 \
    --master_port 12345 \
    train_net.py \
    --config-file T5/t5_infer_config.py \
    --eval-only
```

7. Training log: train_micro_batch_size = 128, 4 GPUs. Training loss: ![image.png](https://user-images.githubusercontent.com/144590379/278379785-deffbfde-13f6-46b2-b35c-91e78a361a2d.png)

8. Test log. Test result:

```
copypaste: masked_lm_loss_PPL=13.282667449676696
```
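
The reported metric is a perplexity (PPL). Under the usual definition, perplexity is the exponential of the mean cross-entropy loss, so the underlying loss can be recovered with a logarithm. The sketch below assumes that definition applies to `masked_lm_loss_PPL`; the thread does not show how the evaluator computes it, and `loss_from_ppl` is a hypothetical helper.

```python
import math


def loss_from_ppl(ppl):
    """Mean cross-entropy loss implied by a perplexity, assuming PPL = exp(loss)."""
    return math.log(ppl)


if __name__ == "__main__":
    ppl = 13.282667449676696  # value copied from the test log above
    print(f"implied loss: {loss_from_ppl(ppl):.4f}")
```

Under this assumption the T5 result corresponds to a masked-LM loss of roughly 2.59.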
Drlifei commented 1 year ago

Bert

1. Machine: a100@60.171.194.72

2. Dataset: /data/dataset/bert_data

3. oneflow version commit: [24ed4d6]

4. libai

5. Training

```
# run training code
cd nlp/libai/Bert
bash train.sh
```

```
# code
export ONEFLOW_FUSE_OPTIMIZER_UPDATE_CAST=true
python3 -m oneflow.distributed.launch \
    --nproc_per_node 1 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr 127.0.0.1 \
    --master_port 12345 \
    train_net.py \
    --config-file bert_pretrain_config.py
```

6. Inference

```
# run Inference code
cd nlp/libai/Bert
bash infer.sh
```

```
# code
export ONEFLOW_FUSE_OPTIMIZER_UPDATE_CAST=true
python3 -m oneflow.distributed.launch \
    --nproc_per_node 1 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr 127.0.0.1 \
    --master_port 12345 \
    train_net.py \
    --config-file bert_infer_config.py \
    --eval-only
```

7. Training log: train_micro_batch_size = 64, 8 GPUs. Training loss: ![image.png](https://user-images.githubusercontent.com/144590379/278522888-e1f235fb-a986-441a-87b8-c52876fe043d.png)

8. Test log. Test result: ![image.png](https://user-images.githubusercontent.com/144590379/278523634-9769e4a7-b36d-489a-adaa-de141020ee0f.png)
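
The training log gives a micro batch size and a GPU count, from which an effective global batch size can be estimated. The sketch below assumes pure data parallelism across all listed GPUs and no gradient accumulation; the thread states neither, so treat the result as an upper-level estimate. `global_batch_size` is a hypothetical helper.

```python
def global_batch_size(micro_batch_size, num_gpus, accumulation_steps=1):
    """Samples per optimizer step.

    Assumption: every GPU is a data-parallel replica, so the global batch is
    micro_batch_size * num_gpus * accumulation_steps.
    """
    return micro_batch_size * num_gpus * accumulation_steps


if __name__ == "__main__":
    # Bert run reported above: train_micro_batch_size = 64 on 8 GPUs.
    print(global_batch_size(64, 8))
    # T5 run reported earlier in the thread: train_micro_batch_size = 128 on 4 GPUs.
    print(global_batch_size(128, 4))
```

Under these assumptions both runs would use the same global batch of 512 samples per step.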
Drlifei commented 1 year ago

gpt-2

1. Machine: a100@60.171.194.72

2. Dataset: /data/dataset/gpt2_data

3. oneflow version commit: [630bb39]

4. libai

5. Training

```
# run training code
cd nlp/libai/gpt2
bash train.sh
```

```
# code
export ONEFLOW_FUSE_OPTIMIZER_UPDATE_CAST=true
python3 -m oneflow.distributed.launch \
    --nproc_per_node 1 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr 127.0.0.1 \
    --master_port 12345 \
    train_net.py \
    --config-file gpt2_pretrain_config.py
```

6. Inference

```
# run Inference code
cd nlp/libai/gpt2
bash infer.sh
```

```
# code
export ONEFLOW_FUSE_OPTIMIZER_UPDATE_CAST=true
python3 -m oneflow.distributed.launch \
    --nproc_per_node 1 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr 127.0.0.1 \
    --master_port 12345 \
    train_net.py \
    --config-file gpt2_infer_config.py \
    --eval-only
```

7. Training log: train_micro_batch_size = 4, 8 GPUs. Training loss: ![image](https://github.com/Drlifei/qaq/assets/144590379/64c29c4f-5ac8-4373-8a0c-6e4584b1e186)

8. Test log. Test result:

```
copypaste: lm_loss_PPL=35.66374506846473
```
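
The training and inference commands above differ only in the config file and the trailing `--eval-only` flag. As a sketch, the shared part can be assembled programmatically. Every flag below is copied from the posted commands; the helper name `build_launch_command` and its parameters are hypothetical, and the code only builds the argument list without running anything.

```python
import shlex


def build_launch_command(config_file, eval_only=False, nproc_per_node=1,
                         master_addr="127.0.0.1", master_port=12345):
    """Build the argv list for the launcher invocation shown in the thread."""
    cmd = [
        "python3", "-m", "oneflow.distributed.launch",
        "--nproc_per_node", str(nproc_per_node),
        "--nnodes", "1",
        "--node_rank", "0",
        "--master_addr", master_addr,
        "--master_port", str(master_port),
        "train_net.py",
        "--config-file", config_file,
    ]
    if eval_only:
        cmd.append("--eval-only")  # inference runs add this flag
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_launch_command("gpt2_pretrain_config.py")))
    print(shlex.join(build_launch_command("gpt2_infer_config.py", eval_only=True)))
```

The posted commands use `--nproc_per_node 1` while the training log mentions 8 GPUs, so `nproc_per_node` is left as a parameter rather than fixed.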
Drlifei commented 1 year ago

RoBERTa

1. Machine: a100@60.171.194.72

2. Dataset: /data/dataset/robert_data

3. oneflow version commit: [e4db023]

4. libai

5. Training

```
# run training code
cd nlp/libai/RoBERTa
bash train.sh
```

```
# code
export ONEFLOW_FUSE_OPTIMIZER_UPDATE_CAST=true
python3 -m oneflow.distributed.launch \
    --nproc_per_node 1 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr 127.0.0.1 \
    --master_port 12345 \
    train_net.py \
    --config-file roberta_pretrain_config.py
```

6. Inference

```
# run Inference code
cd nlp/libai/RoBERTa
bash infer.sh
```

```
# code
export ONEFLOW_FUSE_OPTIMIZER_UPDATE_CAST=true
python3 -m oneflow.distributed.launch \
    --nproc_per_node 1 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr 127.0.0.1 \
    --master_port 12345 \
    train_net.py \
    --config-file roberta_infer_config.py \
    --eval-only
```

7. Training log: train_micro_batch_size = 2, 8 GPUs. Training loss: ![image](https://github.com/Drlifei/qaq/assets/144590379/9a45104c-933d-4bc8-8556-d061cce9571e)

8. Test log. Test result:

```
copypaste: lm_loss_PPL=25.9010300271722
```
kokuro-asahi commented 1 year ago

Densenet_201


cd CoModels/cv/classification/densenet201
bash train.sh
训练所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: densenet201 CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/densenet201 PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: false TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.01 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 1.25e-06 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
Training process (14 epochs) ![image](https://github.com/Oneflow-Inc/CoModels/assets/59110141/0cf4185d-fc31-45b2-ac0d-384706f20161)
cd CoModels/cv/classification/densenet201
bash  infer.sh
Inference hyperparameters

```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: densenet201
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/densenet201/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.01
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 3.125e-07
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
wandb: Run summary:
wandb: val_acc1 77.30496
wandb: val_acc5 93.48808
wandb: val_loss 0.91155
kokuro-asahi commented 1 year ago

Densenet_161


cd CoModels/cv/classification/densenet161
bash train.sh
训练所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: densenet161 CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/densenet161 PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: false TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.01 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 1.25e-06 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
Training process (14 epochs) ![image](https://github.com/Oneflow-Inc/CoModels/assets/59110141/d5623363-c29f-4232-8599-533884c730ab)
cd CoModels/cv/classification/densenet161
bash  infer.sh
推理所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: densenet161 CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/densenet161/default PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: true TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.01 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 3.125e-07 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
wandb: Run summary:
wandb: val_acc1 77.37347
wandb: val_acc5 93.65128
wandb: val_loss 0.93307
kokuro-asahi commented 1 year ago

Densenet_121


cd CoModels/cv/classification/densenet121
bash train.sh
训练所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: densenet121 CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/densenet121 PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: false TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.01 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 1.25e-06 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
Training process (14 epochs) ![image](https://github.com/Oneflow-Inc/CoModels/assets/59110141/24c27085-7af7-42d7-88f8-88592f297a03)
cd CoModels/cv/classification/densenet121
bash  infer.sh
Inference hyperparameters

```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: densenet121
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/densenet121/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.01
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 3.125e-07
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
wandb: Run summary:
wandb: val_acc1 74.74815
wandb: val_acc5 92.17239
wandb: val_loss 1.01171
kokuro-asahi commented 1 year ago

Densenet_169


cd CoModels/cv/classification/densenet169
bash train.sh
训练所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: densenet169 CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/densenet169 PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: false TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.01 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 1.25e-06 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
Training process (14 epochs) ![image](https://github.com/Oneflow-Inc/CoModels/assets/59110141/09e09dcb-035e-46e6-9ac6-429d39bddaad)
cd CoModels/cv/classification/densenet169
bash  infer.sh
推理所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: densenet169 CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/densenet169/default PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: true TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.01 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 3.125e-07 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
wandb: Run summary:
wandb: val_acc1 75.88854
wandb: val_acc5 93.02264
wandb: val_loss 0.98375
iwkkk commented 1 year ago

Vgg16_bn


cd CoModels/cv/classification/vgg16_bn
bash train.sh
训练所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: vgg16_bn CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/vgg16_bn/default PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: false TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.1 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 1.25e-06 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
Training process (8 epochs) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/de3ea793-c003-49d7-bcfa-3b4d72565347) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/0571ccfb-7a21-4e38-affa-3fe61495988e)
cd CoModels/cv/classification/vgg16_bn
bash  infer.sh
推理所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_CLASSES: 1000 NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: vgg16_bn CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/vgg16_bn/default PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: true TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.1 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 3.125e-07 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
INFO  * Acc@1 73.064 Acc@5 91.377
INFO Accuracy of the network on the 1563 test images: 73.1%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 527.2008533607626
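
Inference logs in this thread share the same four-line shape, so the numbers can be pulled out mechanically when collecting results across models. The sketch below is one way to do that with regular expressions; the patterns are written against the log lines quoted above, and `parse_infer_log` and the dictionary keys are made-up names.

```python
import re

# Sample copied from the Vgg16_bn inference log above.
SAMPLE_LOG = """INFO  * Acc@1 73.064 Acc@5 91.377
INFO Accuracy of the network on the 1563 test images: 73.1%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 527.2008533607626"""

ACC_RE = re.compile(r"Acc@1\s+([\d.]+)\s+Acc@5\s+([\d.]+)")
THROUGHPUT_RE = re.compile(r"batch_size\s+(\d+)\s+throughput\s+([\d.]+)")


def parse_infer_log(text):
    """Extract accuracy and throughput figures from an inference log excerpt."""
    acc = ACC_RE.search(text)
    thr = THROUGHPUT_RE.search(text)
    if acc is None or thr is None:
        raise ValueError("log does not contain the expected lines")
    return {
        "acc1": float(acc.group(1)),
        "acc5": float(acc.group(2)),
        "batch_size": int(thr.group(1)),
        "throughput": float(thr.group(2)),
    }


if __name__ == "__main__":
    print(parse_infer_log(SAMPLE_LOG))
```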
iwkkk commented 1 year ago

Vgg19_bn


cd CoModels/cv/classification/vgg19_bn
bash train.sh
训练所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: vgg19_bn CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/vgg19_bn/default PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: false TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.1 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 1.25e-06 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
Training process (12 epochs) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/8191fb88-a69c-4aec-8551-f8de1fa68d34) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/75a6ec18-8a54-413a-83ef-7b5a7c2e490e)
cd CoModels/cv/classification/vgg19_bn
bash  infer.sh
推理所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_CLASSES: 1000 NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: vgg19_bn CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/vgg19_bn/default PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: true TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.1 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 3.125e-07 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
INFO  * Acc@1 74.025 Acc@5 91.731
INFO Accuracy of the network on the 1563 test images: 74.0%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 297.1238754357942
Drlifei commented 1 year ago

ResMLP

1. Machine: 27@192.168.1.27

2. Dataset: imagenet

3. oneflow version commit: [04c8bcc]

4. libai

5. Training

```
# run training code
cd nlp/libai/ResMLP
bash train.sh
```

```
# code
export ONEFLOW_FUSE_OPTIMIZER_UPDATE_CAST=true
python3 -m oneflow.distributed.launch \
    --nproc_per_node 1 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr 127.0.0.1 \
    --master_port 12345 \
    train_net.py \
    --config-file resmlp_imagenet.py
```

6. Inference

```
# run Inference code
cd nlp/libai/ResMLP
bash infer.sh
```

```
# code
export ONEFLOW_FUSE_OPTIMIZER_UPDATE_CAST=true
python3 -m oneflow.distributed.launch \
    --nproc_per_node 1 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr 127.0.0.1 \
    --master_port 12345 \
    train_net.py \
    --config-file resmlp_infer_config.py \
    --eval-only
```

7. Training log: train_micro_batch_size = 256, 4 GPUs. Training loss: ![image](https://github.com/Drlifei/qaq/assets/144590379/8c09ccd3-de45-4600-ba1c-685b6155a4d7)

8. Test log. Test result:

```
copypaste: Acc@1=22.334
copypaste: Acc@5=44.635999999999996
```
akeeei commented 1 year ago

Inception_V3


cd CoModels/cv/classification/inception_v3
bash train.sh
Training hyperparameters
DATA:
  BATCH_SIZE: 32
  DATASET: cifar100
  DATA_PATH: /data/dataset/cifar100/extract
  IMG_SIZE: 224
  NUM_WORKERS: 4
  NUM_CLASSES: 100
TRAIN:
  START_EPOCH: 0
  EPOCHS: 50
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY:  1e-4
  BASE_LR: 0.001
  WARMUP_LR: 5e-7
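
The `WARMUP_EPOCHS`, `WARMUP_LR` and `BASE_LR` values above describe a warmup phase. The sketch below assumes a linear ramp from `WARMUP_LR` to `BASE_LR` over the warmup epochs, which is a common convention but is not confirmed by the thread; it also ignores whatever decay schedule follows the warmup. `lr_during_warmup` and its parameter names are hypothetical.

```python
def lr_during_warmup(progress_epochs, base_lr=0.001, warmup_start_lr=5e-7,
                     warmup_epochs=1):
    """Learning rate after `progress_epochs` (may be fractional) of training.

    Assumption: linear interpolation from warmup_start_lr to base_lr during
    warmup, then base_lr. Defaults are copied from the posted config.
    """
    if warmup_epochs <= 0 or progress_epochs >= warmup_epochs:
        return base_lr
    fraction = progress_epochs / warmup_epochs
    return warmup_start_lr + (base_lr - warmup_start_lr) * fraction


if __name__ == "__main__":
    for progress in (0.0, 0.25, 0.5, 1.0):
        print(f"{progress:.2f} epochs: lr = {lr_during_warmup(progress):.7f}")
```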

Inference

```
cd CoModels/cv/classification/inception_v3
bash infer.sh
```

Training process (50 epochs)
* Training log: ![image](https://github.com/Oneflow-Inc/CoModels/assets/98879022/bce1f69c-7b39-429e-80ba-d9feaef63f72)
* Training result:

```
INFO * Acc@1 58.747 Acc@5 85.677
INFO Accuracy of the network on the 79 test images: 58.7%
INFO Max accuracy: 59.04%
INFO Training time 1:30:17
```

Inference result

```
INFO * Acc@1 70.132 Acc@5 89.728
INFO Accuracy of the network on the 1563 test images: 70.1%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 1684.5348415445303
```
akeeei commented 1 year ago

GoogLeNet


cd CoModels/cv/classification/googlenet
bash train.sh
Training hyperparameters
DATA:
  BATCH_SIZE: 32
  DATASET: cifar100
  DATA_PATH: /data/dataset/cifar100/extract
  IMG_SIZE: 224
  NUM_WORKERS: 4
  NUM_CLASSES: 100

TRAIN:
  START_EPOCH: 0
  EPOCHS: 50
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY:  1e-4
  BASE_LR: 0.001
  WARMUP_LR: 5e-7

Inference

```
cd CoModels/cv/classification/googlenet
bash infer.sh
```

Training process (50 epochs)
* Training log: ![image](https://github.com/Oneflow-Inc/CoModels/assets/98879022/43d67486-4228-4e97-ba86-ced223a5167d)
* Training result:

```
INFO * Acc@1 74.381 Acc@5 92.179
INFO Accuracy of the network on the 391 test images: 74.4%
INFO Max accuracy: 74.38%
INFO Training time 13:50:55
```

Inference result

```
INFO * Acc@1 77.647 Acc@5 93.569
INFO Accuracy of the network on the 1563 test images: 77.6%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 866.1724815102194
```
akeeei commented 1 year ago

ResNeSt-50


cd CoModels/cv/classification/resnest50
bash train.sh
Training hyperparameters
DATA:
  BATCH_SIZE: 32
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  NUM_WORKERS: 8

TRAIN:
  START_EPOCH: 0
  EPOCHS: 20
  WARMUP_EPOCHS: 0
  WEIGHT_DECAY: 1e-4
  BASE_LR: 1e-3
  WARMUP_LR: 5e-7

Inference

```
cd CoModels/cv/classification/resnest50
bash infer.sh
```

Training process (2 epochs)
* Training log: ![image](https://github.com/Oneflow-Inc/CoModels/assets/98879022/1f731309-3856-4cf2-b7c2-201a569ca867)
* Training result:

```
INFO * Acc@1 74.381 Acc@5 92.169
INFO Accuracy of the network on the 782 test images: 80.2%
INFO Max accuracy: 80.15%
INFO Training time 2:41:35
```

Inference result

```
INFO * Acc@1 81.627 Acc@5 95.662
INFO Accuracy of the network on the 1563 test images: 81.6%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 206.34531237020244
```
iwkkk commented 1 year ago

SqueezeNet


cd CoModels/cv/classification/squeezenet1_0
bash train.sh
训练所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_CLASSES: 1000 NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: squeezenet1_0 CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/squeezenet1_0/default PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: false TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.1 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 1.25e-06 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
Training process (8 epochs) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/55f22363-dd8e-4fbe-abd8-be8fcca61b01)
cd CoModels/cv/classification/squeezenet1_0
bash  infer.sh
推理所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_CLASSES: 1000 NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: squeezenet1_0 CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/squeezenet1_0/default PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: true TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.1 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 3.125e-07 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
INFO  * Acc@1 57.836 Acc@5 80.309
INFO Accuracy of the network on the 1563 test images: 57.8%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 3930.674481421572
iwkkk commented 1 year ago

ResNeXt-50 32x4d


cd CoModels/cv/classification/resnext50_32x4d
bash train.sh
训练所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_WORKERS: 4 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: resnext50_32x4d CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/resnext50_32x4d/default PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: false TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.1 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 1.25e-06 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
Training process (11 epochs) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/88690322-66da-4ebb-86bb-9b3b7620204f)
cd CoModels/cv/classification/resnext50_32x4d
bash  infer.sh
推理所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_CLASSES: 1000 NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: resnext50_32x4d CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/resnext50_32x4d/default PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: true TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.1 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 3.125e-07 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
INFO  * Acc@1 77.490 Acc@5 93.575
INFO Accuracy of the network on the 1563 test images: 77.5%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 353.72756770888606
iwkkk commented 1 year ago

SqueezeNet 1.1


cd CoModels/cv/classification/squeezenet1_1
bash train.sh
训练所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_CLASSES: 1000 NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: squeezenet1_1 CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/squeezenet1_1/default PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: false TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.1 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 1.25e-06 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
Training process (8 epochs) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/00acea67-912b-4816-a4f5-2047acdb9ae3)
cd CoModels/cv/classification/squeezenet1_1
bash  infer.sh
推理所用超参数 ``` AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 0.0 CUTMIX_MINMAX: null MIXUP: 0.0 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE: - '' DATA: BATCH_SIZE: 32 CACHE_MODE: part DATASET: imagenet DATA_PATH: /data/dataset/ImageNet/extract IMG_SIZE: 224 INTERPOLATION: bicubic NUM_CLASSES: 1000 NUM_WORKERS: 8 PIN_MEMORY: true SYNTHETIC_DATA: false ZIP_MODE: false EVAL_MODE: true LOCAL_RANK: 0 MODEL: ARCH: squeezenet1_1 CHECKPOINTS: null DROP_PATH_RATE: 0.1 DROP_RATE: 0.0 LABEL_SMOOTHING: 0.1 NUM_CLASSES: 1000 PRETRAINED: true RESUME: '' OUTPUT: output/squeezenet1_1/default PRINT_FREQ: 50 SAVE_FREQ: 1 SEED: 42 TAG: default TEST: CROP: true SEQUENTIAL: false THROUGHPUT_MODE: true TRAIN: ACCUMULATION_STEPS: 0 AUTO_RESUME: false BASE_LR: 0.1 CLIP_GRAD: 5.0 EPOCHS: 90 LR_SCHEDULER: DECAY_EPOCHS: 30 DECAY_RATE: 0.1 MILESTONES: - 150 - 225 NAME: step MIN_LR: 3.125e-07 OPTIMIZER: BETAS: - 0.9 - 0.999 EPS: 1.0e-08 MOMENTUM: 0.9 NAME: sgd START_EPOCH: 0 USE_CHECKPOINT: false WARMUP_EPOCHS: 0 WARMUP_LR: 5.0e-07 WEIGHT_DECAY: 0.0001 ```
INFO  * Acc@1 57.878 Acc@5 80.412
INFO Accuracy of the network on the 1563 test images: 57.9%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 6722.886752853844
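The count in the `Accuracy of the network on the ... test images` line looks like a number of evaluation batches rather than a number of images. A minimal sketch of that arithmetic, assuming the standard ImageNet-1k validation split of 50,000 images (the split size is not stated in this thread) and that the last, partially filled batch is kept:

```python
import math

# Assumption: the standard ImageNet-1k validation split (50,000 images).
NUM_VAL_IMAGES = 50_000


def num_eval_batches(num_images: int, batch_size: int) -> int:
    """Number of batches when the last, partially filled batch is kept."""
    return math.ceil(num_images / batch_size)


print(num_eval_batches(NUM_VAL_IMAGES, 32))  # 1563
```

Under the same assumption, the counts 3125 and 391 that appear in other logs in this thread would correspond to evaluation batch sizes of 16 and 128.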
iwkkk commented 1 year ago

ResNeXt-101 32x8d


cd CoModels/cv/classification/resnext101_32x8d
bash train.sh
Training hyperparameters
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 4
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: resnext101_32x8d
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/resnext101_32x8d/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.1
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
Training process (7 epochs) ![image](https://github.com/Oneflow-Inc/CoModels/assets/77448166/47acb816-9d18-4427-957e-4ec6a302b59f)
cd CoModels/cv/classification/resnext101_32x8d
bash infer.sh
Inference hyperparameters
```
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 0.0
  CUTMIX_MINMAX: null
  MIXUP: 0.0
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 32
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_CLASSES: 1000
  NUM_WORKERS: 8
  PIN_MEMORY: true
  SYNTHETIC_DATA: false
  ZIP_MODE: false
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  ARCH: resnext101_32x8d
  CHECKPOINTS: null
  DROP_PATH_RATE: 0.1
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NUM_CLASSES: 1000
  PRETRAINED: true
  RESUME: ''
OUTPUT: output/resnext101_32x8d/default
PRINT_FREQ: 50
SAVE_FREQ: 1
SEED: 42
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: true
TRAIN:
  ACCUMULATION_STEPS: 0
  AUTO_RESUME: false
  BASE_LR: 0.1
  CLIP_GRAD: 5.0
  EPOCHS: 90
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    MILESTONES:
    - 150
    - 225
    NAME: step
  MIN_LR: 3.125e-07
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: sgd
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 0
  WARMUP_LR: 5.0e-07
  WEIGHT_DECAY: 0.0001
```
INFO  * Acc@1 79.060 Acc@5 94.437
INFO Accuracy of the network on the 1563 test images: 79.1%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 209.79367594690888
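The `throughput averaged with 30 times` line suggests the figure is images per second averaged over 30 timed batches. The thread does not show how the script measures it, so the following is only a sketch of one common way to do it; `measure_throughput`, `run_batch`, and the warm-up count are hypothetical names and choices, not taken from the repository.

```python
import time
from typing import Callable


def measure_throughput(run_batch: Callable[[], None], batch_size: int,
                       repeats: int = 30, warmup: int = 5) -> float:
    """Return images per second averaged over `repeats` timed calls."""
    for _ in range(warmup):  # untimed warm-up calls
        run_batch()
    start = time.perf_counter()
    for _ in range(repeats):
        run_batch()
    elapsed = time.perf_counter() - start
    return repeats * batch_size / elapsed


# Stand-in workload; a real measurement would run one forward pass here.
print(measure_throughput(lambda: sum(range(10_000)), batch_size=32))
```

When the workload runs on a GPU, the device has to be synchronized before each clock read, otherwise only the kernel launch time is measured.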
akeeei commented 1 year ago

ResNeSt-200


cd CoModels/cv/classification/resnest200
bash train.sh
Training hyperparameters
DATA:
  BATCH_SIZE: 32
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 320
  NUM_WORKERS: 8

TRAIN:
  START_EPOCH: 0
  EPOCHS: 2
  WARMUP_EPOCHS: 0
  WEIGHT_DECAY: 1e-4
  BASE_LR: 1e-3
  WARMUP_LR: 5e-7

Inference
```
cd CoModels/cv/classification/resnest200
bash infer.sh
```
Training process (2 epochs)
* Training log: ![image](https://github.com/Oneflow-Inc/CoModels/assets/98879022/f18e312d-c862-44e6-b316-cf7ba0210bb6)
* Training results:
```
INFO * Acc@1 82.278 Acc@5 96.269
INFO Accuracy of the network on the 1563 test images: 82.3%
INFO Max accuracy: 82.46%
INFO Training time 4:50:29
```
Inference results
```
INFO * Acc@1 83.132 Acc@5 96.583
INFO Accuracy of the network on the 1563 test images: 83.1%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 100.20955714619772
```
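`Acc@1` and `Acc@5` in these logs are top-1 and top-5 accuracy: the percentage of images whose true label is among the 1 or 5 highest-scoring classes. A small pure-Python sketch of that definition follows; the scores and labels are made up for illustration, and `topk_accuracy` is a hypothetical helper, not the function used by the evaluation script.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]],
                  labels: Sequence[int], k: int) -> float:
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Hypothetical scores for 3 samples over 4 classes.
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true label 1 is ranked first
    [0.5, 0.1, 0.3, 0.1],  # true label 2 is ranked second
    [0.4, 0.3, 0.2, 0.1],  # true label 3 is ranked last
]
labels = [1, 2, 3]
print(topk_accuracy(scores, labels, k=1))
print(topk_accuracy(scores, labels, k=2))
```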
akeeei commented 1 year ago

ResNeSt-269


cd CoModels/cv/classification/resnest269
bash train.sh
Training hyperparameters
DATA:
  BATCH_SIZE: 32
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 416
  NUM_WORKERS: 8

TRAIN:
  START_EPOCH: 0
  EPOCHS: 2
  WARMUP_EPOCHS: 0
  WEIGHT_DECAY: 1e-4
  BASE_LR: 1e-3
  WARMUP_LR: 5e-7

Inference
```
cd CoModels/cv/classification/resnest269
bash infer.sh
```
Training process (2 epochs)
* Training log: ![image](https://github.com/Oneflow-Inc/CoModels/assets/98879022/07959031-574f-4c69-8182-a7c5c6fb3bf7)
* Training results:
```
INFO * Acc@1 81.490 Acc@5 95.876
INFO Accuracy of the network on the 3125 test images: 81.5%
INFO Max accuracy: 81.49%
INFO Training time 11:25:46
```
Inference results
```
INFO * Acc@1 84.280 Acc@5 96.887
INFO Accuracy of the network on the 1563 test images: 84.3%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 22.534863856275436
```
zkyseu commented 1 year ago

fcn_resnet101_coco

1. Machine: A100@192.168.40.21, 40 GB GPU memory.

2. Dataset: MSCOCO 2017, 80 categories. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: fd6b6e4. How to run:
```
# run training code
cd cv/detection
bash fcn_resnet101_coco/train.sh
```
```
# run inference code
cd cv/detection
bash fcn_resnet101_coco/infer.sh
```

6. Training script
```
python train.py \
    -b 24 \
    --dataset coco \
    --data-path /dataset/coco \
    --model fcn_resnet101_coco \
    --aux-loss
```

7. Training log

Training hyperparameters:
```
amp=False, aux_loss=True, backend='pil', batch_size=24, data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=30, lr=0.12, lr_warmup_decay=0.01, lr_warmup_epochs=0, lr_warmup_method='linear', model='fcn_resnet101_coco', momentum=0.9, output_dir='.', pretrained=True, print_freq=10, resume='', start_epoch=0, test_only=True, use_deterministic_algorithms=False, weight_decay=0.0001, weights=None, weights_backbone=None, workers=16, world_size=1
```
Training loss: ![image](https://github.com/Oneflow-Inc/CoModels/assets/118790294/02dc8822-49e4-4a28-a963-c6eca6979de8)

8. Test script
```
python train.py \
    -b 24 \
    --dataset coco \
    --data-path /dataset/coco \
    --model fcn_resnet101_coco \
    --aux-loss \
    --lr 0.12 \
    --pretrained \
    --test-only
```

9. Test results

Test hyperparameters:
```
amp=False, aux_loss=True, backend='pil', batch_size=24, data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=30, lr=0.12, lr_warmup_decay=0.01, lr_warmup_epochs=0, lr_warmup_method='linear', model='fcn_resnet50_coco', momentum=0.9, output_dir='.', pretrained=True, print_freq=10, resume='', start_epoch=0, test_only=True, use_deterministic_algorithms=False, weight_decay=0.0001, weights=None, weights_backbone=None, workers=16, world_size=1
```
Test results:
```
global correct: 91.9
average row correct: ['94.2', '83.8', '78.2', '76.0', '58.5', '60.2', '78.1', '70.5', '93.0', '47.8', '79.7', '71.3', '80.0', '85.8', '81.0', '91.3', '49.1', '82.5', '67.0', '80.6', '68.5']
IoU: ['90.8', '76.1', '64.2', '65.0', '47.8', '49.8', '73.0', '57.9', '83.5', '37.2', '69.0', '35.3', '66.5', '75.8', '74.6', '82.2', '32.9', '69.1', '50.4', '75.1', '61.0']
mean IoU: 63.7
```
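The reported `mean IoU` is consistent with a plain average of the per-class `IoU` list in the fcn_resnet101_coco output above, as the short check below shows. The list has 21 entries rather than 80, which suggests the evaluation covers a 21-class subset; that reading is an inference from the output, not something the thread states.

```python
# Per-class IoU values copied from the fcn_resnet101_coco test output.
iou = [90.8, 76.1, 64.2, 65.0, 47.8, 49.8, 73.0, 57.9, 83.5, 37.2, 69.0,
       35.3, 66.5, 75.8, 74.6, 82.2, 32.9, 69.1, 50.4, 75.1, 61.0]

# Unweighted average over classes.
mean_iou = sum(iou) / len(iou)
print(len(iou), round(mean_iou, 1))  # 21 63.7
```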
zkyseu commented 1 year ago

deeplabv3_mobilenet_v3_large_coco

1. Machine: A100@192.168.40.21, 40 GB GPU memory.

2. Dataset: MSCOCO 2017, 80 categories. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: 84a3899. How to run:
```
# run training code
cd cv/detection
bash deeplabv3_mobilenet_v3_large_coco/train.sh
```
```
# run inference code
cd cv/detection
bash deeplabv3_mobilenet_v3_large_coco/infer.sh
```

6. Training script
```
python train.py \
    -b 24 \
    --dataset coco \
    --data-path /dataset/coco \
    --model deeplabv3_mobilenet_v3_large_coco \
    --aux-loss
```

7. Training log

Training hyperparameters:
```
amp=False, aux_loss=True, backend='pil', batch_size=24, data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=30, lr=0.12, lr_warmup_decay=0.01, lr_warmup_epochs=0, lr_warmup_method='linear', model='deeplabv3_mobilenet_v3_large_coco', momentum=0.9, output_dir='.', pretrained=True, print_freq=10, resume='', start_epoch=0, test_only=True, use_deterministic_algorithms=False, weight_decay=0.0001, weights=None, weights_backbone=None, workers=16, world_size=1
```
Training loss: ![image](https://github.com/Oneflow-Inc/CoModels/assets/118790294/02dc8822-49e4-4a28-a963-c6eca6979de8)

8. Test script
```
python train.py \
    -b 24 \
    --dataset coco \
    --data-path /dataset/coco \
    --model deeplabv3_mobilenet_v3_large_coco \
    --aux-loss \
    --lr 0.12 \
    --pretrained \
    --test-only
```

9. Test results

Test hyperparameters:
```
amp=False, aux_loss=True, backend='pil', batch_size=24, data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=30, lr=0.12, lr_warmup_decay=0.01, lr_warmup_epochs=0, lr_warmup_method='linear', model='deeplabv3_mobilenet_v3_large_coco', momentum=0.9, output_dir='.', pretrained=True, print_freq=10, resume='', start_epoch=0, test_only=True, use_deterministic_algorithms=False, weight_decay=0.0001, weights=None, weights_backbone=None, workers=16, world_size=1
```
Test results:
```
global correct: 91.2
average row correct: ['93.7', '84.9', '73.6', '74.6', '63.6', '50.6', '80.7', '65.1', '91.3', '42.2', '80.4', '70.6', '82.4', '81.8', '83.7', '88.5', '52.6', '87.9', '65.9', '88.3', '63.3']
IoU: ['90.1', '69.7', '58.2', '61.3', '49.7', '37.6', '72.7', '52.7', '79.1', '32.2', '64.6', '36.2', '66.7', '67.4', '70.1', '77.3', '33.1', '67.8', '51.1', '73.3', '54.4']
mean IoU: 60.3
```
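The hyperparameter dumps above have the shape of a printed `argparse` namespace, where a flag such as `--aux-loss` becomes the field `aux_loss`. The sketch below shows how the flags in the test command could map to those fields. It is a hypothetical reconstruction limited to flags and fields visible in this comment; the real `train.py` parser is not shown in the thread, and the defaults here merely mirror the dumped values.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names come from the commands above, the
    # defaults are assumptions that mirror the dumped hyperparameters.
    p = argparse.ArgumentParser(description="Segmentation training (sketch)")
    p.add_argument("-b", "--batch-size", type=int, default=24)
    p.add_argument("--dataset", default="coco")
    p.add_argument("--data-path", default="/dataset/coco")
    p.add_argument("--model", default="deeplabv3_mobilenet_v3_large_coco")
    p.add_argument("--aux-loss", action="store_true")
    p.add_argument("--lr", type=float, default=0.12)
    p.add_argument("--epochs", type=int, default=30)
    p.add_argument("--pretrained", action="store_true")
    p.add_argument("--test-only", action="store_true")
    return p


# Same arguments as the test command, passed as a list instead of via the shell.
args = build_parser().parse_args(
    ["-b", "24", "--dataset", "coco", "--data-path", "/dataset/coco",
     "--model", "deeplabv3_mobilenet_v3_large_coco",
     "--aux-loss", "--lr", "0.12", "--pretrained", "--test-only"]
)
print(args.batch_size, args.aux_loss, args.test_only)
```

With `action="store_true"`, omitting `--test-only` leaves `test_only=False`, which is what distinguishes a training run from an evaluation-only run in this sketch.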
zkyseu commented 1 year ago

deeplabv3_resnet50_coco

1. Machine: A100@192.168.40.21, 40 GB GPU memory.

2. Dataset: MSCOCO 2017, 80 categories. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: 82c1a21. How to run:
```
# run training code
cd cv/detection
bash deeplabv3_resnet50_coco/train.sh
```
```
# run inference code
cd cv/detection
bash deeplabv3_resnet50_coco/infer.sh
```

6. Training script
```
python train.py \
    -b 24 \
    --dataset coco \
    --data-path /dataset/coco \
    --model deeplabv3_resnet50_coco \
    --aux-loss
```

7. Training log

Training hyperparameters:
```
amp=False, aux_loss=True, backend='pil', batch_size=24, data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=30, lr=0.12, lr_warmup_decay=0.01, lr_warmup_epochs=0, lr_warmup_method='linear', model='deeplabv3_mobilenet_v3_large_coco', momentum=0.9, output_dir='.', pretrained=True, print_freq=10, resume='', start_epoch=0, test_only=True, use_deterministic_algorithms=False, weight_decay=0.0001, weights=None, weights_backbone=None, workers=16, world_size=1
```
Training loss: ![image](https://github.com/Oneflow-Inc/CoModels/assets/118790294/745f6b93-c9a4-4607-b39a-8c0ffcae6626)

8. Test script
```
python train.py \
    -b 24 \
    --dataset coco \
    --data-path /dataset/coco \
    --model deeplabv3_resnet50_coco \
    --aux-loss \
    --lr 0.12 \
    --pretrained \
    --test-only
```

9. Test results

Test hyperparameters:
```
amp=False, aux_loss=True, backend='pil', batch_size=24, data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=30, lr=0.12, lr_warmup_decay=0.01, lr_warmup_epochs=0, lr_warmup_method='linear', model='deeplabv3_resnet50_coco', momentum=0.9, output_dir='.', pretrained=True, print_freq=10, resume='', start_epoch=0, test_only=True, use_deterministic_algorithms=False, weight_decay=0.0001, weights=None, weights_backbone=None, workers=16, world_size=1
```
Test results:
```
global correct: 92.4
average row correct: ['94.3', '88.3', '79.6', '77.0', '56.7', '62.0', '89.0', '72.6', '92.4', '52.2', '84.0', '70.2', '87.7', '85.9', '86.4', '91.9', '58.1', '90.8', '71.7', '90.1', '73.8']
IoU: ['91.2', '78.3', '67.7', '63.9', '47.9', '52.0', '82.7', '58.4', '85.5', '39.8', '72.7', '36.7', '71.0', '74.6', '78.1', '82.2', '32.7', '76.7', '55.9', '81.4', '64.2']
mean IoU: 66.4
```
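The three metrics in these outputs (`global correct`, `average row correct`, `IoU`) match the usual quantities derived from a pixel-level confusion matrix. The thread does not include the evaluation code, so the sketch below only illustrates those standard definitions on a made-up 2-class matrix; `segmentation_metrics` is a hypothetical helper.

```python
from typing import List, Tuple


def segmentation_metrics(
    conf: List[List[int]],
) -> Tuple[float, List[float], List[float]]:
    """conf[i][j] counts pixels of true class i predicted as class j.

    Assumes every class has at least one true pixel (no zero rows).
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    diag = [conf[i][i] for i in range(n)]
    row_sum = [sum(conf[i]) for i in range(n)]                       # true pixels
    col_sum = [sum(conf[i][j] for i in range(n)) for j in range(n)]  # predicted pixels

    global_correct = sum(diag) / total                   # overall pixel accuracy
    row_correct = [diag[i] / row_sum[i] for i in range(n)]  # per-class accuracy
    iou = [diag[i] / (row_sum[i] + col_sum[i] - diag[i]) for i in range(n)]
    return global_correct, row_correct, iou


# Hypothetical 2-class confusion matrix (pixel counts).
conf = [[8, 2],
        [1, 9]]
g, acc, iou = segmentation_metrics(conf)
print(g, acc, iou)
```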
zkyseu commented 1 year ago

deeplabv3_resnet101_coco

1. Machine: A100@192.168.40.21, 40 GB GPU memory.

2. Dataset: MSCOCO 2017, 80 categories. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: d28a56f. How to run:
```
# run training code
cd cv/detection
bash deeplabv3_resnet101_coco/train.sh
```
```
# run inference code
cd cv/detection
bash deeplabv3_resnet101_coco/infer.sh
```

6. Training script
```
python train.py \
    -b 24 \
    --dataset coco \
    --data-path /dataset/coco \
    --model deeplabv3_resnet101_coco \
    --aux-loss
```

7. Training log

Training hyperparameters:
```
amp=False, aux_loss=True, backend='pil', batch_size=24, data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=30, lr=0.12, lr_warmup_decay=0.01, lr_warmup_epochs=0, lr_warmup_method='linear', model='deeplabv3_resnet101_coco', momentum=0.9, output_dir='.', pretrained=True, print_freq=10, resume='', start_epoch=0, test_only=True, use_deterministic_algorithms=False, weight_decay=0.0001, weights=None, weights_backbone=None, workers=16, world_size=1
```
Training loss: ![image](https://github.com/Oneflow-Inc/CoModels/assets/118790294/f75c1f7f-6b2a-452a-bd9a-2ca4a16312d7)

8. Test script
```
python train.py \
    -b 24 \
    --dataset coco \
    --data-path /dataset/coco \
    --model deeplabv3_resnet101_coco \
    --aux-loss \
    --lr 0.12 \
    --pretrained \
    --test-only
```

9. Test results

Test hyperparameters:
```
amp=False, aux_loss=True, backend='pil', batch_size=24, data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=30, lr=0.12, lr_warmup_decay=0.01, lr_warmup_epochs=0, lr_warmup_method='linear', model='deeplabv3_resnet101_coco', momentum=0.9, output_dir='.', pretrained=True, print_freq=10, resume='', start_epoch=0, test_only=True, use_deterministic_algorithms=False, weight_decay=0.0001, weights=None, weights_backbone=None, workers=16, world_size=1
```
Test results:
```
global correct: 92.4
average row correct: ['94.3', '89.0', '80.2', '79.3', '61.2', '63.6', '85.3', '71.3', '94.1', '55.2', '83.6', '69.6', '87.0', '88.5', '86.6', '92.3', '62.3', '92.9', '73.4', '90.7', '77.0']
IoU: ['91.3', '79.2', '67.9', '64.6', '50.1', '50.2', '80.0', '59.8', '86.4', '41.6', '73.5', '35.8', '76.5', '77.7', '78.5', '82.9', '38.2', '78.9', '58.0', '79.1', '66.1']
mean IoU: 67.4
```
zkyseu commented 1 year ago

lraspp_mobilenet_v3_large_coco

1. Machine: A100@192.168.40.21, 40 GB GPU memory.

2. Dataset: MSCOCO 2017, 80 categories. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: 82c1a21. How to run:
```
# run training code
cd cv/detection
bash lraspp_mobilenet_v3_large_coco/train.sh
```
```
# run inference code
cd cv/detection
bash lraspp_mobilenet_v3_large_coco/infer.sh
```

6. Training script
```
python train.py \
    -b 24 \
    --dataset coco \
    --data-path /dataset/coco \
    --model lraspp_mobilenet_v3_large_coco
```

7. Training log

Training hyperparameters:
```
amp=False, aux_loss=True, backend='pil', batch_size=24, data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=30, lr=0.12, lr_warmup_decay=0.01, lr_warmup_epochs=0, lr_warmup_method='linear', model='lraspp_mobilenet_v3_large_coco', momentum=0.9, output_dir='.', pretrained=True, print_freq=10, resume='', start_epoch=0, test_only=True, use_deterministic_algorithms=False, weight_decay=0.0001, weights=None, weights_backbone=None, workers=16, world_size=1
```
Training loss: ![image](https://github.com/Oneflow-Inc/CoModels/assets/118790294/745f6b93-c9a4-4607-b39a-8c0ffcae6626)

8. Test script
```
python train.py \
    -b 24 \
    --dataset coco \
    --data-path /dataset/coco \
    --model lraspp_mobilenet_v3_large_coco \
    --pretrained \
    --test-only
```

9. Test results

Test hyperparameters:
```
amp=False, aux_loss=True, backend='pil', batch_size=24, data_path='/home/kunyangzhou/project/dataset/coco', dataset='coco', device='cuda', dist_url='env://', distributed=False, epochs=30, lr=0.12, lr_warmup_decay=0.01, lr_warmup_epochs=0, lr_warmup_method='linear', model='lraspp_mobilenet_v3_large_coco', momentum=0.9, output_dir='.', pretrained=True, print_freq=10, resume='', start_epoch=0, test_only=True, use_deterministic_algorithms=False, weight_decay=0.0001, weights=None, weights_backbone=None, workers=16, world_size=1
```
Test results:
```
global correct: 91.2
average row correct: ['93.7', '84.9', '73.6', '74.6', '63.6', '50.6', '80.7', '65.1', '91.3', '42.2', '80.4', '70.6', '82.4', '81.8', '83.7', '88.5', '52.6', '87.9', '65.9', '88.3', '63.3']
IoU: ['90.1', '69.7', '58.2', '61.3', '49.7', '37.6', '72.7', '52.7', '79.1', '32.2', '64.6', '36.2', '66.7', '67.4', '70.1', '77.3', '33.1', '67.8', '51.1', '73.3', '54.4']
mean IoU: 60.3
```
kokuro-asahi commented 1 year ago

shufflenet_v2_x1_0


cd CoModels/cv/classification/ShuffleNet_V2x1.0
bash train.sh
Training hyperparameters
DATA:
  BATCH_SIZE: 32
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 416
  NUM_WORKERS: 8

TRAIN:
  START_EPOCH: 0
  EPOCHS: 2
  WARMUP_EPOCHS: 0
  WEIGHT_DECAY: 1e-4
  BASE_LR: 1e-3
  WARMUP_LR: 5e-7

Inference
```
cd CoModels/cv/classification/ShuffleNet_V2x1.0
bash infer.sh
```
Training process
* Training log: ![image](https://github.com/Oneflow-Inc/CoModels/assets/59110141/c668b792-71ac-466d-ae3e-701f08d2cd39)
* Training results:
```
INFO * Acc@1 68.990 Acc@5 88.195
INFO Accuracy of the network on the 1563 test images: 69.0%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 1629.203907630756
```
kokuro-asahi commented 1 year ago

shufflenet_v2_x0_5


cd CoModels/cv/classification/ShuffleNet_V2x0.5
bash train.sh
Training hyperparameters
DATA:
  BATCH_SIZE: 32
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 416
  NUM_WORKERS: 8

TRAIN:
  START_EPOCH: 0
  EPOCHS: 2
  WARMUP_EPOCHS: 0
  WEIGHT_DECAY: 1e-4
  BASE_LR: 1e-3
  WARMUP_LR: 5e-7

Inference
```
cd CoModels/cv/classification/ShuffleNet_V2x0.5
bash infer.sh
```
Training process
* Training log: ![image](https://github.com/Oneflow-Inc/CoModels/assets/59110141/868b04a5-6229-4a12-b3ee-68cb077e4b7e)
* Training results:
```
INFO * Acc@1 60.008 Acc@5 81.333
INFO Accuracy of the network on the 1563 test images: 60.0%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 3871.735964407042
```
zkyseu commented 1 year ago

iresnet50

1. Machine: A100@192.168.40.21, 40 GB GPU memory.

2. Dataset: MSCOCO 2017, 80 categories. Dataset location: /data/dataset/coco

3. oneflow version commit: dea3f43.

4. flowvision version: 0.2.1.

5. CoModels commit: 82c1a21. How to run:
```
# run training code
cd cv/detection
bash lraspp_mobilenet_v3_large_coco/train.sh
```
```
# run inference code
cd cv/detection
bash lraspp_mobilenet_v3_large_coco/infer.sh
```

6. Training script
```
python3 train.py configs/ms1mv3_r50.py
```

7. Training log

Training hyperparameters:
```
Training: 2023-11-01 10:14:36,896-rank_id: 0
Training: 2023-11-01 10:14:36,897-: loss cosface
Training: 2023-11-01 10:14:36,897-: network r50
Training: 2023-11-01 10:14:36,897-: resume False
Training: 2023-11-01 10:14:36,898-: output out101
Training: 2023-11-01 10:14:36,898-: dataset ms1m-retinaface-t1
Training: 2023-11-01 10:14:36,898-: embedding_size 512
Training: 2023-11-01 10:14:36,898-: fp16 True
Training: 2023-11-01 10:14:36,899-: model_parallel True
Training: 2023-11-01 10:14:36,899-: sample_rate 0.1
Training: 2023-11-01 10:14:36,899-: partial_fc 0
Training: 2023-11-01 10:14:36,899-: graph False
Training: 2023-11-01 10:14:36,900-: synthetic False
Training: 2023-11-01 10:14:36,900-: scale_grad False
Training: 2023-11-01 10:14:36,900-: momentum 0.9
Training: 2023-11-01 10:14:36,901-: weight_decay 0.0005
Training: 2023-11-01 10:14:36,901-: batch_size 128
Training: 2023-11-01 10:14:36,901-: lr 0.1
Training: 2023-11-01 10:14:36,901-: val_image_num {'lfw': 12000, 'cfp_fp': 14000, 'agedb_30': 12000}
Training: 2023-11-01 10:14:36,902-: ofrecord_path /home/kunyangzhou/project/dataset/wideface/ms1m-retinaface-t1/ofrecord/
Training: 2023-11-01 10:14:36,902-: num_classes 93432
Training: 2023-11-01 10:14:36,902-: num_image 5179510
Training: 2023-11-01 10:14:36,902-: num_epoch 25
Training: 2023-11-01 10:14:36,903-: warmup_epoch -1
Training: 2023-11-01 10:14:36,903-: decay_epoch [10, 16, 22]
Training: 2023-11-01 10:14:36,903-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2023-11-01 10:14:36,903-: ofrecord_part_num 8
```
Training loss: ![image](https://github.com/Oneflow-Inc/CoModels/assets/118790294/7cbdebda-4427-418b-afed-607538f51a0b)

8. Test script
```
python val.py configs/ms1mv3_r50 --model_path output_ckpt/epoch_0
```

9. Test results

Test hyperparameters:
```
Training: 2023-11-01 10:14:36,896-rank_id: 0
Training: 2023-11-01 10:14:36,897-: loss cosface
Training: 2023-11-01 10:14:36,897-: network r50
Training: 2023-11-01 10:14:36,897-: resume False
Training: 2023-11-01 10:14:36,898-: output out101
Training: 2023-11-01 10:14:36,898-: dataset ms1m-retinaface-t1
Training: 2023-11-01 10:14:36,898-: embedding_size 512
Training: 2023-11-01 10:14:36,898-: fp16 True
Training: 2023-11-01 10:14:36,899-: model_parallel True
Training: 2023-11-01 10:14:36,899-: sample_rate 0.1
Training: 2023-11-01 10:14:36,899-: partial_fc 0
Training: 2023-11-01 10:14:36,899-: graph False
Training: 2023-11-01 10:14:36,900-: synthetic False
Training: 2023-11-01 10:14:36,900-: scale_grad False
Training: 2023-11-01 10:14:36,900-: momentum 0.9
Training: 2023-11-01 10:14:36,901-: weight_decay 0.0005
Training: 2023-11-01 10:14:36,901-: batch_size 128
Training: 2023-11-01 10:14:36,901-: lr 0.1
Training: 2023-11-01 10:14:36,901-: val_image_num {'lfw': 12000, 'cfp_fp': 14000, 'agedb_30': 12000}
Training: 2023-11-01 10:14:36,902-: ofrecord_path /home/kunyangzhou/project/dataset/wideface/ms1m-retinaface-t1/ofrecord/
Training: 2023-11-01 10:14:36,902-: num_classes 93432
Training: 2023-11-01 10:14:36,902-: num_image 5179510
Training: 2023-11-01 10:14:36,902-: num_epoch 25
Training: 2023-11-01 10:14:36,903-: warmup_epoch -1
Training: 2023-11-01 10:14:36,903-: decay_epoch [10, 16, 22]
Training: 2023-11-01 10:14:36,903-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2023-11-01 10:14:36,903-: ofrecord_part_num 8
```
Test results:
```
Training: 2023-10-31 20:09:26,159-testing verification..
Training: 2023-10-31 20:09:38,419-(12000, 512)
Training: 2023-10-31 20:09:38,419-infer time:11.948389
Training: 2023-10-31 20:09:40,824-[lfw][72000]XNorm: 22.705218
Training: 2023-10-31 20:09:40,825-[lfw][72000]Accuracy-Flip: 0.99700+-0.00379
Training: 2023-10-31 20:09:40,825-[lfw][72000]Accuracy-Highest: 0.99833
Training: 2023-10-31 20:09:40,825-testing verification..
Training: 2023-10-31 20:09:55,272-(14000, 512)
Training: 2023-10-31 20:09:55,272-infer time:14.100678
Training: 2023-10-31 20:09:57,891-[cfp_fp][72000]XNorm: 20.599993
Training: 2023-10-31 20:09:57,891-[cfp_fp][72000]Accuracy-Flip: 0.98371+-0.00357
Training: 2023-10-31 20:09:57,891-[cfp_fp][72000]Accuracy-Highest: 0.98386
Training: 2023-10-31 20:09:57,891-testing verification..
Training: 2023-10-31 20:10:10,165-(12000, 512)
Training: 2023-10-31 20:10:10,166-infer time:12.014776
Training: 2023-10-31 20:10:12,597-[agedb_30][72000]XNorm: 21.982928
Training: 2023-10-31 20:10:12,597-[agedb_30][72000]Accuracy-Flip: 0.97933+-0.00786
Training: 2023-10-31 20:10:12,597-[agedb_30][72000]Accuracy-Highest: 0.97983
```
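To place the `[72000]` tag in the verification lines relative to the training schedule, the arithmetic below converts a step count into an epoch index. It assumes that `72000` is a global training step, that `batch_size 128` is the global batch size, and that partial batches are dropped; none of these is confirmed in the thread, so treat the result as a rough estimate.

```python
# Values copied from the iresnet50 configuration dump.
num_image = 5_179_510
batch_size = 128
logged_step = 72_000  # assumed to be the global step in "[lfw][72000]"

# Assumption: batch_size is the global batch and partial batches are dropped.
steps_per_epoch = num_image // batch_size
epoch_index = logged_step // steps_per_epoch  # 0-based epoch containing the step

print(steps_per_epoch, epoch_index)  # 40464 1
```

Under these assumptions the logged verification would fall in the second epoch, well before the first entry of `decay_epoch [10, 16, 22]`.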
akeeei commented 1 year ago

SE-ResNet101


cd CoModels/cv/classification/se_resnet101
bash train.sh
Training hyperparameters
DATA:
  BATCH_SIZE: 32
  DATASET: imagenet
  DATA_PATH: /data/dataset/ImageNet/extract
  IMG_SIZE: 256
  NUM_WORKERS: 8

MODEL:
  PRETRAINED: True
  RESUME: ""
  LABEL_SMOOTHING: 0.1

TRAIN:
  START_EPOCH: 0
  EPOCHS: 20
  WARMUP_EPOCHS: 0
  WEIGHT_DECAY: 1e-4
  BASE_LR: 1e-3
  WARMUP_LR: 5e-7

  LR_SCHEDULER:
    NAME: step
    DECAY_EPOCHS: 1
    DECAY_RATE: 0.8
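
With `NAME: step`, `DECAY_EPOCHS: 1` and `DECAY_RATE: 0.8`, a plain step schedule would multiply the learning rate by 0.8 after every epoch. The sketch below assumes that textbook form and ignores warm-up and any minimum learning rate; the scheduler actually used by the training script may differ, and `step_lr` is a hypothetical helper.

```python
def step_lr(base_lr: float, epoch: int, decay_epochs: int, decay_rate: float) -> float:
    """Learning rate at a 0-based epoch under a plain step schedule."""
    return base_lr * decay_rate ** (epoch // decay_epochs)


# BASE_LR, DECAY_EPOCHS and DECAY_RATE taken from the SE-ResNet101 config.
for epoch in (0, 1, 2, 19):
    print(epoch, step_lr(1e-3, epoch, decay_epochs=1, decay_rate=0.8))
```

Over the 20 configured epochs this would shrink the learning rate by a factor of 0.8^19, roughly 70 times smaller than `BASE_LR`.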

Inference
```
cd CoModels/cv/classification/se_resnet101
bash infer.sh
```
Training process (2 epochs)
* Training log: ![image](https://github.com/Oneflow-Inc/CoModels/assets/98879022/87b277a2-f7da-426c-b029-0b8bdc7449e2)
* Training results:
```
INFO * Acc@1 78.675 Acc@5 94.378
INFO Accuracy of the network on the 391 test images: 78.7%
INFO Max accuracy: 78.76%
INFO Training time 18:12:00
```
Inference results
```
INFO * Acc@1 78.326 Acc@5 94.238
INFO Accuracy of the network on the 1563 test images: 78.3%
INFO throughput averaged with 30 times
INFO batch_size 32 throughput 247.34482452324224
```