Lukas-Ma1 opened 10 months ago
The DRY_RUN mode is intended for fast validation of the code's correctness. It does this by trimming down the number of samples used for training and validation. So, getting a 0.8614 result indicates that you have set everything up correctly and you are all set for a full run.
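To picture what such a mode does, here is a minimal sketch in Python. The `maybe_trim` helper and the `DRY_RUN_SIZE` cap are illustrative assumptions, not OADP's actual implementation:

```python
import os

# Hypothetical cap on the number of samples kept in DRY_RUN mode,
# chosen here only for illustration.
DRY_RUN_SIZE = 48

def maybe_trim(samples):
    """Return a small subset of `samples` when the DRY_RUN env var is set.

    Illustrative sketch only: the idea is that with a handful of samples,
    one full pass through the pipeline finishes in minutes.
    """
    if os.environ.get('DRY_RUN', '').lower() in ('1', 'true'):
        return samples[:DRY_RUN_SIZE]
    return samples
```

With so few samples the model effectively memorizes the data it is evaluated on, which is why metrics produced in this mode (such as 0.8614 mAP) are not meaningful.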
Thank you! There are two more questions I would like to ask:
I ran TRAIN_WITH_VAL_DATASET=True torchrun --nproc_per_node=4 -m oadp.dp.train oadp_ov_coco configs/dp/oadp_ov_coco.py --override .validator.dataloader.dataset.ann_file::data/coco/annotations/instances_val2017.48.json without DRY_RUN. However, the result still does not look normal: it only reaches 0.1649 mAPN50 instead of the expected 0.313. Should I strip out all the optional parts and run 'torchrun --nproc_per_node=4 -m oadp.dp.train vild_ov_coco configs/dp/vild_ov_coco.py' instead?

When TRAIN_WITH_VAL_DATASET=True is set in the environment, it activates a debug mode in which training is performed on the validation split of the MS-COCO 2017 dataset.
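A sketch of how an environment flag like this might switch the training annotations to the validation split. The `choose_train_ann_file` helper and the file paths are assumptions for illustration, not OADP's actual code:

```python
import os

def choose_train_ann_file(data_root='data/coco/annotations'):
    """Illustrative sketch: pick the training annotation file, or the
    val split when the TRAIN_WITH_VAL_DATASET debug flag is set."""
    if os.environ.get('TRAIN_WITH_VAL_DATASET', '').lower() in ('1', 'true'):
        # Debug mode: train on the small val split to sanity-check the
        # pipeline end to end, not to measure real performance.
        return f'{data_root}/instances_val2017.json'
    return f'{data_root}/instances_train2017.json'
```

Because the model then trains on the same images it is later evaluated on, any metric from this mode overstates real performance.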
I would like to ask about COCO training and test performance in DRY_RUN mode. I used DRY_RUN with each command you mentioned, including extracting the global, object, and block features. When I ran:
DRY_RUN=True TRAIN_WITH_VAL_DATASET=True torchrun --nproc_per_node=4 -m oadp.dp.train oadp_ov_coco configs/dp/oadp_ov_coco.py --override .validator.dataloader.dataset.ann_file::data/coco/annotations/instances_val2017.48.json
for the DP training on COCO, I got a ridiculous result: 0.8614 mAP. There must be something wrong, but I have checked my process, data structure, and commands, and they all follow your steps. So is it DRY_RUN that produces this unrealistic result, and should I rerun without DRY_RUN? (By the way, the global, object, and block features extracted with DRY_RUN seem to be smaller than those extracted without it, so should I download them from the Baidu disk instead?)
Thanks for your attention and impressive work!