Poulinakis-Konstantinos opened this issue 3 years ago
I have also observed this behavior in another metric I tried to log. Bear in mind that for the first iteration the output is always the same for both the print and the logger; after the first step the differences start building up. My first guess would be that smoothing is actually not disabled even though I set the hint to False in

storage.put_scalars(total_loss=losses_reduced, **loss_dict_reduced, smoothing_hint=False)
Thanks for reporting. I found that CommonMetricPrinter always smooths the losses regardless of smoothing_hint. This behavior was OK since losses should almost always be smoothed, but I think letting it respect smoothing_hint would be less confusing.
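As a quick illustration, here is a minimal sketch (assuming detectron2's EventStorage/HistoryBuffer API) of the difference this causes: print() shows the latest raw scalar, while the printer reports a windowed median for losses:

```python
from detectron2.utils.events import EventStorage

# Log a few raw loss values with smoothing explicitly disabled.
with EventStorage(start_iter=0) as storage:
    for loss in [1.0, 0.5, 2.0, 0.25]:
        storage.put_scalars(total_loss=loss, smoothing_hint=False)
        storage.step()

    hist = storage.history("total_loss")
    print("latest (what print() shows):", hist.latest())             # 0.25
    print("windowed median (what gets logged):", hist.median(20))    # 0.75
```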
I am observing unexpected behavior which might be caused by a bug. I am trying to train a RetinaNet with a custom train loop based on plain_train_loop.py. Inside the do_train function I use CommonMetricPrinter and JSONWriter to log my training metrics. To provide metrics to the loggers I use

storage.put_scalars(total_loss=losses_reduced, **loss_dict_reduced, smoothing_hint=False)

Afterwards I also print(losses_reduced) as a check. The values the print command outputs are, most of the time, different from the values logged by the writers. I am wondering if I am missing something or if this is indeed a bug. I also disabled smoothing_hint in case it was the root of the problem.
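The hint itself is stored per metric, so it can be checked directly on the storage. A minimal sketch (assuming detectron2's EventStorage API; the values are placeholders):

```python
from detectron2.utils.events import EventStorage

with EventStorage(start_iter=0) as storage:
    storage.put_scalars(total_loss=0.42, loss_cls=0.30, smoothing_hint=False)
    # latest_with_smoothing_hint() applies smoothing only where the hint is
    # True, so with smoothing_hint=False it should return the raw values.
    print(storage.latest_with_smoothing_hint(window_size=20))
```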
Instructions To Reproduce the 🐛 Bug:
import logging
import os
from collections import OrderedDict

import detectron2.utils.comm as comm
from detectron2.checkpoint import Checkpointer, DetectionCheckpointer, PeriodicCheckpointer
from detectron2.config import get_cfg
from detectron2.data import (
    MetadataCatalog,
    build_detection_test_loader,
    build_detection_train_loader,
    DatasetMapper,
)
from detectron2.engine import default_argument_parser, default_setup, launch
from detectron2.evaluation import (
    COCOEvaluator,
    DatasetEvaluators,
    inference_on_dataset,
    print_csv_format,
)
from detectron2.modeling import build_model
from detectron2.solver import build_lr_scheduler, build_optimizer
from detectron2.utils.events import (
    CommonMetricPrinter,
    EventStorage,
    JSONWriter,
    TensorboardXWriter,
)
from detectron2.data.catalog import DatasetCatalog
from detectron2.data.datasets import register_coco_instances
from detectron2 import model_zoo
from tensorboard import program

logger = logging.getLogger("detectron2")
def get_COCO_evaluator(dataset_name, output_folder=None):
    if output_folder is None:
        output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
    return COCOEvaluator(dataset_name, tasks=("bbox",), distributed=True, output_dir=output_folder)
from custom_callbacks import callback_best_weights_mAP


# adding a new parameter current_iteration
def do_test(cfg, model, current_iteration):
    results = OrderedDict()
    ...
    # Create a JSON writer for logging evaluation results
    ...
def do_train(cfg, model, resume=False):
    model.train()
    optimizer = build_optimizer(cfg, model)
    scheduler = build_lr_scheduler(cfg, optimizer)
    ...
    # THIS IS WHERE THE BUG MIGHT BE
    storage.put_scalars(total_loss=losses_reduced, **loss_dict_reduced, smoothing_hint=False)
    print("TRAIN LOSSES :", losses_reduced)
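    # Hypothetical extra check (not in the original script): the printer
    # reports a windowed median for losses, so printing it directly would
    # expose the mismatch with the raw value above:
    # print("SMOOTHED LOSS :", storage.history("total_loss").median(20))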
def setup(args):
    """
    Create configs and perform basic setups.
    """
    OUTPUT_DIR = 'Output_V_1.0'
    cfg = get_cfg()
    cfg.merge_from_file(args.config_file)
    cfg.merge_from_list(args.opts)
    ...
def main(args):
    global MODEL_NAME
    MODEL_NAME = 'RetinaNet_V_1.0'
    ...
if __name__ == "__main__":
    args = default_argument_parser().parse_args()
    print("Command Line Args:", args)
    launch(
        main,
        args.num_gpus,
        num_machines=args.num_machines,
        machine_rank=args.machine_rank,
        dist_url=args.dist_url,
        args=(args,),
    )
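As a possible workaround until the printer respects the hint, the exact values can be read back from the storage inside the loop. A sketch (log_exact_losses is a hypothetical helper, not a detectron2 API):

```python
# Hypothetical helper: report the raw, unsmoothed scalars exactly as they
# were passed to put_scalars(). storage.latest() maps each metric name to
# its most recent (value, iteration) pair.
def log_exact_losses(storage):
    for name, (value, it) in storage.latest().items():
        if "loss" in name:
            print(f"iter {it}: {name} = {value:.4f}")
```

Calling this right after storage.put_scalars(...) in do_train makes it easy to compare the raw values against what CommonMetricPrinter logs.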