cvg / glue-factory

Training library for local feature detection and matching
Apache License 2.0

KeyError: 'loss/total' #47

Closed: flashcolab closed this issue 8 months ago

flashcolab commented 9 months ago

[12/12/2023 10:27:28 gluefactory INFO] Starting epoch 0
/home/user0/.local/lib/python3.8/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
[12/12/2023 10:27:30 gluefactory INFO] [E 0 | it 0] loss {total 6.668E+00, last 5.244E+00, assignment_nll 5.244E+00, nll_pos 8.672E+00, nll_neg 1.815E+00, num_matchable 2.530E+02, num_unmatchable 2.575E+02, confidence 5.922E-01, row_norm 1.768E-01}
[12/12/2023 10:27:30 gluefactory INFO] [Validation] {}
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/user0/workspace/Training/glue-factory/gluefactory/train.py", line 691, in <module>
    main_worker(0, conf, output_dir, args)
  File "/home/user0/workspace/Training/glue-factory/gluefactory/train.py", line 628, in main_worker
    training(rank, conf, output_dir, args)
  File "/home/user0/workspace/Training/glue-factory/gluefactory/train.py", line 547, in training
    if results[conf.train.best_key] < best_eval:
KeyError: 'loss/total'

train.py, line 69:

    "best_key": "loss/total",  # key to use to select the best checkpoint

I am running into KeyError: 'loss/total' when running the training command for superpoint-open+lightglue. Does anyone have a fix for this?

youhha commented 9 months ago

I'm hitting the same error.

youhha commented 9 months ago

    if results[conf.train.best_key] < best_eval:
KeyError: 'loss/total'

youhha commented 9 months ago

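Please check whether your "val_loader" is empty. The "[Validation] {}" line in the log shows that validation returned no results at all, so the results dict has no 'loss/total' key to look up.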
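To illustrate why an empty validation set leads to this KeyError, here is a minimal sketch in plain Python. It is not glue-factory's code: `run_validation` and `is_new_best` are hypothetical helpers, and the assumption is that validation metrics are averaged over batches, so zero batches produce an empty dict like the `{}` in the log above.

```python
from typing import Dict, Iterable


def run_validation(val_batches: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Hypothetical stand-in for a validation loop: average each metric
    over all batches. With zero batches the result is an empty dict."""
    sums: Dict[str, float] = {}
    count = 0
    for batch_metrics in val_batches:
        count += 1
        for key, value in batch_metrics.items():
            sums[key] = sums.get(key, 0.0) + value
    # If sums is empty, the comprehension never divides by count.
    return {key: total / count for key, total in sums.items()}


def is_new_best(results: Dict[str, float], best_key: str, best_eval: float) -> bool:
    """Same comparison as in the traceback, but with an explicit check
    so an empty validation run fails with a descriptive message."""
    if best_key not in results:
        raise ValueError(
            f"Validation produced no '{best_key}' (got keys: {sorted(results)}). "
            "Is the validation loader empty?"
        )
    return results[best_key] < best_eval


# An empty validation loader yields no metrics at all.
empty_results = run_validation([])
print(empty_results)  # {}

# With at least one batch, the key exists and the comparison works.
ok_results = run_validation([{"loss/total": 6.0}, {"loss/total": 4.0}])
print(is_new_best(ok_results, "loss/total", float("inf")))  # True
```

If the check fires in your setup, the usual suspects are a validation split that resolves to zero samples or a batch size larger than the validation set combined with dropping the last incomplete batch. Printing the length of the validation loader before training starts is a quick way to confirm.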

yuegao998 commented 3 months ago

I also encountered this issue. Did you solve it? Can you help me with it? I would be very grateful.