loss is zero

P16101150 commented 2 years ago

When I start training, why is the rcnn_loss zero?

zkp0113 commented 2 years ago

> When I start training, why is the rcnn_loss zero?

Hey bro, have you fixed the bug "UnboundLocalError: local variable 'val_loss_epoch' referenced before assignment"? Can you help me? Thanks a lot.

P16101150 commented 2 years ago

You can delete line 25 and change line 81 to val_set, val_loader = create_dataloader(logger, split=cfg.TRAIN.VAL_SPLIT). Then it can get val_loader!
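For readers hitting the same UnboundLocalError, here is a minimal sketch of how it can arise. It assumes the trainer assigns val_loss_epoch only when a validation loader exists, which this thread does not confirm; run_epoch and run_epoch_safe are hypothetical names, not functions from the JMODT code.

```python
def run_epoch(val_loader):
    # Hypothetical: the variable is assigned only inside the branch.
    if val_loader is not None:
        val_loss_epoch = sum(val_loader) / len(val_loader)
    # With val_loader=None the name was never bound, so this line raises.
    return val_loss_epoch


def run_epoch_safe(val_loader):
    # Binding a default first avoids the error when validation is skipped.
    val_loss_epoch = None
    if val_loader is not None:
        val_loss_epoch = sum(val_loader) / len(val_loader)
    return val_loss_epoch


try:
    run_epoch(None)
    error_name = None
except UnboundLocalError as err:
    error_name = type(err).__name__

print(error_name)                  # UnboundLocalError
print(run_epoch_safe(None))        # None
print(run_epoch_safe([1.0, 3.0]))  # 2.0
```

Either making sure a real val_loader is created, as suggested above, or giving the variable a default value removes the error in this sketch.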

P16101150 commented 2 years ago

And do you know why rcnn_loss, link_pos, and link_neg are always zero when I start training?

zkp0113 commented 2 years ago

@P16101150 Thanks a lot. I rewrote line 81 because val_set and val_loader could not get values from "val_set, val_loader = create_dataloader(logger, split=cfg.TRAIN.VAL_SPLIT) if args.train_with_eval else None, None". After that I got val_set and val_loader and the code could run, but later I found that rcnn_loss, link_pos, and link_neg are always zero. These days I think maybe the dataset is not right for the code, but I have not fixed the problem.

Hey bro, thanks again. I have not fixed that problem yet, so maybe we can talk about it.
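The quoted line fails because of Python operator precedence: the conditional expression binds tighter than the comma, so the right-hand side is parsed as the tuple ((create_dataloader(...) if cond else None), None). A small sketch of the effect, where create_dataloader_stub is a hypothetical stand-in for the real create_dataloader:

```python
def create_dataloader_stub(split):
    # Hypothetical stand-in: returns a (dataset, loader) pair like the real call.
    return f"{split}_set", f"{split}_loader"


train_with_eval = True

# Same shape as the original line: the second target always receives None.
buggy_set, buggy_loader = create_dataloader_stub("val") if train_with_eval else None, None
print(buggy_set)     # ('val_set', 'val_loader')
print(buggy_loader)  # None

# Parenthesizing the fallback makes both branches a two-element tuple.
fixed_set, fixed_loader = create_dataloader_stub("val") if train_with_eval else (None, None)
print(fixed_set)     # val_set
print(fixed_loader)  # val_loader
```

With the parentheses in place, the flag still controls whether validation runs, which the unconditional rewrite of line 81 gives up.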

zkp0113 commented 2 years ago

@P16101150

I think these values being zero may be why best_model.pth is not updated.

P16101150 commented 2 years ago

@zkp0113 I think so, but I have no idea why the loss is always zero. The author did not solve the problem.

zkp0113 commented 2 years ago

@P16101150

I'm guessing that the code uses only the first part of the dataset during training, so the various losses are always zero. I'm not sure if that's the right idea.

May I have your email? I think we can fix the problem together.

zkp0113 commented 2 years ago

@P16101150 Hey bro, did you check the code "val_set, val_loader = create_dataloader(logger, split=cfg.TRAIN.VAL_SPLIT) if args.train_with_eval else None, None" in train.py? I found that val_set may not be used by any other code.

zkp0113 commented 2 years ago

@P16101150 Hey bro, I fixed the bug where rcnn_loss is always zero, but link_pos and link_neg are still zero. best_model.pth is updated now. Looking forward to your reply.

P16101150 commented 2 years ago

Can you tell me how to do it?

zkp0113 commented 2 years ago

> Can you tell me how to do it?

Check the command you use to run the code. --finetune is needed when you train on your own dataset.

P16101150 commented 2 years ago

But it says that if you want to jointly train the detection and correlation models, you should remove the --finetune option, so I didn't add it.

zkp0113 commented 2 years ago

> But it says that if you want to jointly train the detection and correlation models, you should remove the --finetune option, so I didn't add it.

Yep. At first I removed the --finetune option and ran into the problem, so I checked what the code does when --finetune is removed. I found that --finetune might have been set in config.py, so I added it and trained. The problem was fixed.

zkp0113 commented 2 years ago

@P16101150 We can chat with each other. Looking forward to your reply.

zkp0113 commented 2 years ago

@P16101150 You can check the loss calculation code for this problem: if nothing is passed to the function, the loss will be zero.
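A generic sketch of that failure mode, not the actual JMODT loss code: assume a loss that averages only over selected samples (for example, positive proposals or matched link pairs) and falls back to zero when none are selected. mean_loss_over_selected and its arguments are hypothetical names used only for illustration.

```python
def mean_loss_over_selected(per_sample_losses, selected):
    # Keep only the losses whose mask entry is True.
    picked = [loss for loss, keep in zip(per_sample_losses, selected) if keep]
    if not picked:
        # Nothing reached the loss: it is reported as zero rather than failing.
        return 0.0
    return sum(picked) / len(picked)


losses = [0.5, 2.0, 1.5]
print(mean_loss_over_selected(losses, [True, False, True]))    # 1.0
print(mean_loss_over_selected(losses, [False, False, False]))  # 0.0
```

If the real code behaves like this, a loss that is exactly 0.0 on every iteration suggests that the selection mask is empty, so logging how many samples pass the mask would be a reasonable first check.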

P16101150 commented 2 years ago

I added --finetune as "python train.py --data_root data/KITTI --batch_size 8 --finetun --output_dir result", but I get this error:

Traceback (most recent call last):
  File "A:\JMODT-main\train.py", line 151, in <module>
    main()
  File "A:\JMODT-main\train.py", line 139, in main
    trainer.train(
  File "A:\JMODT-main\jmodt\utils\train_utils.py", line 137, in train
    train_loss, tb_dict, disp_dict = self.model_fn(self.model, batch)
  File "A:\JMODT-main\jmodt\detection\modeling\train_functions.py", line 21, in model_fn_train
    rpn_cls_label, rpn_reg_label = data['rpn_cls_label'], data['rpn_reg_label']
KeyError: 'rpn_cls_label'

How can I fix it?
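A side note on the truncated flag in that command: if train.py parses its options with argparse under default settings (an assumption; the thread does not show the parser), an unambiguous prefix such as --finetun is accepted as --finetune, so the spelling alone would not explain the KeyError. The parser below is a hypothetical reconstruction from the options in the command, not the project's actual parser.

```python
import argparse

# Hypothetical parser mirroring the options used in the command above.
parser = argparse.ArgumentParser()
parser.add_argument("--data_root")
parser.add_argument("--batch_size", type=int, default=4)
parser.add_argument("--finetune", action="store_true")
parser.add_argument("--output_dir")

# argparse allows unique prefixes of long options by default (allow_abbrev=True).
args = parser.parse_args(
    ["--data_root", "data/KITTI", "--batch_size", "8", "--finetun", "--output_dir", "result"]
)
print(args.finetune)    # True
print(args.batch_size)  # 8
```

Under that assumption, the run really was in fine-tune mode, and the missing 'rpn_cls_label' key would have to come from how the batch is built in that mode.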

zkp0113 commented 2 years ago

@P16101150 I think this problem may be caused by the parameters in the config.py file.

If you want to get the rcnn_loss, you need to pass the correct parameters to train.py.

Check how the loss calculation function handles fine-tuning.

zkp0113 commented 2 years ago

@P16101150 If you set "cfg.TRAIN.FINETUNE = True", the rcnn_loss will always be zero.

zzm-hl commented 1 year ago

Hi, bro, have you fixed the "zero" problem?

zkp0113 commented 1 year ago

The code provided by Huang has some problems. The results may be correct, but they are hard to reproduce.


Kemo-Huang / JMODT

loss is zero #10