Closed: WinterCodeForEverything closed this issue 3 months ago
Did you change any training config, such as the GPU number? Since the learning rate is multiplied by the GPU number here, we find that the results are currently not reproducible when setting the GPU number to 8 (we got results similar to yours, with very low grounding performance).
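To be concrete, the scaling is just a linear multiplication of the base learning rate by the number of GPUs, roughly like the sketch below (the base value and variable names are only illustrative, not the exact ones in our config):

```python
# Illustrative sketch of linear LR scaling by GPU count (not the repo's exact code).
import torch.distributed as dist

base_lr = 5e-5  # placeholder per-GPU base learning rate
world_size = dist.get_world_size() if dist.is_initialized() else 1

# With 4 GPUs the effective LR is 4x the base; with 8 GPUs it becomes 8x,
# which is the setting where we saw the grounding performance collapse.
effective_lr = base_lr * world_size
```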
If you did not change any config, maybe you can try a lower max_grad_norm here. We use max_grad_norm=0.01 in recent experiments and it seems to be more stable.
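For clarity, max_grad_norm is the threshold used for gradient-norm clipping before each optimizer step; a minimal sketch of where it enters the loop, assuming the usual clipping setup (not our exact training code):

```python
import torch

def training_step(model, optimizer, loss, max_grad_norm=0.01):
    # Backpropagate, clip the global gradient norm, then update the weights.
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_grad_norm)
    optimizer.step()
```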
The code in prepare_scanrefer_annos.py has no bug. Actually, the "obj_id" in the train annotation is not used during training, and the "obj_id" in the val annotation represents the id of the gt object in the gt segmentations, which is used to retrieve the gt bbox for calculating the IoU.
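In other words, on the val side the obj_id is only an index used to look up the GT box when computing the IoU metric, roughly as in the sketch below (function and variable names are mine, not the actual evaluation code):

```python
import numpy as np

def box_iou_3d(box_a, box_b):
    # Axis-aligned 3D IoU for boxes given as (cx, cy, cz, dx, dy, dz).
    a_min, a_max = box_a[:3] - box_a[3:] / 2, box_a[:3] + box_a[3:] / 2
    b_min, b_max = box_b[:3] - box_b[3:] / 2, box_b[:3] + box_b[3:] / 2
    inter = np.prod(np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None))
    union = np.prod(box_a[3:]) + np.prod(box_b[3:]) - inter
    return inter / union

def score_prediction(pred_box, anno, gt_boxes_per_scene):
    # "obj_id" in the val annotation indexes the GT object in the GT segmentation,
    # so it is only used here to look up the GT box for the IoU metric.
    gt_box = gt_boxes_per_scene[anno["scene_id"]][anno["obj_id"]]
    return box_iou_3d(np.asarray(pred_box), np.asarray(gt_box))
```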
That's an interesting phenomenon. It seems like doubling the GPU number from 4 to 8 has a huge impact on the grounding tasks? Just because of the 2x learning rate?
The grounding and captioning tasks are all based on object ids. So the low performance indicates that the object ids are not well trained, and we can't directly observe this from the loss curve (the loss curve looks normal even when the model produces poor grounding results). We've done some ablation studies but still haven't found the true reason. Basically, I think a possible reason is data scale: the current data (in number and diversity) is not enough for the trainable weights, and the model may be trained in a wrong direction that neglects the learning of object ids (a process that could be vulnerable to the learning rate). So we plan to add more data in the future, especially data designed for learning object ids.
Nonetheless, it's not that hard to reproduce the reported results if you stick to our training config.
Thanks for your explanation. I didn't change the GPU number, but I made the following change in run.sh because I can't install slurm.
In the evaluation stage it prints this warning:

WARNING 2024-08-07T14:34:51 | py.warnings: /data/projects/15003900/Chat-3D-v2/utils/distributed.py:18: UserWarning: do_sample is set to False. However, top_p is set to 0.6 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p.
builtin_warn(*args, **kwargs)

Is this the reason?
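For reference, this looks like the standard transformers warning that is emitted when top_p is passed to generate() while do_sample=False; a minimal sketch of what triggers it and how to silence it (the model here is just a placeholder, not the one used in this repo):

```python
# Sketch of what triggers the warning and how to silence it
# (model/tokenizer are placeholders, not this repo's checkpoints).
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("hello", return_tensors="pt")

# Passing top_p while do_sample=False emits the UserWarning (output stays greedy).
out = model.generate(**inputs, do_sample=False, top_p=0.6, max_new_tokens=5)

# Either enable sampling so top_p takes effect, or drop top_p for greedy decoding:
out = model.generate(**inputs, do_sample=True, top_p=0.6, max_new_tokens=5)
out = model.generate(**inputs, do_sample=False, max_new_tokens=5)
```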
That doesn't look like a problem. Can I see your training log (the train.log file under the output directory)?
I'm re-training the code today to see if it can be reproduced in my environment now. Maybe we can compare the training logs to find the difference.
Oh, I changed the number of training epochs from 3 to 2, so the learning rate seems to drop faster than before; here is the training log: train.log. But I also trained the code for 3 epochs with the setting add_scene_token=True and got a similar result (very poor grounding performance); here is that training log: train.log
It seems I can't change any setting in the config. Maybe I should try max_grad_norm=0.01 to make it more stable. If it's still so unpredictable, maybe there are some unsolved problems in the code or in the design?
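Thinking about why the learning rate drops faster with 2 epochs: if the schedule is defined over total steps = epochs * steps_per_epoch, then shortening the run compresses the whole decay. A toy sketch, assuming a cosine schedule (which may not be exactly what this repo uses):

```python
# Why fewer epochs makes the LR decay faster: the schedule is stretched over
# total_steps = epochs * steps_per_epoch (cosine schedule assumed here).
import math

def lr_at_step(step, epochs, steps_per_epoch, base_lr=1e-4):
    total_steps = epochs * steps_per_epoch
    return 0.5 * base_lr * (1 + math.cos(math.pi * step / total_steps))

# At the same absolute step, the 2-epoch run is further along its schedule:
print(lr_at_step(1000, epochs=2, steps_per_epoch=1000))  # lower LR
print(lr_at_step(1000, epochs=3, steps_per_epoch=1000))  # higher LR
```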
I used the original config and have trained it for 2 epochs (out of 3). The results are reproducible in my environment. train.log
The add_scene_token flag should be set to False by default, because we didn't use it in our experiments and we didn't mention it in the readme instructions.
I recommend training with the default setting first. If it is still not reproducible in your environment, we can further help you find the problem.
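To summarize the settings discussed in this thread, written here as a plain dict for readability (the key names and the actual config file layout are not exactly ours):

```python
# Settings discussed above, collected for readability; key names are illustrative
# and the actual config layout may differ.
reproduction_settings = {
    "num_gpus": 4,             # 8 GPUs doubles the effective LR and gave poor grounding
    "epoch": 3,                # using 2 epochs makes the LR decay faster
    "add_scene_token": False,  # default; not used in our experiments
    "max_grad_norm": 0.01,     # lower value we found more stable in recent experiments
}
```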
Thanks for your patience. I reproduced the results with the default setting. It's good work.
I ran the training and evaluation code and found that the results on ScanRefer and Multi3DRefer are extremely bad. I'm not sure whether there is any bug in the code.
For example, I'm curious why obj_id is different between training and evaluation in prepare_scanrefer_annos.py. Is this a bug?