Closed: yonatanbitton closed this issue 3 years ago
Hi!
Yes, you should be able to load LXMERT in here and then fine-tune on RefCOCO.
To do so, you'd need to map the state dict of model_LXRT.pth
into the one of this repo.
You can have a look at our LXMERT checkpoint
to see the corresponding layer names.
Just note that I used Transformer sub-layers as "layers" in this repo (see here).
If you write a script that maps the checkpoint from the official LXMERT repository onto VOLTA, do send a PR! :)
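A starting point for such a mapping script could be to diff the key names of the two state dicts. Here is a minimal sketch; the toy key lists are illustrative — in practice they would come from torch.load("model_LXRT.pth", map_location="cpu").keys() and the VOLTA checkpoint's keys:

```python
# Minimal sketch: diff layer names between two checkpoints.
# The key lists below are toy examples; real ones come from
# torch.load(<checkpoint>, map_location="cpu").keys().

def diff_keys(src_keys, dst_keys):
    """Return (only_in_src, only_in_dst, shared) as sorted lists."""
    src, dst = set(src_keys), set(dst_keys)
    return sorted(src - dst), sorted(dst - src), sorted(src & dst)

lxmert_keys = [
    "module.bert.encoder.layer.0.attention.self.query.weight",
    "module.bert.embeddings.word_embeddings.weight",
]
volta_keys = [
    "bert.encoder.layer.0.attention_self.query.weight",
    "bert.embeddings.word_embeddings.weight",
]
only_src, only_dst, shared = diff_keys(lxmert_keys, volta_keys)
print(len(only_src), len(only_dst), len(shared))  # -> 2 2 0
```

The three lists make it easy to see which names need a rename rule versus which weights simply have no counterpart.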
Hey. I am on it. Your repository is simple to use - great resource, thanks.
First, I tried to provide the model as a pretrained checkpoint. The training actually converged:
python train_task.py --bert_model bert-base-uncased --config_file config/ctrl_lxmert.json --from_pretrained /data/users/yonatab/lxmert/snap/pretrained/model_LXRT.pth --tasks_config_file config_tasks/ctrl_trainval_tasks.yml --task 10 --adam_epsilon 1e-6 --adam_betas 0.9 0.999 --adam_correct_bias --weight_decay 0.0001 --warmup_proportion 0.1 --clip_grad_norm 1.0 --output_dir checkpoints/refcoco+_unc/ctrl_lxmert_original_lxmert --logdir logs/refcoco+_unc
In evaluation, I reach 71.27 with the CTRL LXMERT checkpoint, and 66.86 with the model loaded this way.
Now, I want to reduce the gap between the implementations.
I'm not sure how to perform the mapping.
I am looking at the LXMERT CTRL state dict (contains 515 layers), your LXMERT checkpoint (which has 517 layers), and the original LXMERT state dict (contains 473 layers).
I understand that these changes are needed:
- remove the "module." prefix
- rename attention.self to attention_self
- rename attention.output to attention_output
These changes reduce the number of differing layers to 408 and 450 respectively, but that doesn't seem to be enough, and I am not sure how to proceed.
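Those renames can be expressed as a small key-mapping helper. A sketch, assuming the rule list found so far (it is likely incomplete):

```python
# Sketch of the key renames described above; RULES only covers the
# mappings found so far and is likely incomplete.
RULES = [
    ("attention.self", "attention_self"),
    ("attention.output", "attention_output"),
]

def map_key(key, rules=RULES):
    # Drop the "module." prefix if present, then apply substring renames.
    if key.startswith("module."):
        key = key[len("module."):]
    for old, new in rules:
        key = key.replace(old, new)
    return key

print(map_key("module.bert.encoder.layer.0.attention.self.query.weight"))
# -> bert.encoder.layer.0.attention_self.query.weight
```

Extending RULES as further correspondences are found keeps the mapping in one place.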
Do you have any pointers on this?
Thanks!
Thanks!
Yes, one major difference I can recall from the original LXMERT is that our LXMERT CTRL uses only MRC-KL as the visual loss, while our LXMERT checkpoint also has the weights for the XENT and regression losses (https://github.com/e-bug/volta/blob/main/config/lxmert.json#L19). They also had a VQA task, which we didn't include due to the pretraining data we used.
I'd recommend the (perhaps painful) process of trying to match them layer by layer to see where they diverge. Simply printing them side by side via Python's zip(original_dict, volta_dict) might be a first step.
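Such a zip-based first pass relies on PyTorch state dicts preserving parameter registration order, so aligned printing can reveal where the two architectures diverge. A sketch, with toy OrderedDicts standing in for the real checkpoints:

```python
# First-pass comparison via zip; PyTorch state dicts preserve
# registration order, so aligned keys often correspond.
# Toy OrderedDicts stand in for the real state dicts here.
from collections import OrderedDict

original_dict = OrderedDict([
    ("module.bert.embeddings.word_embeddings.weight", 0),
    ("module.bert.encoder.layer.0.attention.self.query.weight", 0),
])
volta_dict = OrderedDict([
    ("bert.embeddings.word_embeddings.weight", 0),
    ("bert.encoder.layer.0.attention_self.query.weight", 0),
])

for orig_name, volta_name in zip(original_dict, volta_dict):
    # Flag pairs that differ by more than the "module." prefix.
    same = orig_name == "module." + volta_name
    print(f"{'  ' if same else '* '}{orig_name}  |  {volta_name}")
```

Lines starting with `*` are the ones that need a rename rule (or have no counterpart at all).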
Keep me posted on how it goes!
Hey, I have a partial mapping.
First, without any mapping, just loading the model_LXRT.pth of LXMERT works (nice 👍): it reaches 66.862 on RefCOCO. With the proposed code, it reaches 69.836. Your CTRL LXMERT reaches 71.268. Close enough for me.
In the utils.py file, after the line state_dict = torch.load(resolved_archive_file, map_location="cpu"), adding the following code partially maps the layers from the original LXMERT model. It basically removes the "module." prefix, and renames attention.self to attention_self and attention.output to attention_output.
state_dict = torch.load(resolved_archive_file, map_location="cpu")
### CHANGING STATE DICT - LXMERT
if '.pth' in resolved_archive_file:
    print("*** CHANGING STATE DICT ***")
    import collections

    def remove_starting_module(x):
        # strip the leading "module." prefix (added by nn.DataParallel)
        return x[len("module."):] if x.startswith("module.") else x

    # rename keys: drop "module." and map attention.self / attention.output
    # to this repo's attention_self / attention_output sub-layer names
    new_state_dict = collections.OrderedDict(
        (remove_starting_module(k), v) for k, v in state_dict.items()
    )
    new_state_dict = collections.OrderedDict(
        (k.replace('attention.self', 'attention_self'), v)
        for k, v in new_state_dict.items()
    )
    new_state_dict = collections.OrderedDict(
        (k.replace('attention.output', 'attention_output'), v)
        for k, v in new_state_dict.items()
    )
    state_dict = new_state_dict
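To gauge how much of the checkpoint actually lands, one can count how many remapped keys exist in the target model's state dict. A sketch with illustrative key sets — in practice they would come from new_state_dict.keys() and model.state_dict().keys():

```python
# Sketch: measure mapping coverage. Both key sets below are
# illustrative; real ones come from new_state_dict.keys() and
# model.state_dict().keys().
mapped_keys = {
    "bert.encoder.layer.0.attention_self.query.weight",
    "visual_losses.obj.decoder.weight",  # hypothetical leftover loss head
}
model_keys = {
    "bert.encoder.layer.0.attention_self.query.weight",
    "bert.encoder.layer.0.attention_output.dense.weight",
}
matched = mapped_keys & model_keys
unmatched = mapped_keys - model_keys
print(f"matched {len(matched)} keys; {len(unmatched)} still unmapped")
```

Printing the unmatched names (rather than just counting them) is the quickest way to spot which rename rules are still missing.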
Thanks for the help
Nice, thanks!
Did you figure out which layers didn't match?
The names of the layers do not match (the intersection of layer names is 0). Some layers seem to be similar, for example the 3 changes I made. It is possible that further matching can be done.
Hello. I have pretrained two LXMERT models using the official LXMERT GitHub repository. I want to evaluate my models on RefCOCO. I was wondering if it is possible to use your implementation to fine-tune them on the RefCOCO task?
I do not want to change the pre-training, just to compare these two models on RefCOCO. My models are stored in model_LXRT.pth files (same as the LXMERT implementation). Thanks!