Closed: fabrahman closed this issue 2 years ago.
@myleott any pointers, please? I would really appreciate it.
I also tried creating a fresh environment, reinstalling fairseq, and redoing everything, but I still cannot finetune on the pretrained model using the fusion command here. When I interrupt the program (because it runs forever without doing anything), the problem appears to be in loading the checkpoint. Note that I am using exactly the checkpoint from the previous step. This is the log after manually interrupting the program:
.attproj.1.weight", "decoder.pretrained_decoder.attproj.1.bias", "decoder.pretrained_decoder.attproj.2.weight", "decoder.pretrained_decoder.attproj.2.bias", "de
coder.pretrained_decoder.attproj.3.weight", "decoder.pretrained_decoder.attproj.3.bias", "decoder.pretrained_decoder.attproj.4.weight", "decoder.pretrained_deco
der.attproj.4.bias", "decoder.pretrained_decoder.attproj.5.weight", "decoder.pretrained_decoder.attproj.5.bias", "decoder.pretrained_decoder.attproj.6.weight",
"decoder.pretrained_decoder.attproj.6.bias", "decoder.pretrained_decoder.fc2.weight", "decoder.pretrained_decoder.fc2.bias", "decoder.pretrained_decoder.fc3.wei
ght", "decoder.pretrained_decoder.fc3.bias", "decoder.gate1.0.weight", "decoder.gate1.0.bias", "decoder.gate2.0.weight", "decoder.gate2.0.bias", "decoder.joinin
g.0.weight", "decoder.joining.0.bias", "decoder.joining.1.weight", "decoder.joining.1.bias", "decoder.joining.3.weight", "decoder.joining.3.bias", "decoder.join
ing.4.weight", "decoder.joining.4.bias", "decoder.joining.6.weight", "decoder.joining.6.bias", "decoder.joining.7.weight", "decoder.joining.7.bias", "pretrained
_encoder.encoder.embed_tokens.weight", "pretrained_encoder.encoder.embed_positions.weight", "pretrained_encoder.encoder.fc1.weight", "pretrained_encoder.encoder
.fc1.bias", "pretrained_encoder.encoder.projections.2.weight", "pretrained_encoder.encoder.projections.2.bias", "pretrained_encoder.encoder.convolutions.0.weigh
t", "pretrained_encoder.encoder.convolutions.0.bias", "pretrained_encoder.encoder.convolutions.1.weight", "pretrained_encoder.encoder.convolutions.1.bias", "pre
trained_encoder.encoder.convolutions.2.weight", "pretrained_encoder.encoder.convolutions.2.bias", "pretrained_encoder.encoder.fc2.weight", "pretrained_encoder.e
ncoder.fc2.bias".
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/anaconda3/envs/fairseq/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/fairseq/fairseq_cli/train.py", line 286, in distributed_main
    main(args, init_distributed=True)
  File "/home/fairseq/fairseq_cli/train.py", line 81, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
  File "/home/fairseq/fairseq/checkpoint_utils.py", line 134, in load_checkpoint
    reset_meters=args.reset_meters,
  File "/home/fairseq/fairseq/trainer.py", line 187, in load_checkpoint
    "please ensure that the architectures match.".format(filename)
Exception: Cannot load model parameters from checkpoint checkpoints/checkpoint_last.pt; please ensure that the architectures match.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/anaconda3/envs/fairseq/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/anaconda3/envs/fairseq/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/anaconda3/envs/fairseq/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 25, in _wrap
    error_queue.put(traceback.format_exc())
  File "/home/anaconda3/envs/fairseq/lib/python3.6/multiprocessing/queues.py", line 347, in put
    self._writer.send_bytes(obj)
  File "/home/anaconda3/envs/fairseq/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/anaconda3/envs/fairseq/lib/python3.6/multiprocessing/connection.py", line 398, in _send_bytes
    self._send(buf)
  File "/home/anaconda3/envs/fairseq/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
KeyboardInterrupt
I would appreciate any thought on this.
@myleott, @huihuifan, can you please help me with this? I'm stuck on it. Thanks
based on your stack trace, it looks like it's having trouble loading the model architecture as the state dicts don't match. Can you put a breakpoint with pdb to check what the misalignment could be?
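Following that suggestion, here is a small helper that could be run at a pdb breakpoint just before the failing `load_state_dict` call to see exactly which keys disagree. The checkpoint path and the `model` variable in the usage comment are placeholders taken from this thread, not a fixed fairseq API:

```python
def diff_state_dicts(model_sd, ckpt_sd, limit=10):
    """Return (missing, unexpected): keys the model expects but the
    checkpoint lacks, and keys the checkpoint has but the model doesn't."""
    model_keys, ckpt_keys = set(model_sd), set(ckpt_sd)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    return missing[:limit], unexpected[:limit]

# Usage sketch at the breakpoint (names/paths are placeholders):
#   import torch
#   ckpt = torch.load("checkpoints/checkpoint_last.pt", map_location="cpu")
#   missing, unexpected = diff_state_dicts(model.state_dict(), ckpt["model"])
#   print(missing); print(unexpected)
```

Comparing the first few entries of each list usually makes the prefix mismatch obvious.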
Thanks for the reply, @huihuifan.
From the beginning of the earlier log, I just realized it is probably failing while loading the state_dict of FConvModelSelfAtt. Maybe it doesn't initialize the components added for the fusion model? The missing keys are all related to pretrained_encoder and pretrained_decoder.
Here is the log:
^CProcess SpawnProcess-1:
Process SpawnProcess-2:
Traceback (most recent call last):
File "/home/anaconda3/envs/fairseq/bin/fairseq-train", line 11, in <module>
load_entry_point('fairseq', 'console_scripts', 'fairseq-train')()
File "/home/fairseq/fairseq_cli/train.py", line 317, in cli_main
nprocs=args.distributed_world_size,
File "/home/anaconda3/envs/fairseq/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/anaconda3/envs/fairseq/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 77, in join
timeout=timeout,
File "/home/anaconda3/envs/fairseq/lib/python3.6/multiprocessing/connection.py", line 911, in wait
Traceback (most recent call last):
  File "/home/fairseq/fairseq/trainer.py", line 178, in load_checkpoint
    state["model"], strict=True, args=self.args
  File "/home/fairseq/fairseq/models/fairseq_model.py", line 93, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/home/anaconda3/envs/fairseq/lib/python3.6/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for FConvModelSelfAtt:
Missing key(s) in state_dict: "encoder.pretrained.encoder.embed_tokens.weight", "encoder.pretrained.encoder.embed_positions.weight", "encoder.pretrained
.encoder.fc1.weight", "encoder.pretrained.encoder.fc1.bias", "encoder.pretrained.encoder.projections.2.weight", "encoder.pretrained.encoder.projections.2.bias",
"encoder.pretrained.encoder.convolutions.0.weight", "encoder.pretrained.encoder.convolutions.0.bias", "encoder.pretrained.encoder.convolutions.1.weight", "enco
der.pretrained.encoder.convolutions.1.bias", "encoder.pretrained.encoder.convolutions.2.weight", "encoder.pretrained.encoder.convolutions.2.bias", "encoder.pret
rained.encoder.fc2.weight", "encoder.pretrained.encoder.fc2.bias", "decoder.pretrained_decoder.version", "decoder.pretrained_decoder.embed_tokens.weight", "deco
der.pretrained_decoder.embed_positions.weight", "decoder.pretrained_decoder.fc1.weight", "decoder.pretrained_decoder.fc1.bias", "decoder.pretrained_decoder.proj
ections.4.weight", "decoder.pretrained_decoder.projections.4.bias", "decoder.pretrained_decoder.projections.6.weight", "decoder.pretrained_decoder.projections.6
.bias", "decoder.pretrained_decoder.convolutions.0.weight", "decoder.pretrained_decoder.convolutions.0.bias", "decoder.pretrained_decoder.convolutions.1.weight"
, "decoder.pretrained_decoder.convolutions.1.bias", "decoder.pretrained_decoder.convolutions.2.weight", "decoder.pretrained_decoder.convolutions.2.bias", "decod
er.pretrained_decoder.convolutions.3.weight", "decoder.pretrained_decoder.convolutions.3.bias", "decoder.pretrained_decoder.convolutions.4.weight", "decoder.pre
trained_decoder.convolutions.4.bias", "decoder.pretrained_decoder.convolutions.5.weight", "decoder.pretrained_decoder.convolutions.5.bias", "decoder.pretrained_
decoder.convolutions.6.weight", "decoder.pretrained_decoder.convolutions.6.bias", "decoder.pretrained_decoder.attention.0.attention_module.in_proj_q.bias", "dec
oder.pretrained_decoder.attention.0.attention_module.in_proj_q.weight_g", "decoder.pretrained_decoder.attention.0.attention_module.in_proj_q.weight_v", "decoder
.pretrained_decoder.attention.0.attention_module.in_proj_k.0.bias", "decoder.pretrained_decoder.attention.0.attention_module.in_proj_k.0.weight_g", "decoder.pre
trained_decoder.attention.0.attention_module.in_proj_k.0.weight_v", "decoder.pretrained_decoder.attention.0.attention_module.in_proj_v.0.bias", "decoder.pretrai
ned_decoder.attention.0.attention_module.in_proj_v.0.weight_g", "decoder.pretrained_decoder.attention.0.attention_module.in_proj_v.0.weight_v", "decoder.pretrai
ned_decoder.attention.0.attention_module.out_proj.bias", "decoder.pretrained_decoder.attention.0.attention_module.out_proj.weight_g", "decoder.pretrained_decode
r.attention.0.attention_module.out_proj.weight_v", "decoder.pretrained_decoder.attention.1.attention_module.in_proj_q.bias", "decoder.pretrained_decoder.attenti
on.1.attention_module.in_proj_q.weight_g", "decoder.pretrained_decoder.attention.1.attention_module.in_proj_q.weight_v", "decoder.pretrained_decoder.attention.1
.attention_module.in_proj_k.0.bias", "decoder.pretrained_decoder.attention.1.attention_module.in_proj_k.0.weight_g", "decoder.pretrained_decoder.attention.1.att
ention_module.in_proj_k.0.weight_v", "decoder.pretrained_decoder.attention.1.attention_module.in_proj_v.0.bias", "decoder.pretrained_decoder.attention.1.attenti
on_module.in_proj_v.0.weight_g", "decoder.pretrained_decoder.attention.1.attention_module.in_proj_v.0.weight_v", "decoder.pretrained_decoder.attention.1.attenti
on_module.out_proj.bias", "decoder.pretrained_decoder.attention.1.attention_module.out_proj.weight_g", "decoder.pretrained_decoder.attention.1.attention_module.
out_proj.weight_v", "decoder.pretrained_decoder.attention.2.attention_module.in_proj_q.bias", "decoder.pretrained_decoder.attention.2.attention_module.in_proj_q
.weight_g", "decoder.pretrained_decoder.attention.2.attention_module.in_proj_q.weight_v", "decoder.pretrained_decoder.attention.2.attention_module.in_proj_k.0.b
ias", "decoder.pretrained_decoder.attention.2.attention_module.in_proj_k.0.weight_g", "decoder.pretrained_decoder.attention.2.attention_module.in_proj_k.0.weigh
t_v", "decoder.pretrained_decoder.attention.2.attention_module.in_proj_v.0.bias", "decoder.pretrained_decoder.attention.2.attention_module.in_proj_v.0.weight_g"
, "decoder.pretrained_decoder.attention.2.attention_module.in_proj_v.0.weight_v", "decoder.pretrained_decoder.attention.2.attention_module.out_proj.bias", "deco
der.pretrained_decoder.attention.2.attention_module.out_proj.weight_g", "decoder.pretrained_decoder.attention.2.attention_module.out_proj.weight_v", "decoder.pr
etrained_decoder.attention.3.attention_module.in_proj_q.bias", "decoder.pretrained_decoder.attention.3.attention_module.in_proj_q.weight_g", "decoder.pretrained
_decoder.attention.3.attention_module.in_proj_q.weight_v", "decoder.pretrained_decoder.attention.3.attention_module.in_proj_k.0.bias", "decoder.pretrained_decod
er.attention.3.attention_module.in_proj_k.0.weight_g", "decoder.pretrained_decoder.attention.3.attention_module.in_proj_k.0.weight_v", "decoder.pretrained_decod
er.attention.3.attention_module.in_proj_v.0.bias", "decoder.pretrained_decoder.attention.3.attention_module.in_proj_v.0.weight_g", "decoder.pretrained_decoder.a
ttention.3.attention_module.in_proj_v.0.weight_v", "decoder.pretrained_decoder.attention.3.attention_module.out_proj.bias", "decoder.pretrained_decoder.attentio
n.3.attention_module.out_proj.weight_g", "decoder.pretrained_decoder.attention.3.attention_module.out_proj.weight_v", "decoder.pretrained_decoder.attention.4.at
tention_module.in_proj_q.bias", "decoder.pretrained_decoder.attention.4.attention_module.in_proj_q.weight_g", "decoder.pretrained_decoder.attention.4.attention_
module.in_proj_q.weight_v", "decoder.pretrained_decoder.attention.4.attention_module.in_proj_k.0.bias", "decoder.pretrained_decoder.attention.4.attention_module
.in_proj_k.0.weight_g", "decoder.pretrained_decoder.attention.4.attention_module.in_proj_k.0.weight_v", "decoder.pretrained_decoder.attention.4.attention_module
.in_proj_v.0.bias", "decoder.pretrained_decoder.attention.4.attention_module.in_proj_v.0.weight_g", "decoder.pretrained_decoder.attention.4.attention_module.in_
proj_v.0.weight_v", "decoder.pretrained_decoder.attention.4.attention_module.out_proj.bias", "decoder.pretrained_decoder.attention.4.attention_module.out_proj.w
eight_g", "decoder.pretrained_decoder.attention.4.attention_module.out_proj.weight_v", "decoder.pretrained_decoder.attention.5.attention_module.in_proj_q.bias",
"decoder.pretrained_decoder.attention.5.attention_module.in_proj_q.weight_g", "decoder.pretrained_decoder.attention.5.attention_module.in_proj_q.weight_v", "de
coder.pretrained_decoder.attention.5.attention_module.in_proj_k.0.bias", "decoder.pretrained_decoder.attention.5.attention_module.in_proj_k.0.weight_g", "decode
r.pretrained_decoder.attention.5.attention_module.in_proj_k.0.weight_v", "decoder.pretrained_decoder.attention.5.attention_module.in_proj_v.0.bias", "decoder.pr
etrained_decoder.attention.5.attention_module.in_proj_v.0.weight_g", "decoder.pretrained_decoder.attention.5.attention_module.in_proj_v.0.weight_v", "decoder.pr
etrained_decoder.attention.5.attention_module.out_proj.bias", "decoder.pretrained_decoder.attention.5.attention_module.out_proj.weight_g", "decoder.pretrained_d
ecoder.attention.5.attention_module.out_proj.weight_v", "decoder.pretrained_decoder.attention.6.attention_module.in_proj_q.bias", "decoder.pretrained_decoder.at
tention.6.attention_module.in_proj_q.weight_g", "decoder.pretrained_decoder.attention.6.attention_module.in_proj_q.weight_v", "decoder.pretrained_decoder.attent
ion.6.attention_module.in_proj_k.0.bias", "decoder.pretrained_decoder.attention.6.attention_module.in_proj_k.0.weight_g", "decoder.pretrained_decoder.attention.
6.attention_module.in_proj_k.0.weight_v", "decoder.pretrained_decoder.attention.6.attention_module.in_proj_v.0.bias", "decoder.pretrained_decoder.attention.6.at
@huihuifan I realized someone else had this problem (#307); however, neither --restore-file
nor --save-dir
pointed at the same pretrained checkpoint (the fix that worked for that issue's author) works for me, and I still get the same missing-keys error
for the pretrained part.
When I print the state_dict keys of checkpoint_best.pt,
it doesn't contain the "pretrained" keys, which makes sense. However, when the model is built here, it has pretrained encoder/decoder parts that need to be loaded.
What I don't get is: should the keys in the pretrained model get renamed somewhere?
If you have any suggestions, please let me know.
Thanks
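I don't know where (or whether) fairseq performs that renaming internally, but to test the hypothesis one could remap key prefixes manually before calling load_state_dict. The prefix strings in the usage comment are guesses inferred from the missing keys in the error message, not confirmed fairseq internals:

```python
def remap_prefixes(state_dict, prefix_map):
    """Rewrite key prefixes in a flat state_dict, e.g. to move plain
    "encoder.*" keys under the fusion model's pretrained wrapper modules."""
    out = {}
    for key, value in state_dict.items():
        for old, new in prefix_map.items():
            if key.startswith(old):
                out[new + key[len(old):]] = value
                break
        else:
            out[key] = value  # no prefix matched; keep the key unchanged
    return out

# Hypothetical usage (prefixes inferred from the missing-key names above):
#   import torch
#   pre = torch.load("pretrained/checkpoint_best.pt", map_location="cpu")["model"]
#   remapped = remap_prefixes(pre, {
#       "encoder.": "encoder.pretrained.encoder.",
#       "decoder.": "decoder.pretrained_decoder.",
#   })
#   model.load_state_dict(remapped, strict=False)
```

With `strict=False` the load would fill only the pretrained submodules and leave the newly added fusion components (gates, joining layers) at their initialization.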
This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!
Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!
Hi,
I am following the README in examples/stories and have been able to train the model using the following command:
I am also able to use
interactive.py
to generate stories using the checkpoint_best.pt
obtained in the previous step. However, when trying to finetune it in the fusion setup with the following command, nothing happens: it gets stuck after printing out the model architecture.
Command, according to the README:
Log, which is stuck and never starts training:
Any idea why this happens? Thanks
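When a training process hangs like this, one generic way to see where each process is blocked, without killing it, is Python's standard-library faulthandler. This is a general debugging sketch, not fairseq-specific, and assumes a Unix system (SIGUSR1 is unavailable on Windows):

```python
import faulthandler
import signal

# Put this near the top of the training entry point; afterwards,
# `kill -USR1 <pid>` makes that process dump every thread's stack
# to stderr, showing exactly which call it is blocked in.
faulthandler.register(signal.SIGUSR1)

# One-off alternative: dump the current stacks immediately.
faulthandler.dump_traceback()
```

Running this in each spawned worker as well as the parent would show whether the hang is in checkpoint loading, distributed initialization, or data loading.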