Closed — shirley-wu closed this issue 3 years ago
Adding to this issue:
What is the parameter save_dir_key=lambda? It is not referenced anywhere in the code. I get what it's supposed to do, but running the code example as provided produces an error.
After removing it, I get this error:
Traceback (most recent call last):
  File "xxx/fairseq/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "xxx/fairseq/fairseq_cli/train.py", line 347, in cli_main
    cli_main_helper(args)
  File "xxx/fairseq/fairseq_cli/train.py", line 385, in cli_main_helper
    main(args)
  File "xxx/fairseq/fairseq_cli/train.py", line 72, in main
    criterion = task.build_criterion(args)
  File "xxx/fairseq/fairseq/tasks/fairseq_task.py", line 229, in build_criterion
    return criterions.build_criterion(args, self)
  File "xxx/fairseq/fairseq/registry.py", line 41, in build_x
    return builder(args, *extra_args, **extra_kwargs)
  File "xxx/fairseq/fairseq/criterions/fairseq_criterion.py", line 54, in build_criterion
    raise NotImplementedError(
NotImplementedError: Unable to infer Criterion arguments, please implement LatencyAugmentedLabelSmoothedCrossEntropyCriterion.build_criterion
Did the API for criterions change after this code was released?
I am running the master version of fairseq.
Hi, has anyone solved the above issue (NotImplementedError: Unable to infer Criterion arguments, please implement LatencyAugmentedLabelSmoothedCrossEntropyCriterion.build_criterion)?
Thank you!
This issue comes up because of the wrong initialization in label_smoothed_cross_entropy_latency_augmented.py, which is not aligned with fairseq_criterion.py. The error is fixed in this commit. Changing the initialization in label_smoothed_cross_entropy_latency_augmented.py to the following fixes the issue:
class LatencyAugmentedLabelSmoothedCrossEntropyCriterion(
    LabelSmoothedCrossEntropyCriterion
):
    def __init__(
        self,
        task,
        sentence_avg,
        label_smoothing,
        ignore_prefix_size,
        report_accuracy,
        latency_weight_avg,
        latency_weight_avg_type,
        latency_weight_var,
        latency_weight_var_type,
        mass_preservation,
        average_method,
    ):
        super().__init__(
            task, sentence_avg, label_smoothing, ignore_prefix_size, report_accuracy
        )
        self.eps = label_smoothing
        self.latency_weight_avg = latency_weight_avg
        self.latency_weight_avg_type = latency_weight_avg_type
        self.latency_weight_var = latency_weight_var
        self.latency_weight_var_type = latency_weight_var_type
        self.mass_preservation = mass_preservation
        self.average_method = average_method
        self.latency_train = LatencyTraining(
            self.latency_weight_avg,
            self.latency_weight_var,
            self.latency_weight_avg_type,
            self.latency_weight_var_type,
            self.mass_preservation,
            self.average_method,
        )
I am still having the issue NotImplementedError: Unable to infer Criterion arguments, please implement LatencyAugmentedLabelSmoothedCrossEntropyCriterion.build_criterion. Also, I am getting fairseq-train: error: unrecognized arguments: --label-smoothing 0.1.
📚 Documentation
I'm trying to reproduce the results from your paper Monotonic Multihead Attention.
In the appendix of your paper, the hyperparameters for WMT'15 de-en are
encoder embed dim = decoder embed dim = 1024, encoder ffn embed dim = decoder ffn embed dim = 4096, encoder attention heads = decoder attention heads = 16
. This set of hyperparameters seems to be the same as the transformer_monotonic_vaswani_wmt_en_de_big architecture. However, in your README you use the transformer_monotonic_iwslt_de_en architecture, where encoder embed dim = decoder embed dim = 512, encoder ffn embed dim = decoder ffn embed dim = 1024, and encoder attention heads = decoder attention heads = 4. Could you tell me which one is correct?
Besides, I'm confused by the batch size 3584 × 8 × 8 × 2. I understand that you conduct training on 8 GPUs with max_tokens = 3584, but where does the other 8 × 2 come from?
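One hedged guess at the arithmetic (not confirmed by the paper or README): in fairseq the effective batch size is typically max_tokens × number of GPUs × --update-freq (gradient accumulation), so the second 8 may be an assumed --update-freq of 8; the remaining factor of 2 is left unexplained here:

```python
# Hypothetical breakdown of the 3584 x 8 x 8 x 2 figure.
# Assumptions (not confirmed): the second 8 is --update-freq,
# i.e. gradient accumulation over 8 steps; the final x2 is unknown.
max_tokens = 3584    # --max-tokens per GPU per step
num_gpus = 8         # GPUs used for training
update_freq = 8      # assumed --update-freq (gradient accumulation)

effective_tokens = max_tokens * num_gpus * update_freq
print(effective_tokens)  # 229376 tokens per parameter update (before the x2)
```

If that assumption holds, the training command would pass both --max-tokens 3584 and --update-freq 8; the extra ×2 would still need clarification from the authors.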