facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License
30.38k stars 6.4k forks

Inconsistency of hyperparameters for Monotonic Multihead Attention between your README and the paper #2122

Closed shirley-wu closed 3 years ago

shirley-wu commented 4 years ago

📚 Documentation

I'm trying to reproduce your result for your paper Monotonic Multihead Attention.

In the appendix of your paper, the hyperparameters for WMT'15 de-en are encoder embed dim = decoder embed dim = 1024, encoder ffn embed dim = decoder ffn embed dim = 4096, encoder attention heads = decoder attention heads = 16. This set of hyperparameters seems to be the same as the transformer_monotonic_vaswani_wmt_en_de_big architecture.

However, in your README you use transformer_monotonic_iwslt_de_en architecture, where encoder embed dim = decoder embed dim = 512, encoder ffn embed dim = decoder ffn embed dim = 1024, encoder attention heads = decoder attention heads = 4. Could you tell me which one is correct?
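To make the mismatch concrete, here is a small sketch that puts the two settings quoted above side by side. The dictionary keys are written in the style of fairseq option names, which is an assumption for illustration; only the numeric values come from the paper appendix and the README architecture as described in this comment.

```python
# Values as quoted in this issue; key names are illustrative assumptions.
paper_appendix = {
    "encoder_embed_dim": 1024,
    "decoder_embed_dim": 1024,
    "encoder_ffn_embed_dim": 4096,
    "decoder_ffn_embed_dim": 4096,
    "encoder_attention_heads": 16,
    "decoder_attention_heads": 16,
}

readme_arch = {
    "encoder_embed_dim": 512,
    "decoder_embed_dim": 512,
    "encoder_ffn_embed_dim": 1024,
    "decoder_ffn_embed_dim": 1024,
    "encoder_attention_heads": 4,
    "decoder_attention_heads": 4,
}


def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    return {k: (a[k], b.get(k)) for k in a if a[k] != b.get(k)}


if __name__ == "__main__":
    for key, (paper, readme) in diff_configs(paper_appendix, readme_arch).items():
        print(f"{key}: paper={paper} readme={readme}")
```

Every one of the six settings differs, so the two descriptions cannot refer to the same model size.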

Besides, I'm confused by the batch size 3584 × 8 × 8 × 2. I understand that you train on 8 GPUs with max_tokens = 3584, but where does the other 8 × 2 come from?
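For reference, the arithmetic behind that figure can be written out as below. The first two factors follow the reading given above (max_tokens and 8 GPUs). The labels on the remaining two factors are placeholders only: they might correspond to something like a number of nodes or a gradient-accumulation factor, but that is a guess and nothing in this thread confirms it.

```python
from math import prod


def tokens_per_update(max_tokens, *multipliers):
    """Upper bound on tokens per optimizer step: max_tokens times each multiplier."""
    return max_tokens * prod(multipliers)


if __name__ == "__main__":
    factors = {
        "gpus": 8,                  # stated in the question above
        "unexplained_factor_a": 8,  # placeholder label, meaning not confirmed
        "unexplained_factor_b": 2,  # placeholder label, meaning not confirmed
    }
    total = tokens_per_update(3584, *factors.values())
    print(total)  # 3584 * 8 * 8 * 2 = 458752
```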

felix-schneider commented 4 years ago

Adding to this issue:

What is the parameter save_dir_key=lambda? It is not referenced anywhere in the code. I understand what it is supposed to do, but running the code example as provided produces an error.

After removing it, I get this error:

Traceback (most recent call last):
  File "xxx/fairseq/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "xxx/fairseq/fairseq_cli/train.py", line 347, in cli_main
    cli_main_helper(args)
  File "xxx/fairseq/fairseq_cli/train.py", line 385, in cli_main_helper
    main(args)
  File "xxx/fairseq/fairseq_cli/train.py", line 72, in main
    criterion = task.build_criterion(args)
  File "xxx/fairseq/fairseq/tasks/fairseq_task.py", line 229, in build_criterion
    return criterions.build_criterion(args, self)
  File "xxx/fairseq/fairseq/registry.py", line 41, in build_x
    return builder(args, *extra_args, **extra_kwargs)
  File "xxx/fairseq/fairseq/criterions/fairseq_criterion.py", line 54, in build_criterion
    raise NotImplementedError(
NotImplementedError: Unable to infer Criterion arguments, please implement LatencyAugmentedLabelSmoothedCrossEntropyCriterion.build_criterion

Did the API for criterions change since releasing this code?

I am running the master version of fairseq.

sathishreddy commented 3 years ago

Hi, has anyone solved the above issue? NotImplementedError: Unable to infer Criterion arguments, please implement LatencyAugmentedLabelSmoothedCrossEntropyCriterion.build_criterion

Thank you!

sathishreddy commented 3 years ago

This issue arises because the initialization in label_smoothed_cross_entropy_latency_augmented.py is not aligned with fairseq_criterion.py. The error is fixed in this commit. Changing the initialization in label_smoothed_cross_entropy_latency_augmented.py to the following fixes the issue:


class LatencyAugmentedLabelSmoothedCrossEntropyCriterion(
    LabelSmoothedCrossEntropyCriterion
):
    def __init__(
        self,
        task,
        sentence_avg,
        label_smoothing,
        ignore_prefix_size,
        report_accuracy,
        latency_weight_avg,
        latency_weight_avg_type,
        latency_weight_var,
        latency_weight_var_type,
        mass_preservation,
        average_method,
    ):
        super().__init__(task, sentence_avg, label_smoothing, ignore_prefix_size, report_accuracy)
        self.eps = label_smoothing
        self.latency_weight_avg = latency_weight_avg
        self.latency_weight_avg_type = latency_weight_avg_type
        self.latency_weight_var = latency_weight_var
        self.latency_weight_var_type = latency_weight_var_type
        self.mass_preservation = mass_preservation
        self.average_method = average_method
        self.latency_train = LatencyTraining(
            self.latency_weight_avg,
            self.latency_weight_var,
            self.latency_weight_avg_type,
            self.latency_weight_var_type,
            self.mass_preservation,
            self.average_method,
        )
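
As a rough illustration of why the constructor signature matters: as far as I understand it, the base criterion's build_criterion inspects the __init__ parameters and fills each one from the parsed arguments by name. The sketch below is a simplified, self-contained imitation of that idea, not fairseq's actual code. The function build_from_args and the two toy classes are hypothetical, and the old-style (args, task) signature is an assumption about what the criterion looked like before the fix.

```python
import inspect
from argparse import Namespace


def build_from_args(cls, args, task):
    """Fill cls.__init__ parameters by name from `args` (simplified imitation)."""
    init_args = {}
    for p in inspect.signature(cls).parameters.values():
        if p.name == "task":
            init_args["task"] = task
        elif hasattr(args, p.name):
            init_args[p.name] = getattr(args, p.name)
        elif p.default is not p.empty:
            continue  # fall back to the parameter's default value
        else:
            raise NotImplementedError(
                "Unable to infer Criterion arguments, please implement "
                f"{cls.__name__}.build_criterion"
            )
    return cls(**init_args)


class OldStyleCriterion:
    # Takes the whole namespace; no attribute named "args" exists on it.
    def __init__(self, args, task):
        self.eps = args.label_smoothing


class NewStyleCriterion:
    # Each parameter name matches an attribute on the parsed namespace.
    def __init__(self, task, label_smoothing, latency_weight_avg=0.0):
        self.eps = label_smoothing
        self.latency_weight_avg = latency_weight_avg


if __name__ == "__main__":
    args = Namespace(label_smoothing=0.1)
    print(build_from_args(NewStyleCriterion, args, task=None).eps)
    try:
        build_from_args(OldStyleCriterion, args, task=None)
    except NotImplementedError as err:
        print(err)
```

In this toy version the old-style class fails because no parsed argument is literally named args, which matches the shape of the error reported above.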

xpertasks commented 3 years ago

I am still having the issue NotImplementedError: Unable to infer Criterion arguments, please implement LatencyAugmentedLabelSmoothedCrossEntropyCriterion.build_criterion. Also, I am getting fairseq-train: error: unrecognized arguments: --label-smoothing 0.1.
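
One possible cause of the second error, offered as a guess and not a diagnosis: if criterion-specific flags are only registered once a matching --criterion has been selected, then passing --label-smoothing while a different (or default) criterion is active would make argparse reject it. The toy parser below demonstrates that mechanism with plain argparse. The names parse_toy_args and TOY_CRITERION_ARGS are hypothetical, and this is not fairseq's parser.

```python
import argparse

# Toy registry: which extra flags each criterion contributes (illustrative only).
TOY_CRITERION_ARGS = {
    "cross_entropy": [],
    "label_smoothed_cross_entropy": [("--label-smoothing", float, 0.0)],
}


def parse_toy_args(argv):
    """Two-stage parse: resolve --criterion first, then add its own flags."""
    parser = argparse.ArgumentParser(prog="toy-train")
    parser.add_argument(
        "--criterion",
        default="cross_entropy",
        choices=sorted(TOY_CRITERION_ARGS),
    )
    # Stage 1: only look at --criterion, ignoring flags not yet registered.
    known, _ = parser.parse_known_args(argv)
    # Stage 2: register the flags belonging to the selected criterion.
    for flag, flag_type, default in TOY_CRITERION_ARGS[known.criterion]:
        parser.add_argument(flag, type=flag_type, default=default)
    return parser.parse_args(argv)


if __name__ == "__main__":
    ok = parse_toy_args(
        ["--criterion", "label_smoothed_cross_entropy", "--label-smoothing", "0.1"]
    )
    print(ok.label_smoothing)
    # Without the matching --criterion, argparse exits with
    # "unrecognized arguments: --label-smoothing 0.1".
    try:
        parse_toy_args(["--label-smoothing", "0.1"])
    except SystemExit as exc:
        print("parser exited with code", exc.code)
```

If this is what is happening, checking that the intended criterion is actually selected and registered (for example, that the module defining it is being imported) would be the first thing to verify.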