facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Bugs and other major issues regarding the MMA code #3880

Open troyas opened 3 years ago

troyas commented 3 years ago

🐛 Bug

Hello,

I recently read the MMA paper and was hoping to play with the official code a little to replicate the results and see how it performs. Unfortunately, that does not seem to be possible. As has already been reported in several issues such as #2122 (where multiple problems are pointed out by different people), #3414, #3472 and #3876 (one of them dating back to May 2020), this code has been broken for a very long time. I see that there is one recent commit that addresses some of the issues related to this model. Nevertheless, the code is still not functional because of other errors. Besides all the issues reported above, below I share yet another unrecognized-argument error I get when I run the command shared by the authors in the readme. I got this error on the latest fairseq version, which includes the MMA refactor commit.

To Reproduce

git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

fairseq-train data-bin/wmt15_en_de_32k --simul-type hard_aligned --user-dir ~/fairseq/examples/simultaneous_translation --mass-preservation --criterion latency_augmented_label_smoothed_cross_entropy --latency-weight-var 0.1 --max-update 50000 --arch transformer_monotonic_iwslt_de_en save_dir_key=lambda --optimizer adam --adam-betas '(0.9, 0.98)' --lr-scheduler 'inverse_sqrt' --warmup-init-lr 1e-7 --warmup-updates 4000 --lr 5e-4 --stop-min-lr 1e-9 --clip-norm 0.0 --weight-decay 0.0001 --dropout 0.3 --label-smoothing 0.1 --max-tokens 3584

The above command generates following error message

  File "./bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "fairseq/fairseq_cli/train.py", line 493, in cli_main
    parser = options.get_training_parser()
  File "/fairseq/fairseq/options.py", line 38, in get_training_parser
    parser = get_parser("Trainer", default_task)
  File "fairseq/fairseq/options.py", line 227, in get_parser
    utils.import_user_module(usr_args)
  File "fairseq/fairseq/utils.py", line 501, in import_user_module
    import_models(models_path, f"{module_name}.models")
  File "fairseq/fairseq/models/__init__.py", line 218, in import_models
    importlib.import_module(namespace + "." + model_name)
  File "./lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "fairseq/examples/simultaneous_translation/models/convtransformer_simul_trans.py", line 29, in <module>
    class SimulConvTransformerModel(ConvTransformerModel):
  File "./fairseq/fairseq/models/__init__.py", line 132, in register_model_cls
    raise ValueError("Cannot register duplicate model ({})".format(name))
ValueError: Cannot register duplicate model (convtransformer_simul_trans)

As the repository provides no information about what the --user-dir and save_dir_key parameters are used for, I decided not to use them. As a result, I ran the following command:

fairseq-train ~/data-bin/wmt15_de_en --simul-type hard_aligned --mass-preservation --criterion latency_augmented_label_smoothed_cross_entropy --latency-weight-var 0.1 --max-update 50000 --arch transformer_monotonic_iwslt_de_en --optimizer adam --adam-betas '(0.9, 0.98)' --lr-scheduler 'inverse_sqrt' --warmup-init-lr 1e-7 --warmup-updates 4000 --lr 5e-4 --stop-min-lr 1e-9 --clip-norm 0.0 --weight-decay 0.0001 --dropout 0.3 --label-smoothing 0.1 --max-tokens 3584

This command gives me the following error: fairseq-train: error: unrecognized arguments: --latency-weight-var 0.1

In short, what I'd like to say is that non-working code like this makes everyone's life much harder. I would even prefer having no access to the code over having code that does not work. I bet many people have spent a fair amount of time trying to make this MMA code run and failed. Don't you, as authors @xutaima, @kahne, @MultiPath, think that the repository whose URL is shared in the paper should contain working code?

Just collecting other users who are having problems with this code: @felix-schneider, @shirley-wu, @sathishreddy, @xpertasks, @Onoratofra, @kurtisxx, @VrutikShah

Expected behavior

The expected behavior is for the command to run and reproduce the results reported in the paper.

Environment

Additional context

xutaima commented 3 years ago

Hi @troyas, please just tag me in future issues on MMA. Please understand that fairseq is a huge project and MMA is an extension of it. Most of the issues are caused by updated APIs in fairseq models that we didn't follow up on in time. I apologize for the inconvenience; we have been actively working on it.

mgaido91 commented 3 years ago

Hi @xutaima, I think you raised an important question, one that I believe the fairseq community should start thinking about seriously. There are many breaking changes in fairseq, and everyone who develops custom code on top of fairseq is essentially forced to maintain a separate branch, because otherwise you spend your time debugging why your code no longer works or why its behavior changed, even though you did not change anything. This is true for you, who have to "follow up" (to quote you), and I really understand how hard that is. It is even harder when you are not part of the organisation that maintains the project (my case), as there is no documentation, notice, or anything similar regarding breaking changes and behavior changes. It is good software engineering practice to avoid this kind of change so that it is easy for people to build upon others' code, and to make breaking changes only in major releases, after deprecating old APIs and documenting the changes with indications of what was changed and how to adapt to the new state. I understand that fairseq may be maintained by a small group of people and that following strict, disciplined engineering practices may seem like excessive overhead, but I believe the benefits in the medium to long term would be much greater. The community would be encouraged to grow, and many people from outside would find it easy to build their code on fairseq, keep their version up to date with the fairseq branch, and hence be able and happy to contribute to the active maintenance of the library.

Of course, these are only my two cents on the topic, but I see great potential in fairseq, which in my opinion is not fully exploited, and I hope the fairseq community will reflect on these topics and consider adopting stricter engineering practices in maintaining the code, as we have seen their importance in the success of other open source communities and software (see the Apache projects, for instance).

All the best, Marco

Psycoy commented 2 years ago

@troyas ValueError: Cannot register duplicate model (convtransformer_simul_trans)

This error happens when you try to register a model under a name that already exists in the fairseq package. Try changing the registered name in the decorator: @register_model("your_model_name")
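For what it's worth, here is a minimal, self-contained sketch of the registry pattern that produces this error. It mirrors the behavior of fairseq's @register_model decorator but is not the library's actual code, and the second model name below is purely hypothetical:

```python
# Minimal sketch of the registry pattern behind fairseq's @register_model.
# MODEL_REGISTRY and register_model mirror fairseq's names, but this is a
# simplified illustration, not the library's actual implementation.
MODEL_REGISTRY = {}

def register_model(name):
    def wrapper(cls):
        if name in MODEL_REGISTRY:
            # This is the check that produces the error in the traceback above:
            # "Cannot register duplicate model (...)"
            raise ValueError("Cannot register duplicate model ({})".format(name))
        MODEL_REGISTRY[name] = cls
        return cls
    return wrapper

# First registration succeeds.
@register_model("convtransformer_simul_trans")
class SimulConvTransformerModel:
    pass

# If the same module is imported twice (e.g. the simultaneous_translation
# examples are importable from the installed package and are also passed via
# --user-dir), the decorator runs again with the same name and raises.
# Registering your own variant under an unused name avoids the clash:
@register_model("my_convtransformer_simul_trans")  # hypothetical name
class MySimulConvTransformerModel(SimulConvTransformerModel):
    pass

print(sorted(MODEL_REGISTRY))
```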

EricLina commented 2 years ago

Oh my god, I just ran into the same problem. Please get it fixed, or provide a more robust fairseq-generate.

duj12 commented 2 years ago

@troyas fairseq-train: error: unrecognized arguments: --latency-weight-var 0.1

This type of error is easy to debug from the error log (screenshot omitted).

After comparing the main-branch code with an older version of fairseq, I found the reason: the criterion "fairseq/criterions/latency_augmented_label_smoothed_cross_entropy" has changed. The older version exposed a "latency_weight_var" option, but in the current version this option has become "latency_var_weight". Replacing "--latency-weight-var 0.1" in the training command with "--latency-var-weight 0.1" may fix your problem.
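If you are not sure which spelling your installed version expects, a quick way to check is to search the installed package for the option name. This is just a diagnostic sketch, not part of fairseq itself; it only assumes fairseq is importable:

```python
# Diagnostic sketch: search the installed fairseq package for whichever
# spelling of the latency option it defines, so you know which CLI flag
# (--latency-weight-var vs. --latency-var-weight) your version expects.
# It makes no assumptions about internal module paths.
import pathlib

import fairseq

pkg_root = pathlib.Path(fairseq.__file__).parent
for name in ("latency_weight_var", "latency_var_weight"):
    hits = [
        str(path.relative_to(pkg_root))
        for path in pkg_root.rglob("*.py")
        if name in path.read_text(encoding="utf-8", errors="ignore")
    ]
    print(name, "->", hits if hits else "not found")
```

The config field name that shows up in the source maps to the command-line flag in the usual way, with underscores replaced by hyphens, so whichever name the search finds is the flag to pass to fairseq-train.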

zhongmz commented 2 years ago

So, how can this bug be fixed?