DevSinghSachan / multilingual_nmt

Experiments on Multilingual NMT

Some problems applying the multilingual_nmt code #9

Open jackeymango opened 4 years ago

jackeymango commented 4 years ago

First of all, thank you for open-sourcing your code; it has been a great help. In practice, though, I have run into some problems and would really appreciate your help.

1. What does the code at line 582 of /multilingual_nmt/models/transformer.py mean (`yy_mask = self.make_history_mask(y_in_block)`)? I don't understand the operation used on this variable and can't run it.
2. When I use your example tools/bpe_pipeline_bilingual.sh to train an en-to-ja model, the loss does not drop normally: the first reported value is 10 and the second is nan.

I look forward to your reply. My email is 4196065734@qq.com.

DevSinghSachan commented 4 years ago

  • In line 582, the history mask is being generated. `*` is the multiplication operation that converts the self-attention mask of the target tokens into a corresponding history-based mask, which is required for decoding.
  • For the Japanese language, one first needs to use a word segmentation tool such as the KyTea tokenizer and then run the code. See this issue for more details: #8

jackeymango commented 4 years ago

Thank you for your reply.

In line 581, yy_mask was assigned as bool, so this code also causes problems for me. Since I can't use a GPU right now, I have no way to reproduce the exact error message, but because values of type bool cannot be multiplied, do I need to convert them to type int? Thank you for the tool you suggested; once I can use a GPU again, I will experiment to see whether it solves the problem of the loss not decreasing.
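For readers following along, here is a minimal sketch of the history-mask multiplication being discussed; the function and variable names mirror the thread but are illustrative, not the repository's actual code:

```python
import torch

def make_history_mask(y_in_block):
    # y_in_block: (batch, length) tensor of target token ids.
    # A lower-triangular matrix lets position i attend only to positions <= i.
    length = y_in_block.size(1)
    history = torch.tril(torch.ones(length, length, dtype=torch.uint8))
    return history.unsqueeze(0)  # broadcasts over the batch dimension

batch, length = 2, 4
y_in_block = torch.ones(batch, length, dtype=torch.long)
# Padding mask: 1 where the token id is non-zero, expanded into a
# (batch, length, length) self-attention mask.
yy_mask = (y_in_block != 0).unsqueeze(1).expand(batch, length, length).to(torch.uint8)
# Elementwise multiplication acts as a logical AND, restricting the
# self-attention mask to a history (causal) mask for decoding.
yy_mask = yy_mask * make_history_mask(y_in_block)
```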

DevSinghSachan commented 4 years ago

Okay, I think you can change the type of the history mask to bool as `history_mask.to(torch.bool)`; then the two bool matrices can be multiplied.
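A tiny sketch of the cast being suggested (the tensor values below are made up for illustration): once both masks are bool, they can be combined with `&`, the logical AND, which gives the same result as multiplying the original 0/1 masks elementwise.

```python
import torch

pad_mask = torch.tensor([[1, 0], [1, 1]], dtype=torch.uint8)
history_mask = torch.tril(torch.ones(2, 2, dtype=torch.uint8))

# Casting both operands to bool and combining with logical AND; equivalent
# to elementwise multiplication of the original 0/1 masks.
combined = pad_mask.to(torch.bool) & history_mask.to(torch.bool)
```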

jackeymango commented 4 years ago

> Okay, I think you can change the type of the history mask to bool as `history_mask.to(torch.bool)`; then the two bool matrices can be multiplied.

Thanks! But the loss also can't drop normally for en-to-de. Can you help me? [image]

DevSinghSachan commented 4 years ago

Okay, are you using float16? If so, try removing the float16 option. Or could you paste the command-line arguments you use to run the code?

jackeymango commented 4 years ago

> Okay, are you using float16? If so, try removing the float16 option. Or could you paste the command-line arguments you use to run the code?

I don't use float16. I just used the bpe_pipeline_bilingual.sh example you provided. These are the args:

```json
{
  "input": "temp/run_en_de/data",
  "data": "processed",
  "report_every": 50,
  "model": "Transformer",
  "pshare_decoder_param": false,
  "pshare_encoder_param": false,
  "lang1": null,
  "lang2": null,
  "share_sublayer": null,
  "attn_share": null,
  "batchsize": 30,
  "wbatchsize": 1000,
  "epoch": 90,
  "gpu": 0,
  "resume": false,
  "start_epoch": 0,
  "debug": false,
  "grad_accumulator_count": 1,
  "seed": 1234,
  "fp16": false,
  "static_loss_scale": 1,
  "dynamic_loss_scale": false,
  "multi_gpu": [0],
  "n_units": 512,
  "n_hidden": 2048,
  "layers": 6,
  "multi_heads": 8,
  "dropout": 0.1,
  "attention_dropout": 0.1,
  "relu_dropout": 0.1,
  "layer_prepostprocess_dropout": 0.1,
  "tied": true,
  "pos_attention": false,
  "label_smoothing": 0.1,
  "embed_position": false,
  "max_length": 500,
  "use_pad_remover": true,
  "optimizer": "Adam",
  "grad_norm_for_yogi": false,
  "warmup_steps": 16000,
  "learning_rate": 0.2,
  "learning_rate_constant": 2.0,
  "optimizer_adam_beta1": 0.9,
  "optimizer_adam_beta2": 0.997,
  "optimizer_adam_epsilon": 1e-09,
  "ema_decay": 0.999,
  "eval_steps": 100,
  "beam_size": 5,
  "metric": "bleu",
  "alpha": 1.0,
  "max_sent_eval": 500,
  "max_decode_len": 70,
  "out": "results",
  "model_file": "temp/run_en_de/models/model_run_en_de.ckpt",
  "best_model_file": "temp/run_en_de/models/model_best_run_en_de.ckpt",
  "dev_hyp": "temp/run_en_de/test/valid.out",
  "test_hyp": "temp/run_en_de/test/test.out",
  "log_path": "results/log.txt"
}
```

jackeymango commented 4 years ago

I have modified line 215 in transformer.py, changing it to `batch_A = batch_A.masked_fill(mask == 0, float("inf"))  # Works in v0.4`. Does this affect the model?
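A hedged aside on this change: when such a fill happens before a softmax, the sign of the infinity matters. A minimal sketch with made-up tensors:

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([[1.0, 2.0, 3.0]])
mask = torch.tensor([[1, 1, 0]])

# Filling masked positions with -inf drives their softmax weight to zero.
# Filling with +inf instead would make the softmax produce nan, which can
# then propagate into the loss.
masked = scores.masked_fill(mask == 0, float("-inf"))
weights = F.softmax(masked, dim=-1)
```

If the repository's line 215 feeds `batch_A` into a softmax, `float("-inf")` rather than `float("inf")` would be the conventional fill value; that is an assumption about the surrounding code, not a confirmed reading of it.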

DevSinghSachan commented 4 years ago

Thanks! Does this change make the model run okay?

jackeymango commented 4 years ago

> Thanks! Does this change make the model run okay?

No! It still doesn't run okay.

jackeymango commented 4 years ago

I really hope you can give me some suggestions! Thank you very much!

DevSinghSachan commented 4 years ago

I will try! Could you provide details of the PyTorch version, OS, and hardware used? I can see if I can use the same configuration to reproduce the issue.

jackeymango commented 4 years ago

Thanks, here are the details:
1. PyTorch was installed from the official website; the conda command is: `conda install pytorch torchvision cudatoolkit=9.2 -c pytorch`
2. OS: Linux
3. Graphics card information: [image]

DevSinghSachan commented 4 years ago

Thanks for the information! I have made a couple of changes to the code to make it compatible with PyTorch v1.4. It works fine on my compute machine. Can you try the updated code?

jackeymango commented 4 years ago

> Thanks for the information! I have made a couple of changes to the code to make it compatible with PyTorch v1.4. It works fine on my compute machine. Can you try the updated code?

Thanks for your help!
Of course, where is the code you changed?

DevSinghSachan commented 4 years ago

It's in the master branch. See the latest commit.

jackeymango commented 4 years ago

OK! I will try!

jackeymango commented 4 years ago

Thanks, I can run this! But when I run bpe_pipeline_MT.sh, there is a problem:

```
Traceback (most recent call last):
  File "/work/zhangzhongze/multilingual_nmt/train.py", line 457, in <module>
    main()
  File "/work/zhangzhongze/multilingual_nmt/train.py", line 381, in main
    max_sent=args.max_sent_eval)(logger)
  File "/work/zhangzhongze/multilingual_nmt/train.py", line 133, in __init__
    self.model = unwrap(model)
  File "/work/zhangzhongze/multilingual_nmt/train.py", line 127, in unwrap
    return unwrap(module.module, model_name)
  File "/home/zhangzhongze/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 585, in __getattr__
    type(self).__name__, name))
AttributeError: 'MultiTaskNMT' object has no attribute 'module'
```
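The traceback suggests `unwrap` assumes the model is wrapped in `nn.DataParallel`, which stores the real model as `.module`. A hedged sketch of a guard that avoids the `AttributeError` for plain single-GPU models (illustrative, not the repository's exact function, whose signature also takes a model name):

```python
import torch.nn as nn

def unwrap(model):
    # DataParallel / DistributedDataParallel wrap the real model as
    # `.module`; a plain model has no such attribute, which is what raises
    # the AttributeError in the traceback. Checking the type first returns
    # plain models unchanged.
    if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
        return unwrap(model.module)
    return model
```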

jackeymango commented 4 years ago

I can't understand this at line 217 of train.py: [image]

DevSinghSachan commented 4 years ago

It seems like your code might be using multiple GPUs? Can you force the code to use one GPU and try?
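One illustrative way to keep single-GPU runs consistent with the `unwrap` logic above (an assumption about how a training script could guard its model wrapping, not the repository's actual code): only wrap the model in `DataParallel` when more than one device id is configured.

```python
import torch.nn as nn

def maybe_parallelize(model, device_ids):
    # With a single device id, return the bare model, so later attribute
    # access does not have to go through `.module`.
    if len(device_ids) > 1:
        return nn.DataParallel(model, device_ids=device_ids)
    return model
```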

jackeymango commented 4 years ago

I just use one GPU!

jackeymango commented 4 years ago

When I use this code, the BLEU is just 2. Thanks for your help! [image]

jackeymango commented 4 years ago

I'm here to ask questions again. When I change the GPU to card 1, an error is reported. It has troubled me for a long time; thank you for your help.

jackeymango commented 4 years ago

[image]