How can XLM be pretrained on monolingual datasets for other languages and then used for unsupervised NMT? I preprocessed the data and then ran this command:
!python train.py --exp_name test_sahi_mlm --dump_path ./dumped/ --data_path ./data/processed/sa-hi/ --lgs 'sa-hi' --clm_steps '' --mlm_steps 'sa,hi' --emb_dim 1024 --n_layers 6 --n_heads 8 --dropout 0.1 --attention_dropout 0.1 --gelu_activation true --batch_size 32 --bptt 256 --optimizer adam,lr=0.0001 --epoch_size 200000 --validation_metrics _valid_mlm_ppl --stopping_criterion _valid_mlm_ppl,10 --fp16 true
I get the following error:
  File "/content/drive/MyDrive/XLM/xlm/data/loader.py", line 26, in process_binarized
    (data['sentences'].dtype == np.int32) and (1 << 16 <= len(dico) < 1 << 31))
AssertionError
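To make the failing condition easier to read, here is a minimal sketch of the one clause that the traceback shows. It only restates that visible line: the binarized sentences must be stored as `np.int32` and the dictionary size must lie in the range [2^16, 2^31). The traceback prints only the last line of the `assert` statement, so the full condition in `loader.py` may have other clauses that are not covered here. The function name `int32_clause_holds` and the constants are hypothetical names for illustration, not part of XLM.

```python
import numpy as np

# Bounds taken from the visible assertion line: 1 << 16 <= len(dico) < 1 << 31
MIN_VOCAB = 1 << 16  # 65536
MAX_VOCAB = 1 << 31  # 2147483648


def int32_clause_holds(sentences_dtype, vocab_size):
    """Evaluate the clause shown in the traceback (hypothetical helper).

    sentences_dtype: dtype of the binarized sentence array
    vocab_size:      number of entries in the dictionary, i.e. len(dico)
    """
    dtype_ok = np.dtype(sentences_dtype) == np.int32
    size_ok = MIN_VOCAB <= vocab_size < MAX_VOCAB
    return bool(dtype_ok and size_ok)


if __name__ == "__main__":
    # Illustrative values only, not taken from the sa-hi data.
    for dtype, size in [(np.int32, 70000), (np.int32, 30000), (np.uint16, 70000)]:
        print(np.dtype(dtype).name, size, int32_clause_holds(dtype, size))
```

If this clause is false for your data, the stored dtype of the binarized sentences and the size of the vocabulary they were binarized with do not agree in the way this line expects, so comparing those two values is a reasonable first thing to check.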