RayeRen / multilingual-kd-pytorch

ICLR2019, Multilingual Neural Machine Translation with Knowledge Distillation

Error while training Student model #2

Closed sugeeth14 closed 5 years ago

sugeeth14 commented 5 years ago

Hi, I followed the steps mentioned here and then here, and I could successfully train the teacher models and save the top-k probabilities and checkpoints for the individual languages. But when I try to train the student model as described in Train Multilingual Student, I get the error below.

| Redis disabled...
| Redis disabled...
| Redis disabled...
| Redis disabled...
| Redis disabled...
| distributed init (rank 2): tcp://localhost:17787
| distributed init (rank 1): tcp://localhost:17787
| distributed init (rank 0): tcp://localhost:17787
| distributed init (rank 3): tcp://localhost:17787
Traceback (most recent call last):
  File "train.py", line 506, in <module>
    multiprocessing_main(args)
  File "Sugeeth/plain_multilingual/multilingual-kd-pytorch/multiprocessing_train.py", line 42, in main
    p.join()
  File "miniconda3/envs/myenv_s/lib/python3.6/multiprocessing/process.py", line 124, in join
    res = self._popen.wait(timeout)
  File "miniconda3/envs/myenv_s/lib/python3.6/multiprocessing/popen_fork.py", line 50, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "miniconda3/envs/myenv_s/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "Sugeeth/plain_multilingual/multilingual-kd-pytorch/multiprocessing_train.py", line 84, in signal_handler
    raise Exception(msg)
Exception: 

-- Tracebacks above this line can probably be ignored --

Traceback (most recent call last):
  File "Sugeeth/plain_multilingual/multilingual-kd-pytorch/multiprocessing_train.py", line 48, in run
    single_process_main(args)
  File "Sugeeth/plain_multilingual/multilingual-kd-pytorch/train.py", line 42, in main
    load_dataset_splits(task, ['train', 'valid'])
  File "Sugeeth/plain_multilingual/multilingual-kd-pytorch/train.py", line 470, in load_dataset_splits
    task.load_dataset(split, combine=True)
  File "Sugeeth/plain_multilingual/multilingual-kd-pytorch/fairseq/tasks/universal_translation.py", line 187, in load_dataset
    src_dataset = ConcatDataset(src_datasets)
  File "Sugeeth/plain_multilingual/multilingual-kd-pytorch/fairseq/data/concat_dataset.py", line 20, in __init__
    assert len(datasets) > 0, 'datasets should not be an empty iterable'
AssertionError: datasets should not be an empty iterable

Which files does the command expect to be passed? Am I missing something? Kindly help. Thank you.

alphadl commented 5 years ago

I have the same question. Waiting for the author's response.

sugeeth14 commented 5 years ago

Hi @alphadl, prepare-iwslt14.sh creates folders to store the text data. A similar folder must be created, with the train and test data added there.

Ir1d commented 4 years ago

Hi @Raghava14, how did you solve this? I'm getting the same error. What do you mean by a similar folder of train and test text data?

Ir1d commented 4 years ago

@RayeRen Could you please kindly guide me on this? Thanks.

Ir1d commented 4 years ago

I debugged for a while and I think it's related to the filenames. The teacher generated files with names like _de_topk_idx.bin, and renaming them to en_de_topk_idx.bin solves the problem.
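A minimal sketch of that rename, for anyone hitting the same assertion. It assumes the missing prefix is the source language code `en` and that every affected file starts with an underscore and contains `topk` in its name, as in the example above. The function name `add_source_prefix` and its parameters are made up for illustration and are not part of the repository.

```python
from pathlib import Path


def add_source_prefix(data_dir, src_lang="en", dry_run=True):
    """Rename files such as `_de_topk_idx.bin` to `en_de_topk_idx.bin`.

    Assumption: the teacher wrote the top-k files without the source
    language code, so their names begin with an underscore.
    Returns a list of (old_name, new_name) pairs.
    """
    planned = []
    for path in sorted(Path(data_dir).iterdir()):
        # Only touch regular files that look like prefix-less top-k outputs.
        if not path.is_file():
            continue
        if not path.name.startswith("_") or "topk" not in path.name:
            continue
        target = path.with_name(src_lang + path.name)
        if target.exists():
            # Never overwrite a file that already has the expected name.
            continue
        if not dry_run:
            path.rename(target)
        planned.append((path.name, target.name))
    return planned
```

Calling it with `dry_run=True` (the default) only reports what would be renamed; pass `dry_run=False` once the list looks right.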

@RayeRen Hi, I'm getting a lot of warnings like this one: UserWarning: The number of elements in the out tensor of shape [150] is 150 which does not match the computed number of elements 149. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (149,). Is this normal behavior?

sunmingyang1994 commented 4 years ago

@Ir1d Hi, how did the teacher generate the topk bin files? Why doesn't my teacher network work?

Ir1d commented 4 years ago

@sunmingyang1994 Run the command in the README, and the bin files will be generated in the data folders.