Arij-Aladel commented 2 years ago

What is the problem here, please?

I was trying to run this baseline following the steps:

download data
set up baselines
preprocess data using the second command

python3 structured-uncertainty/preprocess.py --srcdict baseline-models/dict.en.txt --tgtdict baseline-models/dict.ru.txt --source-lang en --target-lang ru \
    --trainpref wmt20_en_ru/train --validpref wmt20_en_ru/valid --testpref wmt20_en_ru/test19,wmt20_en_ru/reddit_dev \
    --destdir data-bin/wmt20_en_ru --thresholdtgt 0 --thresholdsrc 0 --workers 24

I don't need to train right now, so I went straight to running their baselines. For example, for the single-model baseline I ran:

python3 structured-uncertainty//generate.py wmt20_en_ru/ --path baseline-models/model1.pt --max-tokens 4096 --remove-bpe --nbest 5 --gen-subset test

and got this error:

Namespace(no_progress_bar=False, log_interval=1000, log_format=None, tensorboard_logdir='', seed=1, cpu=False, fp16=False, memory_efficient_fp16=False, fp16_init_scale=128, fp16_scale_window=None, fp16_scale_tolerance=0.0, min_loss_scale=0.0001, threshold_loss_scale=None, user_dir=None, empty_cache_freq=0, all_gather_list_size=16384, criterion='cross_entropy', tokenizer=None, bpe=None, optimizer='nag', lr_scheduler='fixed', task='translation', num_workers=1, skip_invalid_size_inputs_valid_test=False, max_tokens=4096, max_sentences=None, required_batch_size_multiple=8, dataset_impl=None, gen_subset='test', num_shards=1, shard_id=0, path='baseline-models/model1.pt', remove_bpe='@@ ', quiet=False, model_overrides='{}', results_path=None, beam=5, nbest=5, max_len_a=0, max_len_b=200, min_len=1, match_source_len=False, no_early_stop=False, unnormalized=False, no_beamable_mm=False, lenpen=1, unkpen=0, replace_unk=None, sacrebleu=False, score_reference=False, compute_uncertainty=False, ensemble_sum_prod=False, prefix_size=0, no_repeat_ngram_size=0, sampling=False, sampling_topk=-1, sampling_topp=-1.0, temperature=1.0, diverse_beam_groups=-1, diverse_beam_strength=0.5, print_alignment=False, print_step=False, iter_decode_eos_penalty=0.0, iter_decode_max_iter=10, iter_decode_force_max_iter=False, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, retain_iter_history=False, decoding_format=None, momentum=0.99, weight_decay=0.0, force_anneal=None, lr_shrink=0.1, warmup_updates=0, data='wmt20_en_ru/', source_lang=None, target_lang=None, load_alignments=False, left_pad_source='True', left_pad_target='False', max_source_positions=1024, max_target_positions=1024, upsample_primary=1, truncate_source=False)
Traceback (most recent call last):
  File "/cephfs/home/arij/structured-uncertainty//generate.py", line 330, in <module>
    cli_main()
  File "/cephfs/home/arij/structured-uncertainty//generate.py", line 326, in cli_main
    main(args)
  File "/cephfs/home/arij/structured-uncertainty//generate.py", line 32, in main
    task = tasks.setup_task(args)
  File "/cephfs/home/arij/structured-uncertainty/fairseq/tasks/__init__.py", line 17, in setup_task
    return TASK_REGISTRY[args.task].setup_task(args, **kwargs)
  File "/cephfs/home/arij/structured-uncertainty/fairseq/tasks/translation.py", line 174, in setup_task
    raise Exception('Could not infer language pair, please provide it explicitly')
Exception: Could not infer language pair, please provide it explicitly

After that, I tried passing a different path for the dataset, since preprocessing produced a data-bin folder that contains a wmt20_en_ru folder with the processed dataset:

python3 structured-uncertainty//generate.py /home/arij/data-bin/wmt20_en_ru/ --path baseline-models/model1.pt --max-tokens 4096 --remove-bpe --nbest 5 --gen-subset test

and I got this error

Namespace(no_progress_bar=False, log_interval=1000, log_format=None, tensorboard_logdir='', seed=1, cpu=False, fp16=False, memory_efficient_fp16=False, fp16_init_scale=128, fp16_scale_window=None, fp16_scale_tolerance=0.0, min_loss_scale=0.0001, threshold_loss_scale=None, user_dir=None, empty_cache_freq=0, all_gather_list_size=16384, criterion='cross_entropy', tokenizer=None, bpe=None, optimizer='nag', lr_scheduler='fixed', task='translation', num_workers=1, skip_invalid_size_inputs_valid_test=False, max_tokens=4096, max_sentences=None, required_batch_size_multiple=8, dataset_impl=None, gen_subset='test', num_shards=1, shard_id=0, path='baseline-models/model1.pt', remove_bpe='@@ ', quiet=False, model_overrides='{}', results_path=None, beam=5, nbest=5, max_len_a=0, max_len_b=200, min_len=1, match_source_len=False, no_early_stop=False, unnormalized=False, no_beamable_mm=False, lenpen=1, unkpen=0, replace_unk=None, sacrebleu=False, score_reference=False, compute_uncertainty=False, ensemble_sum_prod=False, prefix_size=0, no_repeat_ngram_size=0, sampling=False, sampling_topk=-1, sampling_topp=-1.0, temperature=1.0, diverse_beam_groups=-1, diverse_beam_strength=0.5, print_alignment=False, print_step=False, iter_decode_eos_penalty=0.0, iter_decode_max_iter=10, iter_decode_force_max_iter=False, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, retain_iter_history=False, decoding_format=None, momentum=0.99, weight_decay=0.0, force_anneal=None, lr_shrink=0.1, warmup_updates=0, data='/home/arij/data-bin/wmt20_en_ru/', source_lang=None, target_lang=None, load_alignments=False, left_pad_source='True', left_pad_target='False', max_source_positions=1024, max_target_positions=1024, upsample_primary=1, truncate_source=False)
| [en] dictionary: 43768 types
| [ru] dictionary: 48272 types
Traceback (most recent call last):
  File "/cephfs/home/arij/structured-uncertainty//generate.py", line 330, in <module>
    cli_main()
  File "/cephfs/home/arij/structured-uncertainty//generate.py", line 326, in cli_main
    main(args)
  File "/cephfs/home/arij/structured-uncertainty//generate.py", line 33, in main
    task.load_dataset(args.gen_subset)
  File "/cephfs/home/arij/structured-uncertainty/fairseq/tasks/translation.py", line 200, in load_dataset
    self.datasets[split] = load_langpair_dataset(
  File "/cephfs/home/arij/structured-uncertainty/fairseq/tasks/translation.py", line 54, in load_langpair_dataset
    src_dataset = data_utils.load_indexed_dataset(prefix + src, src_dict, dataset_impl)
  File "/cephfs/home/arij/structured-uncertainty/fairseq/data/data_utils.py", line 73, in load_indexed_dataset
    dataset = indexed_dataset.make_dataset(
  File "/cephfs/home/arij/structured-uncertainty/fairseq/data/indexed_dataset.py", line 60, in make_dataset
    return MMapIndexedDataset(path)
  File "/cephfs/home/arij/structured-uncertainty/fairseq/data/indexed_dataset.py", line 448, in __init__
    self._do_init(path)
  File "/cephfs/home/arij/structured-uncertainty/fairseq/data/indexed_dataset.py", line 461, in _do_init
    self._bin_buffer_mmap = np.memmap(data_file_path(self._path), mode='r', order='C')
  File "/home/arij/anaconda3/envs/work/lib/python3.9/site-packages/numpy/core/memmap.py", line 264, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: cannot mmap an empty file
Exception ignored in: <function MMapIndexedDataset.__del__ at 0x7f1c5e2aff70>
Traceback (most recent call last):
  File "/cephfs/home/arij/structured-uncertainty/fairseq/data/indexed_dataset.py", line 465, in __del__
    self._bin_buffer_mmap._mmap.close()
AttributeError: 'MMapIndexedDataset' object has no attribute '_bin_buffer_mmap'

I asked the authors, but according to them the problem is not on their side. I need help understanding what is going on. Thanks!

environment

fairseq Version: 0.10.0
PyTorch Version: 1.9.0
OS: Ubuntu 20.04
How you installed fairseq: pip
Build command you used (if compiling from source):
Python version: 3.9.5
CUDA version: 11.0
GPU: A100-SXM4-40GB
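
On the first error above: the Namespace dump shows source_lang=None and target_lang=None, so the translation task has to guess the pair from the file names in the data directory. The sketch below approximates that guess. It is an assumption about the naming convention (binarized files named like test.en-ru.en.idx), not fairseq's actual code, and guess_language_pair is a hypothetical helper. A raw-text folder holding files such as train.en and train.ru gives nothing to guess from, which would be consistent with the exception.

```python
import os


def guess_language_pair(data_dir):
    """Guess (src, tgt) from names like '<split>.<src>-<tgt>.<lang>.<ext>'.

    Hypothetical helper: an approximation of the file-name convention that
    binarized translation data appears to follow, not fairseq's own code.
    """
    for filename in sorted(os.listdir(data_dir)):
        parts = filename.split(".")
        # The second dot-separated field must look like 'en-ru'.
        if len(parts) >= 3 and len(parts[1].split("-")) == 2:
            src, tgt = parts[1].split("-")
            return src, tgt
    # No file carried a '<src>-<tgt>' field.
    return None, None
```

If the guess comes back as (None, None), the data path is probably the raw-text folder rather than the binarized one, or the languages need to be passed explicitly (the same --source-lang en --target-lang ru options used in the preprocessing command).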

stale[bot] commented 2 years ago

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!

nikhiljaiswal commented 2 years ago

any updates on this?

robotsp commented 2 years ago

did you fix it? @nikhiljaiswal

tjshu commented 2 years ago

Does anyone know how to fix this?

tjshu commented 2 years ago

I found that I had not installed apex completely. I solved it by reinstalling the CUDA toolkit and PyTorch (check that the versions match).

jiaohuix commented 2 years ago

I've tried several times and found that this configuration works for me: python=3.8 pytorch==1.10.0 cuda=11.1 fairseq==0.10.0 gpu=3090
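
To compare your own setup against a configuration like this, a small script can report what is actually installed. This is a minimal sketch: installed_versions and torch_cuda_build are hypothetical helper names, and the distribution names torch, fairseq and apex are assumed to match how the packages were installed in your environment.

```python
import platform
from importlib import metadata


def installed_versions(packages=("torch", "fairseq", "apex")):
    """Return {name: version or None} plus the running Python version."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed under this distribution name.
            versions[name] = None
    return versions


def torch_cuda_build():
    """Return the CUDA version PyTorch was built with, or None."""
    try:
        import torch
    except ImportError:
        return None
    # torch.version.cuda is None for CPU-only builds.
    return torch.version.cuda
```

Printing installed_versions() and torch_cuda_build() side by side makes a mismatch between the PyTorch build and the CUDA toolkit easier to spot.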

galapatt commented 1 year ago

The second error is not a problem with apex or PyTorch. It is saying that one of your files is empty. If you go to line 264 of your memmap.py and add a print(filename), the terminal will show which file is triggering the error, and you can fix the problem accordingly. In my case one of my data files was missing, so my data-bin folder had no .bin file for one of the languages I was translating.
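
An alternative that avoids editing memmap.py is to scan the data-bin folder directly. The sketch below is a minimal example, assuming the binarized data is stored as .idx/.bin pairs in a single directory. check_binarized_dir is a hypothetical helper, not part of fairseq.

```python
from pathlib import Path


def check_binarized_dir(data_dir):
    """List (file name, problem) pairs for missing or empty .idx/.bin files."""
    root = Path(data_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")

    problems = []
    # Every .idx file should have a non-empty .bin partner.
    for idx_path in sorted(root.glob("*.idx")):
        bin_path = idx_path.with_suffix(".bin")
        for path in (idx_path, bin_path):
            if not path.exists():
                problems.append((path.name, "missing"))
            elif path.stat().st_size == 0:
                problems.append((path.name, "empty"))
    # A .bin file without its .idx partner is also suspicious.
    for bin_path in sorted(root.glob("*.bin")):
        idx_path = bin_path.with_suffix(".idx")
        if not idx_path.exists():
            problems.append((idx_path.name, "missing"))
    return problems
```

Calling check_binarized_dir("data-bin/wmt20_en_ru") on the folder from the original post would list any empty or missing files. An empty .bin usually points back at the preprocessing step, for example a --testpref prefix whose source or target text file was empty or absent.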

Munendra17 commented 1 year ago

I got the same error with a NeMo Megatron model; in my case it was caused by an apex version mismatch. I installed apex with the three commands below, and it works for me:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

facebookresearch / fairseq

AttributeError: MMapIndexedDataset' object has no attribute '_bin_buffer_mmap #3903
