IBM / transition-amr-parser

SoTA Abstract Meaning Representation (AMR) parsing with word-node alignments in PyTorch. Includes checkpoints and other tools, such as statistical significance testing for Smatch.
Apache License 2.0

problem with fairseq-preprocess #9

Closed PolKul closed 3 years ago

PolKul commented 3 years ago

I was able to install everything as per your setup instructions and ran the training script bash scripts/stack-transformer/experiment.sh configs/amr2_o5+Word100_roberta.large.top24_stnp6x6.sh. The datasets for training were generated, but it gets stuck on the fairseq-preprocess command with the following error:

fairseq-preprocess: error: unrecognized arguments: --pretrained-embed roberta.large --bert-layers 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 --machine-type AMR --machine-rules DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train.rules.json --entity-rules DATA/AMR//oracles/amr2.0-cofill_o5+Word100//entity_rules.json

Can you advise how to fix that?

ramon-astudillo commented 3 years ago

These are arguments that we added to handle our embedding pre-processing. This step of the installation

bash scripts/download_and_patch_fairseq.sh

should have modified fairseq. Please double check that this is the case.
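
If it helps, here is a quick way to confirm the patch took effect. This is only a sketch, not part of the repo's test suite; it assumes the patched flags show up in fairseq-preprocess --help and that the fairseq on your path resolves to the fairseq-stack-transformer checkout.

```python
# Sanity-check sketch: confirm which fairseq is on the path and whether
# fairseq-preprocess recognizes one of the patched flags from the error above.
import subprocess
import fairseq

# Should point into the fairseq-stack-transformer checkout, not a stock install.
print(fairseq.__file__)

help_text = subprocess.run(
    ["fairseq-preprocess", "--help"],
    capture_output=True, text=True,
).stdout
# False here means the patch was not applied and the custom arguments
# (--pretrained-embed, --machine-rules, ...) will be rejected.
print("--pretrained-embed" in help_text)
```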

PolKul commented 3 years ago

Thanks @ramon-astudillo, I had to reinstall the fairseq stack-transformer fork. But now I get another error:

pavel@pavel-TRX40-DESIGNARE:~/work/nlp/transition-amr-parser$ bash scripts/stack-transformer/experiment.sh configs/amr2_o5+Word100_roberta.large.top24_stnp6x6.sh
stage-1: Preprocess
stage-2/3: Training/Testing (multiple seeds)
cp DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train.rules.json DATA/AMR//models/amr2.0-cofill_o5+Word100_RoBERTa-large-top24_stnp6x6-seed42/
Saved fairseq model args to DATA/AMR//models/amr2.0-cofill_o5+Word100_RoBERTa-large-top24_stnp6x6-seed42//config.json
fairseq-train 
    DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/
    --max-epoch 100
    --arch stack_transformer_6x6_nopos
    --optimizer adam
    --adam-betas '(0.9,0.98)'
    --clip-norm 0.0
    --lr-scheduler inverse_sqrt
    --warmup-init-lr 1e-07
    --warmup-updates 4000
    --pretrained-embed-dim 1024
    --lr 0.0005
    --min-lr 1e-09
    --dropout 0.3
    --weight-decay 0.0
    --criterion label_smoothed_cross_entropy
    --label-smoothing 0.01
    --keep-last-epochs 40
    --max-tokens 3584
    --log-format json
    --fp16
    --distributed-world-size 1
 --seed 42 --save-dir DATA/AMR//models/amr2.0-cofill_o5+Word100_RoBERTa-large-top24_stnp6x6-seed42/
Namespace(activation_dropout=0.0, activation_fn='relu', adam_betas="'(0.9,0.98)'", adam_eps=1e-08, adaptive_input=False, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, arch='stack_transformer_6x6_nopos', attention_dropout=0.0, bert_backprop=False, best_checkpoint_metric='loss', bpe=None, bucket_cap_mb=25, burnthrough=0, clip_norm=0.0, cpu=False, criterion='label_smoothed_cross_entropy', curriculum=0, data='DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/', dataset_impl=None, ddp_backend='c10d', decoder_attention_heads=4, decoder_embed_dim=256, decoder_embed_path=None, decoder_ffn_embed_dim=512, decoder_input_dim=256, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=False, decoder_output_dim=256, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.3, encode_state_machine='all-layers_nopos', encoder_attention_heads=4, encoder_embed_dim=256, encoder_embed_path=None, encoder_ffn_embed_dim=512, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=False, find_unused_parameters=False, fix_batches_to_gpus=False, fp16=True, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_interval_updates=-1, keep_last_epochs=40, label_smoothing=0.01, lazy_load=False, left_pad_source='True', left_pad_target='False', log_format='json', log_interval=1000, lr=[0.0005], lr_scheduler='inverse_sqrt', max_epoch=100, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=3584, max_tokens_valid=3584, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=1e-09, no_bert_precompute=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_token_positional_embeddings=False, num_workers=1, optimizer='adam', optimizer_overrides='{}', pretrained_embed_dim=1024, raw_text=False, required_batch_size_multiple=8, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='DATA/AMR//models/amr2.0-cofill_o5+Word100_RoBERTa-large-top24_stnp6x6-seed42/', save_interval=1, save_interval_updates=0, seed=42, sentence_avg=False, share_all_embeddings=False, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, source_lang=None, target_lang=None, task='translation', tbmf_wrapper=False, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, train_subset='train', update_freq=[1], upsample_primary=1, use_bmuf=False, user_dir=None, valid_subset='valid', validate_interval=1, warmup_init_lr=1e-07, warmup_updates=4000, weight_decay=0.0)
| [en] dictionary: 34104 types
| [actions] dictionary: 9456 types
| loaded 1368 examples from: DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/valid.en-actions.en
| loaded 1368 examples from: DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/valid.en-actions.actions
| DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/ valid en-actions 1368 examples
Traceback (most recent call last):
  File "/home/pavel/.local/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq_cli/train.py", line 338, in cli_main
    main(args)
  File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq_cli/train.py", line 46, in main
    task.load_dataset(valid_sub_split, combine=False, epoch=0)
  File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq/tasks/translation.py", line 236, in load_dataset
    self.datasets[split] = load_langpair_dataset(
  File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq/tasks/translation.py", line 119, in load_langpair_dataset
    src_fixed_embeddings, src_fixed_embeddings.sizes,
AttributeError: 'NoneType' object has no attribute 'sizes'
ramon-astudillo commented 3 years ago

One thing that happens often: if you stop in the middle of extraction, the folder exists but is empty, and fairseq gets confused and loads a data class equal to None. Check whether DATA/AMR//features/ exists and, if so, remove it.
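
Something along these lines is what I mean (a sketch only; the path is taken from your log above, the rest is just an illustration):

```python
# Remove a stale or partial features directory so fairseq-preprocess
# regenerates it from scratch on the next run.
import shutil
from pathlib import Path

features_dir = Path("DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24")
if features_dir.exists():
    print(f"removing stale features dir: {features_dir}")
    shutil.rmtree(features_dir)
```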

PolKul commented 3 years ago

Ok, removing that folder helped. The training went well for a while, but then I got another error:

pavel@pavel-TRX40-DESIGNARE:~/work/nlp/transition-amr-parser$ bash scripts/stack-transformer/experiment.sh configs/amr2_o5+Word100_roberta.large.top24_stnp6x6.sh
stage-1: Preprocess
fairseq-preprocess 
    --source-lang en
    --target-lang actions
    --trainpref DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train
    --validpref DATA/AMR//oracles/amr2.0-cofill_o5+Word100//dev
    --testpref DATA/AMR//oracles/amr2.0-cofill_o5+Word100//test
    --destdir DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/
    --workers 1
    --pretrained-embed roberta.large
    --bert-layers 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
    --machine-type AMR 
    --machine-rules DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train.rules.json 
    --entity-rules DATA/AMR//oracles/amr2.0-cofill_o5+Word100//entity_rules.json

Namespace(alignfile=None, batch_normalize_reward=False, bert_layers=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24], bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/', entity_rules='DATA/AMR//oracles/amr2.0-cofill_o5+Word100//entity_rules.json', fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, gold_annotations=None, gold_episode_ratio=None, joined_dictionary=False, log_format=None, log_interval=1000, lr_scheduler='fixed', machine_rules='DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train.rules.json', machine_type='AMR', memory_efficient_fp16=False, min_loss_scale=0.0001, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, only_source=False, optimizer='nag', padding_factor=8, pretrained_embed='roberta.large', seed=1, source_lang='en', srcdict=None, target_lang='actions', task='translation', tbmf_wrapper=False, tensorboard_logdir='', testpref='DATA/AMR//oracles/amr2.0-cofill_o5+Word100//test', tgtdict=None, threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenize_by_whitespace=False, tokenizer=None, trainpref='DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train', user_dir=None, validpref='DATA/AMR//oracles/amr2.0-cofill_o5+Word100//dev', workers=1)
| [en] Dictionary: 34103 types
| [en] DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train.en: 36521 sents, 689426 tokens, 0.0% replaced by <unk>
| [en] Dictionary: 34103 types
| [en] DATA/AMR//oracles/amr2.0-cofill_o5+Word100//dev.en: 1368 sents, 30637 tokens, 0.0% replaced by <unk>
| [en] Dictionary: 34103 types
| [en] DATA/AMR//oracles/amr2.0-cofill_o5+Word100//test.en: 1371 sents, 31425 tokens, 3.6% replaced by <unk>
| [actions] Dictionary: 9455 types
| [actions] DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train.actions: 36521 sents, 2766561 tokens, 0.0% replaced by <unk>
| [actions] Dictionary: 9455 types
| [actions] DATA/AMR//oracles/amr2.0-cofill_o5+Word100//dev.actions: 1368 sents, 124508 tokens, 0.177% replaced by <unk>
| [actions] Dictionary: 9455 types
| [actions] DATA/AMR//oracles/amr2.0-cofill_o5+Word100//test.actions: 1371 sents, 129954 tokens, 0.159% replaced by <unk>
Using cache found in /home/pavel/.cache/torch/hub/pytorch_fairseq_master
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
running build_ext
/home/pavel/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py:352: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
skipping 'fairseq/data/data_utils_fast.cpp' Cython extension (up-to-date)
skipping 'fairseq/data/token_block_utils_fast.cpp' Cython extension (up-to-date)
building 'fairseq.libbleu' extension
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/fairseq
creating build/temp.linux-x86_64-3.8/fairseq/clib
creating build/temp.linux-x86_64-3.8/fairseq/clib/libbleu
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c fairseq/clib/libbleu/libbleu.cpp -o build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/libbleu.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=libbleu -D_GLIBCXX_USE_CXX11_ABI=0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c fairseq/clib/libbleu/module.cpp -o build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/module.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=libbleu -D_GLIBCXX_USE_CXX11_ABI=0
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/fairseq
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/libbleu.o build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/module.o -o build/lib.linux-x86_64-3.8/fairseq/libbleu.cpython-38-x86_64-linux-gnu.so
building 'fairseq.data.data_utils_fast' extension
creating build/temp.linux-x86_64-3.8/fairseq/data
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/pavel/.local/lib/python3.8/site-packages/numpy/core/include -I/home/pavel/.local/lib/python3.8/site-packages/numpy/core/include -I/usr/include/python3.8 -c fairseq/data/data_utils_fast.cpp -o build/temp.linux-x86_64-3.8/fairseq/data/data_utils_fast.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=data_utils_fast -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1944,
                 from /home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                 from /home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from fairseq/data/data_utils_fast.cpp:626:
/home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
   17 | #warning "Using deprecated NumPy API, disable it with " \
      |  ^~~~~~~
creating build/lib.linux-x86_64-3.8/fairseq/data
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/fairseq/data/data_utils_fast.o -o build/lib.linux-x86_64-3.8/fairseq/data/data_utils_fast.cpython-38-x86_64-linux-gnu.so
building 'fairseq.data.token_block_utils_fast' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/pavel/.local/lib/python3.8/site-packages/numpy/core/include -I/home/pavel/.local/lib/python3.8/site-packages/numpy/core/include -I/usr/include/python3.8 -c fairseq/data/token_block_utils_fast.cpp -o build/temp.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=token_block_utils_fast -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1944,
                 from /home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                 from /home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from fairseq/data/token_block_utils_fast.cpp:627:
/home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
   17 | #warning "Using deprecated NumPy API, disable it with " \
      |  ^~~~~~~
fairseq/data/token_block_utils_fast.cpp: In function ‘PyArrayObject* __pyx_f_7fairseq_4data_22token_block_utils_fast__get_slice_indices_fast(PyArrayObject*, PyObject*, int, int, int)’:
fairseq/data/token_block_utils_fast.cpp:3310:36: warning: comparison of integer expressions of different signedness: ‘__pyx_t_7fairseq_4data_22token_block_utils_fast_DTYPE_t’ {aka ‘long int’} and ‘size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
 3310 |       __pyx_t_4 = ((__pyx_v_sz_idx < __pyx_t_10) != 0);
      |                     ~~~~~~~~~~~~~~~^~~~~~~~~~~~
fairseq/data/token_block_utils_fast.cpp:3505:36: warning: comparison of integer expressions of different signedness: ‘__pyx_t_7fairseq_4data_22token_block_utils_fast_DTYPE_t’ {aka ‘long int’} and ‘size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
 3505 |       __pyx_t_3 = ((__pyx_v_sz_idx < __pyx_t_10) != 0);
      |                     ~~~~~~~~~~~~~~~^~~~~~~~~~~~
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.o -o build/lib.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.cpython-38-x86_64-linux-gnu.so
building 'fairseq.libbase' extension
creating build/temp.linux-x86_64-3.8/fairseq/clib/libbase
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/pavel/.local/lib/python3.8/site-packages/torch/include -I/home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/pavel/.local/lib/python3.8/site-packages/torch/include/TH -I/home/pavel/.local/lib/python3.8/site-packages/torch/include/THC -I/usr/include/python3.8 -c fairseq/clib/libbase/balanced_assignment.cpp -o build/temp.linux-x86_64-3.8/fairseq/clib/libbase/balanced_assignment.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=libbase -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from /home/pavel/.local/lib/python3.8/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:12,
                 from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from fairseq/clib/libbase/balanced_assignment.cpp:15:
/home/pavel/.local/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
   84 | #pragma omp parallel for if ((end - begin) >= grain_size)
      | 
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/fairseq/clib/libbase/balanced_assignment.o -L/home/pavel/.local/lib/python3.8/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.8/fairseq/libbase.cpython-38-x86_64-linux-gnu.so
building 'fairseq.libnat' extension
creating build/temp.linux-x86_64-3.8/fairseq/clib/libnat
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/pavel/.local/lib/python3.8/site-packages/torch/include -I/home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/pavel/.local/lib/python3.8/site-packages/torch/include/TH -I/home/pavel/.local/lib/python3.8/site-packages/torch/include/THC -I/usr/include/python3.8 -c fairseq/clib/libnat/edit_dist.cpp -o build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=libnat -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from /home/pavel/.local/lib/python3.8/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:12,
                 from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:3,
                 from fairseq/clib/libnat/edit_dist.cpp:9:
/home/pavel/.local/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
   84 | #pragma omp parallel for if ((end - begin) >= grain_size)
      | 
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o -L/home/pavel/.local/lib/python3.8/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.8/fairseq/libnat.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.8/fairseq/libbleu.cpython-38-x86_64-linux-gnu.so -> fairseq
copying build/lib.linux-x86_64-3.8/fairseq/data/data_utils_fast.cpython-38-x86_64-linux-gnu.so -> fairseq/data
copying build/lib.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.cpython-38-x86_64-linux-gnu.so -> fairseq/data
copying build/lib.linux-x86_64-3.8/fairseq/libbase.cpython-38-x86_64-linux-gnu.so -> fairseq
copying build/lib.linux-x86_64-3.8/fairseq/libnat.cpython-38-x86_64-linux-gnu.so -> fairseq
 31%|███████▋                 | 201607168/655283069 [02:14<04:57, 1524514.37B/s]100%|█████████████████████████| 655283069/655283069 [07:26<00:00, 1467635.48B/s]
loading archive file http://dl.fbaipublicfiles.com/fairseq/models/roberta.large.tar.gz from cache at /home/pavel/.cache/torch/pytorch_fairseq/83e3a689e28e5e4696ecb0bbb05a77355444a5c8a3437e0f736d8a564e80035e.c687083d14776c1979f3f71654febb42f2bb3d9a94ff7ebdfe1ac6748dba89d2
extracting archive file /home/pavel/.cache/torch/pytorch_fairseq/83e3a689e28e5e4696ecb0bbb05a77355444a5c8a3437e0f736d8a564e80035e.c687083d14776c1979f3f71654febb42f2bb3d9a94ff7ebdfe1ac6748dba89d2 to temp dir /tmp/tmpc1nxlw6a
| dictionary: 50264 types
Using roberta.large extraction in GPU
9400 sentencesX:   �
X:   �
X:   �
10300 sentencesX:   �
X:   �
10400 sentencesX:   �
10500 sentencesX:   �
10700 sentencesX:   �
11700 sentencesX:   �
X:   �
19400 sentencesX:   �
36500 sentences
Defaulting to user installation because normal site-packages is not writeable
Collecting en_core_web_sm==2.2.5
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz (12.0 MB)
     |████████████████████████████████| 12.0 MB 1.4 MB/s 
Requirement already satisfied: spacy>=2.2.2 in /home/pavel/.local/lib/python3.8/site-packages (from en_core_web_sm==2.2.5) (2.2.3)
Requirement already satisfied: numpy>=1.15.0 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.20.1)
Requirement already satisfied: thinc<7.4.0,>=7.3.0 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (7.3.1)
Requirement already satisfied: plac<1.2.0,>=0.9.6 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.1.3)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.5)
Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (0.8.2)
Requirement already satisfied: setuptools in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (56.2.0)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (3.0.5)
Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.0)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (2.0.5)
Requirement already satisfied: blis<0.5.0,>=0.4.0 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (0.4.1)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/lib/python3/dist-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (2.22.0)
Requirement already satisfied: srsly<1.1.0,>=0.1.0 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.5)
Requirement already satisfied: tqdm<5.0.0,>=4.10.0 in /home/pavel/.local/lib/python3.8/site-packages (from thinc<7.4.0,>=7.3.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (4.57.0)
Building wheels for collected packages: en-core-web-sm
  Building wheel for en-core-web-sm (setup.py) ... done
  Created wheel for en-core-web-sm: filename=en_core_web_sm-2.2.5-py3-none-any.whl size=12011738 sha256=59a95982d3b8970106e40bde27d533885ea2f29bf1b20fce6191516e9f2be88f
  Stored in directory: /tmp/pip-ephem-wheel-cache-rmkyokz9/wheels/77/b4/c8/395804b9a2b6864aaff3623d7b709680acc3d04f47c8162ee6
Successfully built en-core-web-sm
Installing collected packages: en-core-web-sm
  Attempting uninstall: en-core-web-sm
    Found existing installation: en-core-web-sm 3.0.0
    Uninstalling en-core-web-sm-3.0.0:
      Successfully uninstalled en-core-web-sm-3.0.0
Successfully installed en-core-web-sm-2.2.5
WARNING: You are using pip version 21.0.1; however, version 21.1.1 is available.
You should consider upgrading via the '/usr/bin/python3.8 -m pip install --upgrade pip' command.
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Linking successful
/home/pavel/.local/lib/python3.8/site-packages/en_core_web_sm -->
/home/pavel/.local/lib/python3.8/site-packages/spacy/data/en
You can now load the model via spacy.load('en')
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/home/pavel/.local/bin/fairseq-preprocess", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-preprocess')())
  File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq_cli/preprocess.py", line 295, in cli_main
    main(args)
  File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq_cli/preprocess.py", line 212, in main
    make_state_machine(args, src_dict, tgt_dict, tokenize=tokenize)
  File "/home/pavel/work/nlp/transition-amr-parser/transition_amr_parser/stack_transformer/preprocess.py", line 295, in make_state_machine
    make_masks(args, tgt_dict, args.trainpref, "train", tgt_dict.eos_index, tgt_dict.pad_index, mask_predicates=True, tokenize=tokenize)
  File "/home/pavel/work/nlp/transition-amr-parser/transition_amr_parser/stack_transformer/preprocess.py", line 283, in make_masks
    make_binary_stack(args, target_vocab, input_prefix, output_prefix, eos_idx, pad_idx, mask_predicates=mask_predicates, allow_unk=allow_unk, tokenize=tokenize)
  File "/home/pavel/work/nlp/transition-amr-parser/transition_amr_parser/stack_transformer/preprocess.py", line 173, in make_binary_stack
    torch.Tensor(logits_mask).view(-1)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
ramon-astudillo commented 3 years ago

This looks like an installation problem; I have never seen this error before. I would check your Python (we used 3.6.9) and torch versions.

PolKul commented 3 years ago

I was able to fix that error by changing the code to the following: torch.Tensor(logits_mask).contiguous().view(-1)
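
For context, here is a minimal standalone reproduction of that error class (illustrative only, not the parser's actual data): .view() requires a contiguous memory layout, .contiguous() first copies the tensor into a contiguous buffer, and .reshape() handles both cases, which is what the error message itself suggests.

```python
import torch

# A transposed tensor has non-contiguous strides, so .view() refuses to flatten it.
t = torch.arange(12).reshape(3, 4).t()

try:
    t.view(-1)
except RuntimeError as err:
    print("view failed:", err)

print(t.contiguous().view(-1))  # the fix used above: copy to contiguous memory first
print(t.reshape(-1))            # equivalent fix suggested by the error message
```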

PolKul commented 3 years ago

After that the training went well, but then the following error occurred:

Using roberta.large extraction in GPU
9400 sentencesX:   �
X:   �
X:   �
10300 sentencesX:   �
X:   �
10400 sentencesX:   �
10500 sentencesX:   �
10700 sentencesX:   �
11700 sentencesX:   �
X:   �
19400 sentencesX:   �
36500 sentences
36521it [36:56, 16.48it/s]

There were missing actions
Counter({'LA(ARG1-of)': 120, 'RA(op1)': 103, 'RA(op2)': 38, 'LA(ARG0-of)': 22, 'RA(ARG1-of)': 15, 'LA(op1)': 14, 'RA(op3)': 9, 'RA(snt1)': 6, 'RA(ARG2-of)': 6, 'RA(ARG1)': 4, 'LA(poss)': 4, 'RA(snt2)': 3, 'RA(ARG0-of)': 3, 'LA(ARG2-of)': 3, 'LA(ARG3-of)': 3, 'RA(op4)': 2, 'RA(snt3)': 2, 'LA(polarity)': 1, 'LA(ARG3)': 1, 'RA(ARG2)': 1, 'PRED(":")': 1, 'LA(ARG1)': 1, 'PRED()': 1, 'RA(polarity)': 1, 'RA(example)': 1, 'LA(op3)': 1, 'LA(part-of)': 1, 'LA(root)': 1, 'RA(ARG0)': 1, 'RA(op5)': 1, 'LA(op4)': 1})
Using cache found in /home/pavel/.cache/torch/hub/pytorch_fairseq_master
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Traceback (most recent call last):
  File "/home/pavel/.cache/torch/hub/pytorch_fairseq_master/hubconf.py", line 49, in <module>
    import fairseq.data.token_block_utils_fast  # noqa
ModuleNotFoundError: No module named 'fairseq.data.token_block_utils_fast'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 152, in save_modules
    yield saved
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 193, in setup_context
    yield
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 254, in run_setup
    _execfile(setup_script, ns)
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 43, in _execfile
    exec(code, globals, locals)
  File "/home/pavel/.cache/torch/hub/pytorch_fairseq_master/setup.py", line 268, in <module>
    do_setup(package_data)
  File "/home/pavel/.cache/torch/hub/pytorch_fairseq_master/setup.py", line 179, in do_setup
    setup(
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.8/distutils/core.py", line 134, in setup
    ok = dist.parse_command_line()
  File "/usr/lib/python3.8/distutils/dist.py", line 483, in parse_command_line
    args = self._parse_command_opts(parser, args)
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/dist.py", line 957, in _parse_command_opts
    nargs = _Distribution._parse_command_opts(self, parser, args)
  File "/usr/lib/python3.8/distutils/dist.py", line 546, in _parse_command_opts
    raise DistutilsClassError(
distutils.errors.DistutilsClassError: command class <class 'torch.utils.cpp_extension.BuildExtension'> must subclass Command

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pavel/.local/bin/fairseq-preprocess", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-preprocess')())
  File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq_cli/preprocess.py", line 295, in cli_main
    main(args)
  File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq_cli/preprocess.py", line 212, in main
    make_state_machine(args, src_dict, tgt_dict, tokenize=tokenize)
  File "/home/pavel/work/nlp/transition-amr-parser/transition_amr_parser/stack_transformer/preprocess.py", line 302, in make_state_machine
    make_binary_bert_features(args, validpref, outprefix, src_dict.eos_index, src_dict.pad_index, tokenize)
  File "/home/pavel/work/nlp/transition-amr-parser/transition_amr_parser/stack_transformer/preprocess.py", line 212, in make_binary_bert_features
    pretrained_embeddings = PretrainedEmbeddings(
  File "/home/pavel/work/nlp/transition-amr-parser/transition_amr_parser/stack_transformer/pretrained_embeddings.py", line 180, in __init__
    self.roberta = torch.hub.load('pytorch/fairseq', name)
  File "/home/pavel/.local/lib/python3.8/site-packages/torch/hub.py", line 370, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/home/pavel/.local/lib/python3.8/site-packages/torch/hub.py", line 396, in _load_local
    hub_module = import_module(MODULE_HUBCONF, hubconf_path)
  File "/home/pavel/.local/lib/python3.8/site-packages/torch/hub.py", line 71, in import_module
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/pavel/.cache/torch/hub/pytorch_fairseq_master/hubconf.py", line 56, in <module>
    sandbox.run_setup(
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 257, in run_setup
    raise
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 193, in setup_context
    yield
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 164, in save_modules
    saved_exc.resume()
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 139, in resume
    raise exc.with_traceback(self._tb)
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 152, in save_modules
    yield saved
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 193, in setup_context
    yield
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 254, in run_setup
    _execfile(setup_script, ns)
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 43, in _execfile
    exec(code, globals, locals)
  File "/home/pavel/.cache/torch/hub/pytorch_fairseq_master/setup.py", line 268, in <module>
    do_setup(package_data)
  File "/home/pavel/.cache/torch/hub/pytorch_fairseq_master/setup.py", line 179, in do_setup
    setup(
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.8/distutils/core.py", line 134, in setup
    ok = dist.parse_command_line()
  File "/usr/lib/python3.8/distutils/dist.py", line 483, in parse_command_line
    args = self._parse_command_opts(parser, args)
  File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/dist.py", line 957, in _parse_command_opts
    nargs = _Distribution._parse_command_opts(self, parser, args)
  File "/usr/lib/python3.8/distutils/dist.py", line 546, in _parse_command_opts
    raise DistutilsClassError(
distutils.errors.DistutilsClassError: command class <class 'torch.utils.cpp_extension.BuildExtension'> must subclass Command
ramon-astudillo commented 3 years ago

I'm also not familiar with this one. Can you show me the output of

bash tests/correctly_installed.sh

?

PolKul commented 3 years ago

The test script output looks OK:

pytorch 1.7.1
cuda 10.2
Apex installed
Pytorch binaries were compiled with Cuda 10.2 but binary /usr/local/cuda/bin/nvcc is 11.2,
fairseq 0.7.2
spacy 2.2.3
[OK] correctly installed
ramon-astudillo commented 3 years ago

The torch version in the installer is torch==1.1.0; you changed this manually, right? I think that may be the source of the problem.
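
A quick way to confirm what your environment actually resolves to (just a sketch; 1.1.0 is the version pinned by the installer, and 3.6.9 is the Python we used):

```python
import sys
import torch

print("python", sys.version.split()[0])  # we used 3.6.9
print("torch", torch.__version__)        # the installer pins torch==1.1.0
```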

PolKul commented 3 years ago

I've downgraded torch to 1.1.0 as suggested, and now the training works. But it has been training for more than a day already on my single Titan RTX GPU. What is the average training time?

ramon-astudillo commented 3 years ago

I'd say 7h-8h on a single V100.


PolKul commented 3 years ago

The model has finally finished training, and it works! Thank you for your help with the setup.