These are commands that we added to handle our embedding pre-processing. This installation step,
bash scripts/download_and_patch_fairseq.sh
should have modified fairseq. Please double check that this is the case.
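If you want a quick way to verify that the patch took effect, one option (just a suggestion, not part of our scripts) is to check that the help text of the patched fairseq lists one of the added flags, e.g.:
fairseq-preprocess --help | grep pretrained-embed
If nothing is printed, fairseq was not patched and the unrecognized-arguments error is expected.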
Thanks @ramon-astudillo, I had to reinstall fairseq stack-transformer. But now I have another error:
pavel@pavel-TRX40-DESIGNARE:~/work/nlp/transition-amr-parser$ bash scripts/stack-transformer/experiment.sh configs/amr2_o5+Word100_roberta.large.top24_stnp6x6.sh
stage-1: Preprocess
stage-2/3: Training/Testing (multiple seeds)
cp DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train.rules.json DATA/AMR//models/amr2.0-cofill_o5+Word100_RoBERTa-large-top24_stnp6x6-seed42/
Saved fairseq model args to DATA/AMR//models/amr2.0-cofill_o5+Word100_RoBERTa-large-top24_stnp6x6-seed42//config.json
fairseq-train
DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/
--max-epoch 100
--arch stack_transformer_6x6_nopos
--optimizer adam
--adam-betas '(0.9,0.98)'
--clip-norm 0.0
--lr-scheduler inverse_sqrt
--warmup-init-lr 1e-07
--warmup-updates 4000
--pretrained-embed-dim 1024
--lr 0.0005
--min-lr 1e-09
--dropout 0.3
--weight-decay 0.0
--criterion label_smoothed_cross_entropy
--label-smoothing 0.01
--keep-last-epochs 40
--max-tokens 3584
--log-format json
--fp16
--distributed-world-size 1
--seed 42 --save-dir DATA/AMR//models/amr2.0-cofill_o5+Word100_RoBERTa-large-top24_stnp6x6-seed42/
Namespace(activation_dropout=0.0, activation_fn='relu', adam_betas="'(0.9,0.98)'", adam_eps=1e-08, adaptive_input=False, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, arch='stack_transformer_6x6_nopos', attention_dropout=0.0, bert_backprop=False, best_checkpoint_metric='loss', bpe=None, bucket_cap_mb=25, burnthrough=0, clip_norm=0.0, cpu=False, criterion='label_smoothed_cross_entropy', curriculum=0, data='DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/', dataset_impl=None, ddp_backend='c10d', decoder_attention_heads=4, decoder_embed_dim=256, decoder_embed_path=None, decoder_ffn_embed_dim=512, decoder_input_dim=256, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=False, decoder_output_dim=256, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.3, encode_state_machine='all-layers_nopos', encoder_attention_heads=4, encoder_embed_dim=256, encoder_embed_path=None, encoder_ffn_embed_dim=512, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=False, find_unused_parameters=False, fix_batches_to_gpus=False, fp16=True, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_interval_updates=-1, keep_last_epochs=40, label_smoothing=0.01, lazy_load=False, left_pad_source='True', left_pad_target='False', log_format='json', log_interval=1000, lr=[0.0005], lr_scheduler='inverse_sqrt', max_epoch=100, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=3584, max_tokens_valid=3584, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=1e-09, no_bert_precompute=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_token_positional_embeddings=False, num_workers=1, optimizer='adam', optimizer_overrides='{}', pretrained_embed_dim=1024, raw_text=False, required_batch_size_multiple=8, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='DATA/AMR//models/amr2.0-cofill_o5+Word100_RoBERTa-large-top24_stnp6x6-seed42/', save_interval=1, save_interval_updates=0, seed=42, sentence_avg=False, share_all_embeddings=False, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, source_lang=None, target_lang=None, task='translation', tbmf_wrapper=False, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, train_subset='train', update_freq=[1], upsample_primary=1, use_bmuf=False, user_dir=None, valid_subset='valid', validate_interval=1, warmup_init_lr=1e-07, warmup_updates=4000, weight_decay=0.0)
| [en] dictionary: 34104 types
| [actions] dictionary: 9456 types
| loaded 1368 examples from: DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/valid.en-actions.en
| loaded 1368 examples from: DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/valid.en-actions.actions
| DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/ valid en-actions 1368 examples
Traceback (most recent call last):
File "/home/pavel/.local/bin/fairseq-train", line 33, in <module>
sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq_cli/train.py", line 338, in cli_main
main(args)
File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq_cli/train.py", line 46, in main
task.load_dataset(valid_sub_split, combine=False, epoch=0)
File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq/tasks/translation.py", line 236, in load_dataset
self.datasets[split] = load_langpair_dataset(
File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq/tasks/translation.py", line 119, in load_langpair_dataset
src_fixed_embeddings, src_fixed_embeddings.sizes,
AttributeError: 'NoneType' object has no attribute 'sizes'
One thing that happens often is that if you stop in the middle of extraction, the features folder exists but is empty; fairseq then gets confused and loads a dataset equal to None. Check whether DATA/AMR//features/ exists and remove it.
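As a sketch, using the paths from your log, something along these lines before re-launching should do it:
# remove the half-written features directory; the experiment script will regenerate it
rm -rf DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/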
OK, removing that folder helped. The training went well for a while, but then I got another error:
pavel@pavel-TRX40-DESIGNARE:~/work/nlp/transition-amr-parser$ bash scripts/stack-transformer/experiment.sh configs/amr2_o5+Word100_roberta.large.top24_stnp6x6.sh
stage-1: Preprocess
fairseq-preprocess
--source-lang en
--target-lang actions
--trainpref DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train
--validpref DATA/AMR//oracles/amr2.0-cofill_o5+Word100//dev
--testpref DATA/AMR//oracles/amr2.0-cofill_o5+Word100//test
--destdir DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/
--workers 1
--pretrained-embed roberta.large
--bert-layers 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
--machine-type AMR
--machine-rules DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train.rules.json
--entity-rules DATA/AMR//oracles/amr2.0-cofill_o5+Word100//entity_rules.json
Namespace(alignfile=None, batch_normalize_reward=False, bert_layers=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24], bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='DATA/AMR//features/amr2.0-cofill_o5+Word100_RoBERTa-large-top24/', entity_rules='DATA/AMR//oracles/amr2.0-cofill_o5+Word100//entity_rules.json', fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, gold_annotations=None, gold_episode_ratio=None, joined_dictionary=False, log_format=None, log_interval=1000, lr_scheduler='fixed', machine_rules='DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train.rules.json', machine_type='AMR', memory_efficient_fp16=False, min_loss_scale=0.0001, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, only_source=False, optimizer='nag', padding_factor=8, pretrained_embed='roberta.large', seed=1, source_lang='en', srcdict=None, target_lang='actions', task='translation', tbmf_wrapper=False, tensorboard_logdir='', testpref='DATA/AMR//oracles/amr2.0-cofill_o5+Word100//test', tgtdict=None, threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenize_by_whitespace=False, tokenizer=None, trainpref='DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train', user_dir=None, validpref='DATA/AMR//oracles/amr2.0-cofill_o5+Word100//dev', workers=1)
| [en] Dictionary: 34103 types
| [en] DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train.en: 36521 sents, 689426 tokens, 0.0% replaced by <unk>
| [en] Dictionary: 34103 types
| [en] DATA/AMR//oracles/amr2.0-cofill_o5+Word100//dev.en: 1368 sents, 30637 tokens, 0.0% replaced by <unk>
| [en] Dictionary: 34103 types
| [en] DATA/AMR//oracles/amr2.0-cofill_o5+Word100//test.en: 1371 sents, 31425 tokens, 3.6% replaced by <unk>
| [actions] Dictionary: 9455 types
| [actions] DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train.actions: 36521 sents, 2766561 tokens, 0.0% replaced by <unk>
| [actions] Dictionary: 9455 types
| [actions] DATA/AMR//oracles/amr2.0-cofill_o5+Word100//dev.actions: 1368 sents, 124508 tokens, 0.177% replaced by <unk>
| [actions] Dictionary: 9455 types
| [actions] DATA/AMR//oracles/amr2.0-cofill_o5+Word100//test.actions: 1371 sents, 129954 tokens, 0.159% replaced by <unk>
Using cache found in /home/pavel/.cache/torch/hub/pytorch_fairseq_master
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
running build_ext
/home/pavel/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py:352: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
skipping 'fairseq/data/data_utils_fast.cpp' Cython extension (up-to-date)
skipping 'fairseq/data/token_block_utils_fast.cpp' Cython extension (up-to-date)
building 'fairseq.libbleu' extension
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/fairseq
creating build/temp.linux-x86_64-3.8/fairseq/clib
creating build/temp.linux-x86_64-3.8/fairseq/clib/libbleu
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c fairseq/clib/libbleu/libbleu.cpp -o build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/libbleu.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=libbleu -D_GLIBCXX_USE_CXX11_ABI=0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c fairseq/clib/libbleu/module.cpp -o build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/module.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=libbleu -D_GLIBCXX_USE_CXX11_ABI=0
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/fairseq
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/libbleu.o build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/module.o -o build/lib.linux-x86_64-3.8/fairseq/libbleu.cpython-38-x86_64-linux-gnu.so
building 'fairseq.data.data_utils_fast' extension
creating build/temp.linux-x86_64-3.8/fairseq/data
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/pavel/.local/lib/python3.8/site-packages/numpy/core/include -I/home/pavel/.local/lib/python3.8/site-packages/numpy/core/include -I/usr/include/python3.8 -c fairseq/data/data_utils_fast.cpp -o build/temp.linux-x86_64-3.8/fairseq/data/data_utils_fast.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=data_utils_fast -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1944,
from /home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from fairseq/data/data_utils_fast.cpp:626:
/home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
17 | #warning "Using deprecated NumPy API, disable it with " \
| ^~~~~~~
creating build/lib.linux-x86_64-3.8/fairseq/data
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/fairseq/data/data_utils_fast.o -o build/lib.linux-x86_64-3.8/fairseq/data/data_utils_fast.cpython-38-x86_64-linux-gnu.so
building 'fairseq.data.token_block_utils_fast' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/pavel/.local/lib/python3.8/site-packages/numpy/core/include -I/home/pavel/.local/lib/python3.8/site-packages/numpy/core/include -I/usr/include/python3.8 -c fairseq/data/token_block_utils_fast.cpp -o build/temp.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=token_block_utils_fast -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1944,
from /home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from fairseq/data/token_block_utils_fast.cpp:627:
/home/pavel/.local/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
17 | #warning "Using deprecated NumPy API, disable it with " \
| ^~~~~~~
fairseq/data/token_block_utils_fast.cpp: In function ‘PyArrayObject* __pyx_f_7fairseq_4data_22token_block_utils_fast__get_slice_indices_fast(PyArrayObject*, PyObject*, int, int, int)’:
fairseq/data/token_block_utils_fast.cpp:3310:36: warning: comparison of integer expressions of different signedness: ‘__pyx_t_7fairseq_4data_22token_block_utils_fast_DTYPE_t’ {aka ‘long int’} and ‘size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
3310 | __pyx_t_4 = ((__pyx_v_sz_idx < __pyx_t_10) != 0);
| ~~~~~~~~~~~~~~~^~~~~~~~~~~~
fairseq/data/token_block_utils_fast.cpp:3505:36: warning: comparison of integer expressions of different signedness: ‘__pyx_t_7fairseq_4data_22token_block_utils_fast_DTYPE_t’ {aka ‘long int’} and ‘size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
3505 | __pyx_t_3 = ((__pyx_v_sz_idx < __pyx_t_10) != 0);
| ~~~~~~~~~~~~~~~^~~~~~~~~~~~
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.o -o build/lib.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.cpython-38-x86_64-linux-gnu.so
building 'fairseq.libbase' extension
creating build/temp.linux-x86_64-3.8/fairseq/clib/libbase
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/pavel/.local/lib/python3.8/site-packages/torch/include -I/home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/pavel/.local/lib/python3.8/site-packages/torch/include/TH -I/home/pavel/.local/lib/python3.8/site-packages/torch/include/THC -I/usr/include/python3.8 -c fairseq/clib/libbase/balanced_assignment.cpp -o build/temp.linux-x86_64-3.8/fairseq/clib/libbase/balanced_assignment.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=libbase -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from /home/pavel/.local/lib/python3.8/site-packages/torch/include/ATen/Parallel.h:149,
from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:12,
from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
from fairseq/clib/libbase/balanced_assignment.cpp:15:
/home/pavel/.local/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
84 | #pragma omp parallel for if ((end - begin) >= grain_size)
|
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/fairseq/clib/libbase/balanced_assignment.o -L/home/pavel/.local/lib/python3.8/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.8/fairseq/libbase.cpython-38-x86_64-linux-gnu.so
building 'fairseq.libnat' extension
creating build/temp.linux-x86_64-3.8/fairseq/clib/libnat
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/pavel/.local/lib/python3.8/site-packages/torch/include -I/home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/pavel/.local/lib/python3.8/site-packages/torch/include/TH -I/home/pavel/.local/lib/python3.8/site-packages/torch/include/THC -I/usr/include/python3.8 -c fairseq/clib/libnat/edit_dist.cpp -o build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=libnat -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from /home/pavel/.local/lib/python3.8/site-packages/torch/include/ATen/Parallel.h:149,
from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:12,
from /home/pavel/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:3,
from fairseq/clib/libnat/edit_dist.cpp:9:
/home/pavel/.local/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
84 | #pragma omp parallel for if ((end - begin) >= grain_size)
|
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o -L/home/pavel/.local/lib/python3.8/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.8/fairseq/libnat.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.8/fairseq/libbleu.cpython-38-x86_64-linux-gnu.so -> fairseq
copying build/lib.linux-x86_64-3.8/fairseq/data/data_utils_fast.cpython-38-x86_64-linux-gnu.so -> fairseq/data
copying build/lib.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.cpython-38-x86_64-linux-gnu.so -> fairseq/data
copying build/lib.linux-x86_64-3.8/fairseq/libbase.cpython-38-x86_64-linux-gnu.so -> fairseq
copying build/lib.linux-x86_64-3.8/fairseq/libnat.cpython-38-x86_64-linux-gnu.so -> fairseq
31%|███████▋ | 201607168/655283069 [02:14<04:57, 1524514.37B/s]
100%|█████████████████████████| 655283069/655283069 [07:26<00:00, 1467635.48B/s]
loading archive file http://dl.fbaipublicfiles.com/fairseq/models/roberta.large.tar.gz from cache at /home/pavel/.cache/torch/pytorch_fairseq/83e3a689e28e5e4696ecb0bbb05a77355444a5c8a3437e0f736d8a564e80035e.c687083d14776c1979f3f71654febb42f2bb3d9a94ff7ebdfe1ac6748dba89d2
extracting archive file /home/pavel/.cache/torch/pytorch_fairseq/83e3a689e28e5e4696ecb0bbb05a77355444a5c8a3437e0f736d8a564e80035e.c687083d14776c1979f3f71654febb42f2bb3d9a94ff7ebdfe1ac6748dba89d2 to temp dir /tmp/tmpc1nxlw6a
| dictionary: 50264 types
Using roberta.large extraction in GPU
9400 sentences
10300 sentences
10400 sentences
10500 sentences
10700 sentences
11700 sentences
19400 sentences
36500 sentences
Defaulting to user installation because normal site-packages is not writeable
Collecting en_core_web_sm==2.2.5
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz (12.0 MB)
|████████████████████████████████| 12.0 MB 1.4 MB/s
Requirement already satisfied: spacy>=2.2.2 in /home/pavel/.local/lib/python3.8/site-packages (from en_core_web_sm==2.2.5) (2.2.3)
Requirement already satisfied: numpy>=1.15.0 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.20.1)
Requirement already satisfied: thinc<7.4.0,>=7.3.0 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (7.3.1)
Requirement already satisfied: plac<1.2.0,>=0.9.6 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.1.3)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.5)
Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (0.8.2)
Requirement already satisfied: setuptools in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (56.2.0)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (3.0.5)
Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.0)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (2.0.5)
Requirement already satisfied: blis<0.5.0,>=0.4.0 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (0.4.1)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/lib/python3/dist-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (2.22.0)
Requirement already satisfied: srsly<1.1.0,>=0.1.0 in /home/pavel/.local/lib/python3.8/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.5)
Requirement already satisfied: tqdm<5.0.0,>=4.10.0 in /home/pavel/.local/lib/python3.8/site-packages (from thinc<7.4.0,>=7.3.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (4.57.0)
Building wheels for collected packages: en-core-web-sm
Building wheel for en-core-web-sm (setup.py) ... done
Created wheel for en-core-web-sm: filename=en_core_web_sm-2.2.5-py3-none-any.whl size=12011738 sha256=59a95982d3b8970106e40bde27d533885ea2f29bf1b20fce6191516e9f2be88f
Stored in directory: /tmp/pip-ephem-wheel-cache-rmkyokz9/wheels/77/b4/c8/395804b9a2b6864aaff3623d7b709680acc3d04f47c8162ee6
Successfully built en-core-web-sm
Installing collected packages: en-core-web-sm
Attempting uninstall: en-core-web-sm
Found existing installation: en-core-web-sm 3.0.0
Uninstalling en-core-web-sm-3.0.0:
Successfully uninstalled en-core-web-sm-3.0.0
Successfully installed en-core-web-sm-2.2.5
WARNING: You are using pip version 21.0.1; however, version 21.1.1 is available.
You should consider upgrading via the '/usr/bin/python3.8 -m pip install --upgrade pip' command.
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Linking successful
/home/pavel/.local/lib/python3.8/site-packages/en_core_web_sm -->
/home/pavel/.local/lib/python3.8/site-packages/spacy/data/en
You can now load the model via spacy.load('en')
0it [00:00, ?it/s]
Traceback (most recent call last):
File "/home/pavel/.local/bin/fairseq-preprocess", line 33, in <module>
sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-preprocess')())
File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq_cli/preprocess.py", line 295, in cli_main
main(args)
File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq_cli/preprocess.py", line 212, in main
make_state_machine(args, src_dict, tgt_dict, tokenize=tokenize)
File "/home/pavel/work/nlp/transition-amr-parser/transition_amr_parser/stack_transformer/preprocess.py", line 295, in make_state_machine
make_masks(args, tgt_dict, args.trainpref, "train", tgt_dict.eos_index, tgt_dict.pad_index, mask_predicates=True, tokenize=tokenize)
File "/home/pavel/work/nlp/transition-amr-parser/transition_amr_parser/stack_transformer/preprocess.py", line 283, in make_masks
make_binary_stack(args, target_vocab, input_prefix, output_prefix, eos_idx, pad_idx, mask_predicates=mask_predicates, allow_unk=allow_unk, tokenize=tokenize)
File "/home/pavel/work/nlp/transition-amr-parser/transition_amr_parser/stack_transformer/preprocess.py", line 173, in make_binary_stack
torch.Tensor(logits_mask).view(-1)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
This looks like an installation problem; I have never seen this error before. I would check your Python version (we used 3.6.9) and your torch version.
I was able to fix that error by changing that line of code to the following:
torch.Tensor(logits_mask).contiguous().view(-1)
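For context, .view() requires the requested shape to be compatible with the tensor's memory layout, so it raises this error on a non-contiguous tensor, while calling .contiguous() first (or using .reshape(), as the error message suggests) works. A minimal standalone illustration, unrelated to the parser's actual data:

import torch

x = torch.arange(6).reshape(2, 3).t()  # the transpose makes this tensor non-contiguous
# x.view(-1) would raise the "view size is not compatible ..." RuntimeError here
flat_a = x.contiguous().view(-1)       # the fix applied above: copy into contiguous memory first
flat_b = x.reshape(-1)                 # equivalent fix suggested by the error message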
After that the training went well, but then the following error occurred:
Using roberta.large extraction in GPU
9400 sentences
10300 sentences
10400 sentences
10500 sentences
10700 sentences
11700 sentences
19400 sentences
36500 sentences
36521it [36:56, 16.48it/s]
There were missing actions
Counter({'LA(ARG1-of)': 120, 'RA(op1)': 103, 'RA(op2)': 38, 'LA(ARG0-of)': 22, 'RA(ARG1-of)': 15, 'LA(op1)': 14, 'RA(op3)': 9, 'RA(snt1)': 6, 'RA(ARG2-of)': 6, 'RA(ARG1)': 4, 'LA(poss)': 4, 'RA(snt2)': 3, 'RA(ARG0-of)': 3, 'LA(ARG2-of)': 3, 'LA(ARG3-of)': 3, 'RA(op4)': 2, 'RA(snt3)': 2, 'LA(polarity)': 1, 'LA(ARG3)': 1, 'RA(ARG2)': 1, 'PRED(":")': 1, 'LA(ARG1)': 1, 'PRED()': 1, 'RA(polarity)': 1, 'RA(example)': 1, 'LA(op3)': 1, 'LA(part-of)': 1, 'LA(root)': 1, 'RA(ARG0)': 1, 'RA(op5)': 1, 'LA(op4)': 1})
Using cache found in /home/pavel/.cache/torch/hub/pytorch_fairseq_master
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Traceback (most recent call last):
File "/home/pavel/.cache/torch/hub/pytorch_fairseq_master/hubconf.py", line 49, in <module>
import fairseq.data.token_block_utils_fast # noqa
ModuleNotFoundError: No module named 'fairseq.data.token_block_utils_fast'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 152, in save_modules
yield saved
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 193, in setup_context
yield
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 254, in run_setup
_execfile(setup_script, ns)
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 43, in _execfile
exec(code, globals, locals)
File "/home/pavel/.cache/torch/hub/pytorch_fairseq_master/setup.py", line 268, in <module>
do_setup(package_data)
File "/home/pavel/.cache/torch/hub/pytorch_fairseq_master/setup.py", line 179, in do_setup
setup(
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.8/distutils/core.py", line 134, in setup
ok = dist.parse_command_line()
File "/usr/lib/python3.8/distutils/dist.py", line 483, in parse_command_line
args = self._parse_command_opts(parser, args)
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/dist.py", line 957, in _parse_command_opts
nargs = _Distribution._parse_command_opts(self, parser, args)
File "/usr/lib/python3.8/distutils/dist.py", line 546, in _parse_command_opts
raise DistutilsClassError(
distutils.errors.DistutilsClassError: command class <class 'torch.utils.cpp_extension.BuildExtension'> must subclass Command
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/pavel/.local/bin/fairseq-preprocess", line 33, in <module>
sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-preprocess')())
File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq_cli/preprocess.py", line 295, in cli_main
main(args)
File "/home/pavel/work/nlp/transition-amr-parser/fairseq-stack-transformer/fairseq_cli/preprocess.py", line 212, in main
make_state_machine(args, src_dict, tgt_dict, tokenize=tokenize)
File "/home/pavel/work/nlp/transition-amr-parser/transition_amr_parser/stack_transformer/preprocess.py", line 302, in make_state_machine
make_binary_bert_features(args, validpref, outprefix, src_dict.eos_index, src_dict.pad_index, tokenize)
File "/home/pavel/work/nlp/transition-amr-parser/transition_amr_parser/stack_transformer/preprocess.py", line 212, in make_binary_bert_features
pretrained_embeddings = PretrainedEmbeddings(
File "/home/pavel/work/nlp/transition-amr-parser/transition_amr_parser/stack_transformer/pretrained_embeddings.py", line 180, in __init__
self.roberta = torch.hub.load('pytorch/fairseq', name)
File "/home/pavel/.local/lib/python3.8/site-packages/torch/hub.py", line 370, in load
model = _load_local(repo_or_dir, model, *args, **kwargs)
File "/home/pavel/.local/lib/python3.8/site-packages/torch/hub.py", line 396, in _load_local
hub_module = import_module(MODULE_HUBCONF, hubconf_path)
File "/home/pavel/.local/lib/python3.8/site-packages/torch/hub.py", line 71, in import_module
spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/pavel/.cache/torch/hub/pytorch_fairseq_master/hubconf.py", line 56, in <module>
sandbox.run_setup(
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 257, in run_setup
raise
File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 193, in setup_context
yield
File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 164, in save_modules
saved_exc.resume()
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 139, in resume
raise exc.with_traceback(self._tb)
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 152, in save_modules
yield saved
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 193, in setup_context
yield
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 254, in run_setup
_execfile(setup_script, ns)
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/sandbox.py", line 43, in _execfile
exec(code, globals, locals)
File "/home/pavel/.cache/torch/hub/pytorch_fairseq_master/setup.py", line 268, in <module>
do_setup(package_data)
File "/home/pavel/.cache/torch/hub/pytorch_fairseq_master/setup.py", line 179, in do_setup
setup(
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.8/distutils/core.py", line 134, in setup
ok = dist.parse_command_line()
File "/usr/lib/python3.8/distutils/dist.py", line 483, in parse_command_line
args = self._parse_command_opts(parser, args)
File "/home/pavel/.local/lib/python3.8/site-packages/setuptools/dist.py", line 957, in _parse_command_opts
nargs = _Distribution._parse_command_opts(self, parser, args)
File "/usr/lib/python3.8/distutils/dist.py", line 546, in _parse_command_opts
raise DistutilsClassError(
distutils.errors.DistutilsClassError: command class <class 'torch.utils.cpp_extension.BuildExtension'> must subclass Command
I am also not familiar with this. Can you show me the output of the following?
bash tests/correctly_installed.sh
The test script output looks OK:
pytorch 1.7.1
cuda 10.2
Apex installed
Pytorch binaries were compiled with Cuda 10.2 but binary /usr/local/cuda/bin/nvcc is 11.2,
fairseq 0.7.2
spacy 2.2.3
[OK] correctly installed
The torch version in the installer is torch==1.1.0; you changed this manually, right? I think this may be the source of the problem.
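In case it helps, pinning it back is just a pip install (the exact command may need adjusting to your CUDA/Python setup):
pip install torch==1.1.0
python -c "import torch; print(torch.__version__)"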
I downgraded torch to 1.1.0 as suggested, and now the training works. But it has been training for more than a day already on my single Titan RTX GPU. What is the average training time?
I'd say 7h-8h on a single V100.
The model has finally finished training, and it works! Thank you for your help with the setup.
I was able to install everything as per your setup instructions. I ran the training script
bash scripts/stack-transformer/experiment.sh configs/amr2_o5+Word100_roberta.large.top24_stnp6x6.sh
The datasets for the training have been generated, but it gets stuck when running the fairseq-preprocess command. I got the following error:
fairseq-preprocess: error: unrecognized arguments: --pretrained-embed roberta.large --bert-layers 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 --machine-type AMR --machine-rules DATA/AMR//oracles/amr2.0-cofill_o5+Word100//train.rules.json --entity-rules DATA/AMR//oracles/amr2.0-cofill_o5+Word100//entity_rules.json
Can you advise how to fix that?