facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

fairseq-train: error: unrecognized arguments: --mask-multiple-length 10 --mask-stdev 10 #3229

Open cpark-dev opened 3 years ago

cpark-dev commented 3 years ago

🐛 Bug

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run the command
    fairseq-train --fp16 $RESULT/quantized/fairseq-bin-data \
    --task masked_lm --criterion masked_lm \
    --save-dir $CHECKPOINT/BERT_CPC_big_kmeans50 \
    --keep-last-epochs 1 \
    --train-subset train \
    --num-workers 4 \
    --arch roberta_base \
    --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 --clip-norm 0.0 \
    --lr-scheduler polynomial_decay --lr 0.0005 --total-num-update 250000 --warmup-updates 10000 \
    --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
    --mask-multiple-length 10 --mask-prob 0.5 --mask-stdev 10 \
    --sample-break-mode eos --tokens-per-sample 3072 --max-positions 6144 \
    --max-tokens 4096 --update-freq 4 --max-update 250000 \
    --seed 5 --log-format simple --log-interval 10 --skip-invalid-size-inputs-valid-test
  2. See error
    usage: fairseq-train [-h] [--no-progress-bar] [--log-interval LOG_INTERVAL]
                     [--log-format {json,none,simple,tqdm}]
                     [--tensorboard-logdir TENSORBOARD_LOGDIR] [--seed SEED]
                     [--cpu] [--tpu] [--bf16] [--memory-efficient-bf16]
                     [--fp16] [--memory-efficient-fp16]
                     [--fp16-no-flatten-grads]
                     [--fp16-init-scale FP16_INIT_SCALE]
                     [--fp16-scale-window FP16_SCALE_WINDOW]
                     [--fp16-scale-tolerance FP16_SCALE_TOLERANCE]
                     [--min-loss-scale MIN_LOSS_SCALE]
                     [--threshold-loss-scale THRESHOLD_LOSS_SCALE]
                     [--user-dir USER_DIR]
                     [--empty-cache-freq EMPTY_CACHE_FREQ]
                     [--all-gather-list-size ALL_GATHER_LIST_SIZE]
                     [--model-parallel-size MODEL_PARALLEL_SIZE]
                     [--checkpoint-suffix CHECKPOINT_SUFFIX]
                     [--checkpoint-shard-count CHECKPOINT_SHARD_COUNT]
                     [--quantization-config-path QUANTIZATION_CONFIG_PATH]
                     [--profile]
                     [--criterion {legacy_masked_lm_loss,label_smoothed_cross_entropy,label_smoothed_cross_entropy_with_alignment,wav2vec,ctc,cross_entropy,sentence_ranking,composite_loss,adaptive_loss,sentence_prediction,masked_lm,nat_loss,vocab_parallel_cross_entropy}]
                     [--tokenizer {space,nltk,moses}]
                     [--bpe {byte_bpe,subword_nmt,hf_byte_bpe,sentencepiece,characters,bert,gpt2,fastbpe,bytes}]
                     [--optimizer {adadelta,sgd,lamb,nag,adafactor,adagrad,adam,adamax}]
                     [--lr-scheduler {tri_stage,polynomial_decay,triangular,reduce_lr_on_plateau,cosine,fixed,inverse_sqrt}]
                     [--scoring {sacrebleu,bleu,wer,chrf}] [--task TASK]
                     [--num-workers NUM_WORKERS]
                     [--skip-invalid-size-inputs-valid-test]
                     [--max-tokens MAX_TOKENS] [--batch-size BATCH_SIZE]
                     [--required-batch-size-multiple REQUIRED_BATCH_SIZE_MULTIPLE]
                     [--required-seq-len-multiple REQUIRED_SEQ_LEN_MULTIPLE]
                     [--dataset-impl {raw,lazy,cached,mmap,fasta}]
                     [--data-buffer-size DATA_BUFFER_SIZE]
                     [--train-subset TRAIN_SUBSET]
                     [--valid-subset VALID_SUBSET]
                     [--validate-interval VALIDATE_INTERVAL]
                     [--validate-interval-updates VALIDATE_INTERVAL_UPDATES]
                     [--validate-after-updates VALIDATE_AFTER_UPDATES]
                     [--fixed-validation-seed FIXED_VALIDATION_SEED]
                     [--disable-validation]
                     [--max-tokens-valid MAX_TOKENS_VALID]
                     [--batch-size-valid BATCH_SIZE_VALID]
                     [--curriculum CURRICULUM] [--gen-subset GEN_SUBSET]
                     [--num-shards NUM_SHARDS] [--shard-id SHARD_ID]
                     [--distributed-world-size DISTRIBUTED_WORLD_SIZE]
                     [--distributed-rank DISTRIBUTED_RANK]
                     [--distributed-backend DISTRIBUTED_BACKEND]
                     [--distributed-init-method DISTRIBUTED_INIT_METHOD]
                     [--distributed-port DISTRIBUTED_PORT]
                     [--device-id DEVICE_ID] [--distributed-no-spawn]
                     [--ddp-backend {c10d,no_c10d}]
                     [--bucket-cap-mb BUCKET_CAP_MB] [--fix-batches-to-gpus]
                     [--find-unused-parameters] [--fast-stat-sync]
                     [--broadcast-buffers]
                     [--distributed-wrapper {DDP,SlowMo}]
                     [--slowmo-momentum SLOWMO_MOMENTUM]
                     [--slowmo-algorithm SLOWMO_ALGORITHM]
                     [--localsgd-frequency LOCALSGD_FREQUENCY]
                     [--nprocs-per-node NPROCS_PER_NODE]
                     [--pipeline-model-parallel]
                     [--pipeline-balance PIPELINE_BALANCE]
                     [--pipeline-devices PIPELINE_DEVICES]
                     [--pipeline-chunks PIPELINE_CHUNKS]
                     [--pipeline-encoder-balance PIPELINE_ENCODER_BALANCE]
                     [--pipeline-encoder-devices PIPELINE_ENCODER_DEVICES]
                     [--pipeline-decoder-balance PIPELINE_DECODER_BALANCE]
                     [--pipeline-decoder-devices PIPELINE_DECODER_DEVICES]
                     [--pipeline-checkpoint {always,never,except_last}]
                     [--zero-sharding {none,os}] [--arch ARCH]
                     [--max-epoch MAX_EPOCH] [--max-update MAX_UPDATE]
                     [--stop-time-hours STOP_TIME_HOURS]
                     [--clip-norm CLIP_NORM] [--sentence-avg]
                     [--update-freq UPDATE_FREQ] [--lr LR] [--min-lr MIN_LR]
                     [--use-bmuf] [--save-dir SAVE_DIR]
                     [--restore-file RESTORE_FILE]
                     [--finetune-from-model FINETUNE_FROM_MODEL]
                     [--reset-dataloader] [--reset-lr-scheduler]
                     [--reset-meters] [--reset-optimizer]
                     [--optimizer-overrides OPTIMIZER_OVERRIDES]
                     [--save-interval SAVE_INTERVAL]
                     [--save-interval-updates SAVE_INTERVAL_UPDATES]
                     [--keep-interval-updates KEEP_INTERVAL_UPDATES]
                     [--keep-last-epochs KEEP_LAST_EPOCHS]
                     [--keep-best-checkpoints KEEP_BEST_CHECKPOINTS]
                     [--no-save] [--no-epoch-checkpoints]
                     [--no-last-checkpoints] [--no-save-optimizer-state]
                     [--best-checkpoint-metric BEST_CHECKPOINT_METRIC]
                     [--maximize-best-checkpoint-metric] [--patience PATIENCE]
                     [--encoder-layers L] [--encoder-embed-dim H]
                     [--encoder-ffn-embed-dim F] [--encoder-attention-heads A]
                     [--activation-fn {relu,gelu,gelu_fast,gelu_accurate,tanh,linear}]
                     [--pooler-activation-fn {relu,gelu,gelu_fast,gelu_accurate,tanh,linear}]
                     [--encoder-normalize-before] [--dropout D]
                     [--attention-dropout D] [--activation-dropout D]
                     [--pooler-dropout D] [--max-positions MAX_POSITIONS]
                     [--load-checkpoint-heads] [--encoder-layerdrop D]
                     [--encoder-layers-to-keep ENCODER_LAYERS_TO_KEEP]
                     [--quant-noise-pq D] [--quant-noise-pq-block-size D]
                     [--quant-noise-scalar D] [--untie-weights-roberta]
                     [--spectral-norm-classification-head]
                     [--adam-betas ADAM_BETAS] [--adam-eps ADAM_EPS]
                     [--weight-decay WEIGHT_DECAY] [--use-old-adam]
                     [--force-anneal N] [--warmup-updates N]
                     [--end-learning-rate END_LEARNING_RATE] [--power POWER]
                     [--total-num-update TOTAL_NUM_UPDATE]
                     [--sample-break-mode {none,complete,complete_doc,eos}]
                     [--tokens-per-sample TOKENS_PER_SAMPLE]
                     [--mask-prob MASK_PROB]
                     [--leave-unmasked-prob LEAVE_UNMASKED_PROB]
                     [--random-token-prob RANDOM_TOKEN_PROB]
                     [--freq-weighted-replacement] [--mask-whole-words]
                     [--shorten-method {none,truncate,random_crop}]
                     [--shorten-data-split-list SHORTEN_DATA_SPLIT_LIST]
                     data
    fairseq-train: error: unrecognized arguments: --mask-multiple-length 10 --mask-stdev 10
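
A quick diagnostic for this kind of "unrecognized arguments" error is to ask the installed `fairseq-train` itself whether it knows the flags at all (a sketch; assumes `fairseq-train` is on your PATH, and falls back to a message if it is not or the flags are missing):

```shell
# Diagnostic sketch: does the installed fairseq-train expose these flags at all?
found=$(fairseq-train --help 2>&1 | grep -E -- "--mask-(multiple-length|stdev)" \
  || echo "flags not found in installed version")
echo "$found"
```

If the grep finds nothing, the installed copy predates the flags, which points at a version mismatch rather than a typo in the command.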

Additional context

  1. I am following the tutorial for the ZeroSpeech 2021 baseline system. link
  2. The two parameters, --mask-multiple-length and --mask-stdev, are defined in masked_lm.py, so I think they should work.
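
For context: fairseq task flags are contributed to the argument parser by the task class itself, so a flag exists only if the *installed* copy of masked_lm.py defines it. A simplified sketch of the mechanism (class and defaults simplified, not fairseq's actual code):

```python
import argparse

# Simplified sketch of how a fairseq-style task registers its own CLI flags.
class MaskedLMTask:
    @staticmethod
    def add_args(parser):
        # An older installed version of the task simply would not add
        # --mask-multiple-length / --mask-stdev here, producing the
        # "unrecognized arguments" error seen above.
        parser.add_argument("--mask-prob", type=float, default=0.15)
        parser.add_argument("--mask-multiple-length", type=int, default=1)
        parser.add_argument("--mask-stdev", type=float, default=0.0)

parser = argparse.ArgumentParser()
MaskedLMTask.add_args(parser)
args = parser.parse_args(["--mask-multiple-length", "10", "--mask-stdev", "10"])
print(args.mask_multiple_length, args.mask_stdev)  # 10 10.0
```

This is why "the parameters are in masked_lm.py" in the source tree is not enough: the pip-installed fairseq has to contain that version of the file too.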
alexeib commented 3 years ago

It works for me. Are you sure you have the latest fairseq version installed? Try updating to master and then running something like this from the checked-out dir:

PYTHONPATH=. python fairseq_cli/train.py --fp16 $RESULT/quantized/fairseq-bin-data \
...
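
When a pip-installed fairseq and a checked-out copy coexist, it helps to confirm which one Python will actually import (a small diagnostic sketch, safe to run even if fairseq is absent):

```python
import importlib.util

# Report where 'fairseq' would be imported from, without actually importing it.
spec = importlib.util.find_spec("fairseq")
location = spec.origin if spec else "fairseq is not importable from this environment"
print(location)
```

Run from the checked-out directory with `PYTHONPATH=.`, the reported path should point into the checkout rather than into site-packages.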
cpark-dev commented 3 years ago

Now I get a different error.

$ PYTHONPATH=. \
> python fairseq_cli/train.py --fp16 /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/result/quantized/fairseq-bin-data \
>     --task masked_lm --criterion masked_lm \
>     --save-dir /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/zerospeech2021_baseline/checkpoints/BERT_CPC_big_kmeans50 \
>     --keep-last-epochs 1 \
>     --train-subset train \
>     --num-workers 1 \
>     --arch roberta_base \
>     --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 --clip-norm 0.0 \
>     --lr-scheduler polynomial_decay --lr 0.0005 --total-num-update 250000 --warmup-updates 10000 \
>     --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
>     --mask-multiple-length 10 --mask-prob 0.5 --mask-stdev 10 \
>     --sample-break-mode eos --tokens-per-sample 3072 --max-positions 6144 \
>     --max-tokens 4096 --update-freq 128 --max-update 250000 \
>     --seed 5 --log-format simple --log-interval 10 --skip-invalid-size-inputs-valid-test
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370131125/work/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
2021-02-12 21:49:30 | INFO | fairseq_cli.train | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 10, 'log_format': 'simple', 'tensorboard_logdir': None, 'wandb_project': None, 'azureml_logging': False, 'seed': 5, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': True, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False}, 'common_eval': {'_name': None, 'path': None, 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'pytorch_ddp', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_algorithm': 'LocalSGD', 'localsgd_frequency': 3, 'nprocs_per_node': 1, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'tpu': False, 'distributed_num_procs': 0}, 'dataset': {'_name': None, 'num_workers': 1, 'skip_invalid_size_inputs_valid_test': True, 'max_tokens': 4096, 'batch_size': None, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 
'train', 'valid_subset': 'valid', 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': 4096, 'batch_size_valid': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 250000, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [128], 'lr': [0.0005], 'stop_min_lr': -1.0, 'use_bmuf': False}, 'checkpoint': {'_name': None, 'save_dir': '/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/zerospeech2021_baseline/checkpoints/BERT_CPC_big_kmeans50', 'restore_file': 'checkpoint_last.pt', 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_last_epochs': 1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'model_parallel_size': 1, 'distributed_rank': 0}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 1}, 'generation': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 
'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': Namespace(_name='roberta_base', activation_dropout=0.0, activation_fn='gelu', adam_betas='(0.9, 0.98)', adam_eps=1e-06, all_gather_list_size=16384, arch='roberta_base', attention_dropout=0.1, azureml_logging=False, batch_size=None, batch_size_valid=None, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=0.0, cpu=False, criterion='masked_lm', curriculum=0, data='/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/result/quantized/fairseq-bin-data', data_buffer_size=10, dataset_impl=None, ddp_backend='pytorch_ddp', device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, empty_cache_freq=0, encoder_attention_heads=12, encoder_embed_dim=768, encoder_ffn_embed_dim=3072, encoder_layerdrop=0, encoder_layers=12, encoder_layers_to_keep=None, end_learning_rate=0.0, eos=2, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=True, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, 
freq_weighted_replacement=False, gen_subset='test', heartbeat_timeout=-1, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=1, leave_unmasked_prob=0.1, load_checkpoint_on_all_dp_ranks=False, localsgd_frequency=3, log_format='simple', log_interval=10, lr=[0.0005], lr_scheduler='polynomial_decay', mask_multiple_length=10, mask_prob=0.5, mask_stdev=10.0, mask_whole_words=False, max_epoch=0, max_positions=6144, max_tokens=4096, max_tokens_valid=4096, max_update=250000, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, nprocs_per_node=1, num_shards=1, num_workers=1, optimizer='adam', optimizer_overrides='{}', pad=1, patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path=None, random_token_prob=0.1, required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_logging=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', sample_break_mode='eos', save_dir='/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/zerospeech2021_baseline/checkpoints/BERT_CPC_big_kmeans50', save_interval=1, save_interval_updates=0, scoring='bleu', seed=5, sentence_avg=False, shard_id=0, shorten_data_split_list='', shorten_method='none', skip_invalid_size_inputs_valid_test=True, slowmo_algorithm='LocalSGD', slowmo_momentum=None, spectral_norm_classification_head=False, stop_min_lr=-1.0, 
stop_time_hours=0, suppress_crashes=False, task='masked_lm', tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tokens_per_sample=3072, total_num_update='250000', tpu=False, train_subset='train', unk=3, untie_weights_roberta=False, update_freq=[128], use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, wandb_project=None, warmup_updates=10000, weight_decay=0.01, zero_sharding='none'), 'task': Namespace(_name='masked_lm', activation_dropout=0.0, activation_fn='gelu', adam_betas='(0.9, 0.98)', adam_eps=1e-06, all_gather_list_size=16384, arch='roberta_base', attention_dropout=0.1, azureml_logging=False, batch_size=None, batch_size_valid=None, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=0.0, cpu=False, criterion='masked_lm', curriculum=0, data='/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/result/quantized/fairseq-bin-data', data_buffer_size=10, dataset_impl=None, ddp_backend='pytorch_ddp', device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, empty_cache_freq=0, encoder_attention_heads=12, encoder_embed_dim=768, encoder_ffn_embed_dim=3072, encoder_layerdrop=0, encoder_layers=12, encoder_layers_to_keep=None, end_learning_rate=0.0, eos=2, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=True, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, freq_weighted_replacement=False, gen_subset='test', heartbeat_timeout=-1, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=1, leave_unmasked_prob=0.1, 
load_checkpoint_on_all_dp_ranks=False, localsgd_frequency=3, log_format='simple', log_interval=10, lr=[0.0005], lr_scheduler='polynomial_decay', mask_multiple_length=10, mask_prob=0.5, mask_stdev=10.0, mask_whole_words=False, max_epoch=0, max_positions=6144, max_tokens=4096, max_tokens_valid=4096, max_update=250000, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, nprocs_per_node=1, num_shards=1, num_workers=1, optimizer='adam', optimizer_overrides='{}', pad=1, patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path=None, random_token_prob=0.1, required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_logging=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', sample_break_mode='eos', save_dir='/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/zerospeech2021_baseline/checkpoints/BERT_CPC_big_kmeans50', save_interval=1, save_interval_updates=0, scoring='bleu', seed=5, sentence_avg=False, shard_id=0, shorten_data_split_list='', shorten_method='none', skip_invalid_size_inputs_valid_test=True, slowmo_algorithm='LocalSGD', slowmo_momentum=None, spectral_norm_classification_head=False, stop_min_lr=-1.0, stop_time_hours=0, suppress_crashes=False, task='masked_lm', tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tokens_per_sample=3072, 
total_num_update='250000', tpu=False, train_subset='train', unk=3, untie_weights_roberta=False, update_freq=[128], use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, wandb_project=None, warmup_updates=10000, weight_decay=0.01, zero_sharding='none'), 'criterion': Namespace(_name='masked_lm', activation_dropout=0.0, activation_fn='gelu', adam_betas='(0.9, 0.98)', adam_eps=1e-06, all_gather_list_size=16384, arch='roberta_base', attention_dropout=0.1, azureml_logging=False, batch_size=None, batch_size_valid=None, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=0.0, cpu=False, criterion='masked_lm', curriculum=0, data='/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/result/quantized/fairseq-bin-data', data_buffer_size=10, dataset_impl=None, ddp_backend='pytorch_ddp', device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, empty_cache_freq=0, encoder_attention_heads=12, encoder_embed_dim=768, encoder_ffn_embed_dim=3072, encoder_layerdrop=0, encoder_layers=12, encoder_layers_to_keep=None, end_learning_rate=0.0, eos=2, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=True, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, freq_weighted_replacement=False, gen_subset='test', heartbeat_timeout=-1, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=1, leave_unmasked_prob=0.1, load_checkpoint_on_all_dp_ranks=False, localsgd_frequency=3, log_format='simple', log_interval=10, lr=[0.0005], lr_scheduler='polynomial_decay', mask_multiple_length=10, 
mask_prob=0.5, mask_stdev=10.0, mask_whole_words=False, max_epoch=0, max_positions=6144, max_tokens=4096, max_tokens_valid=4096, max_update=250000, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, nprocs_per_node=1, num_shards=1, num_workers=1, optimizer='adam', optimizer_overrides='{}', pad=1, patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path=None, random_token_prob=0.1, required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_logging=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', sample_break_mode='eos', save_dir='/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/zerospeech2021_baseline/checkpoints/BERT_CPC_big_kmeans50', save_interval=1, save_interval_updates=0, scoring='bleu', seed=5, sentence_avg=False, shard_id=0, shorten_data_split_list='', shorten_method='none', skip_invalid_size_inputs_valid_test=True, slowmo_algorithm='LocalSGD', slowmo_momentum=None, spectral_norm_classification_head=False, stop_min_lr=-1.0, stop_time_hours=0, suppress_crashes=False, task='masked_lm', tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tokens_per_sample=3072, total_num_update='250000', tpu=False, train_subset='train', unk=3, untie_weights_roberta=False, update_freq=[128], use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', 
validate_after_updates=0, validate_interval=1, validate_interval_updates=0, wandb_project=None, warmup_updates=10000, weight_decay=0.01, zero_sharding='none'), 'optimizer': {'_name': 'adam', 'adam_betas': '(0.9, 0.98)', 'adam_eps': 1e-06, 'weight_decay': 0.01, 'use_old_adam': False, 'tpu': False, 'lr': [0.0005]}, 'lr_scheduler': {'_name': 'polynomial_decay', 'warmup_updates': 10000, 'force_anneal': None, 'end_learning_rate': 0.0, 'power': 1.0, 'total_num_update': 250000.0, 'lr': [0.0005]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None}
2021-02-12 21:49:30 | INFO | fairseq.tasks.masked_lm | dictionary: 56 types
2021-02-12 21:49:30 | INFO | fairseq.data.data_utils | loaded 2,534 examples from: /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/result/quantized/fairseq-bin-data/valid
Traceback (most recent call last):
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/token_block_dataset.py", line 46, in __init__
    from fairseq.data.token_block_utils_fast import (
ModuleNotFoundError: No module named 'fairseq.data.token_block_utils_fast'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "fairseq_cli/train.py", line 453, in <module>
    cli_main()
  File "fairseq_cli/train.py", line 449, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/distributed/utils.py", line 360, in call_main
    main(cfg, **kwargs)
  File "fairseq_cli/train.py", line 74, in main
    task.load_dataset(valid_sub_split, combine=False, epoch=1)
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/tasks/masked_lm.py", line 161, in load_dataset
    dataset = TokenBlockDataset(
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/token_block_dataset.py", line 51, in __init__
    raise ImportError(
ImportError: Please build Cython components with: `pip install --editable .` or `python setup.py build_ext --inplace`
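
The chained traceback follows a common pattern: the dataset module attempts to import a compiled Cython extension and, if that fails, re-raises an ImportError carrying build instructions. A simplified sketch of the pattern (structure simplified, not fairseq's exact code):

```python
# Simplified sketch of a lazy import guard around an optional compiled extension.
def load_fast_utils():
    try:
        # The compiled module produced by `python setup.py build_ext --inplace`.
        from fairseq.data import token_block_utils_fast
    except ImportError as e:
        # Chain the original ModuleNotFoundError so both appear in the traceback,
        # exactly as in the log above.
        raise ImportError(
            "Please build Cython components with: `pip install --editable .` "
            "or `python setup.py build_ext --inplace`"
        ) from e
    return token_block_utils_fast
```

So the second error is not a new bug in the training command: it just means the checkout has never had its Cython extensions built.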
alexeib commented 3 years ago

Did you try following the instructions on the last line?

cpark-dev commented 3 years ago

I tried the second option, python setup.py build_ext --inplace, since I had installed fairseq via pip. However, it failed.

$ python setup.py build_ext --inplace
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370131125/work/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
No CUDA runtime is found, using CUDA_HOME='/share/mini1/sw/std/cuda/cuda9.2/x86_64'
running build_ext
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py:294: UserWarning:

                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (g++ 4.8.5) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 5.0 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 5 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                              !! WARNING !!

  warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
cythoning fairseq/data/data_utils_fast.pyx to fairseq/data/data_utils_fast.cpp
cythoning fairseq/data/token_block_utils_fast.pyx to fairseq/data/token_block_utils_fast.cpp
building 'fairseq.libbleu' extension
creating /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build
creating /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8
creating /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq
creating /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib
creating /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libbleu
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py:266: UserWarning:

                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                              !! WARNING !!

  warnings.warn(WRONG_COMPILER_WARNING.format(
Emitting ninja build file /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/libbleu.o.d -pthread -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/include/python3.8 -c -c /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/clib/libbleu/libbleu.cpp -o /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/libbleu.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=libbleu -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
[2/2] c++ -MMD -MF /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/module.o.d -pthread -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/include/python3.8 -c -c /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/clib/libbleu/module.cpp -o /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/module.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=libbleu -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/fairseq
g++ -pthread -shared -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -L/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib -Wl,-rpath=/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib -Wl,--no-as-needed -Wl,--sysroot=/ /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/libbleu.o /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libbleu/module.o -o build/lib.linux-x86_64-3.8/fairseq/libbleu.cpython-38-x86_64-linux-gnu.so
building 'fairseq.data.data_utils_fast' extension
creating /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/data
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py:266: UserWarning:

                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                              !! WARNING !!

  warnings.warn(WRONG_COMPILER_WARNING.format(
Emitting ninja build file /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/data/data_utils_fast.o.d -pthread -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/include/python3.8 -c -c /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/data_utils_fast.cpp -o /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/data/data_utils_fast.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=data_utils_fast -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
In file included from /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822:0,
                 from /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                 from /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/data_utils_fast.cpp:624:
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it with " \
  ^
creating build/lib.linux-x86_64-3.8/fairseq/data
g++ -pthread -shared -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -L/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib -Wl,-rpath=/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib -Wl,--no-as-needed -Wl,--sysroot=/ /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/data/data_utils_fast.o -o build/lib.linux-x86_64-3.8/fairseq/data/data_utils_fast.cpython-38-x86_64-linux-gnu.so
building 'fairseq.data.token_block_utils_fast' extension
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py:266: UserWarning:

                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                              !! WARNING !!

  warnings.warn(WRONG_COMPILER_WARNING.format(
Emitting ninja build file /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.o.d -pthread -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/include/python3.8 -c -c /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/token_block_utils_fast.cpp -o /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.o -std=c++11 -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=token_block_utils_fast -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
In file included from /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822:0,
                 from /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                 from /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/token_block_utils_fast.cpp:625:
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it with " \
  ^
/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/token_block_utils_fast.cpp: In function ‘PyArrayObject* __pyx_f_7fairseq_4data_22token_block_utils_fast__get_slice_indices_fast(PyArrayObject*, PyObject*, int, int, int)’:
/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/token_block_utils_fast.cpp:3319:38: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       __pyx_t_4 = ((__pyx_v_sz_idx < __pyx_t_10) != 0);
                                      ^
/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/data/token_block_utils_fast.cpp:3514:38: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       __pyx_t_3 = ((__pyx_v_sz_idx < __pyx_t_10) != 0);
                                      ^
g++ -pthread -shared -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -L/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib -Wl,-rpath=/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib -Wl,--no-as-needed -Wl,--sysroot=/ /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.o -o build/lib.linux-x86_64-3.8/fairseq/data/token_block_utils_fast.cpython-38-x86_64-linux-gnu.so
building 'fairseq.libnat' extension
creating /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libnat
/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py:266: UserWarning:

                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                              !! WARNING !!

  warnings.warn(WRONG_COMPILER_WARNING.format(
Emitting ninja build file /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o.d -pthread -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include/TH -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include/THC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/include/python3.8 -c -c /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/clib/libnat/edit_dist.cpp -o /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=libnat -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
FAILED: /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o
c++ -MMD -MF /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o.d -pthread -B /share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include/TH -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/include/THC -I/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/include/python3.8 -c -c /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/clib/libnat/edit_dist.cpp -o /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=libnat -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
c++: error: unrecognized command line option ‘-std=c++14’
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1533, in _run_ninja_build
    subprocess.run(
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "setup.py", line 257, in <module>
    do_setup(package_data)
  File "setup.py", line 168, in do_setup
    setup(
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 670, in build_extensions
    build_ext.build_extensions(self)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
    _build_ext.build_ext.build_extensions(self)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
    _build_ext.build_extension(self, ext)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
    objects = self.compiler.compile(sources,
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 491, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1250, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1555, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
cpark-dev commented 3 years ago

I resolved the compilation error. link

Then, I got the error below.

$ PYTHONPATH=. \
> python train.py --fp16 /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/result/quantized/fairseq-bin-data \
>     --task masked_lm --criterion masked_lm \
>     --save-dir /share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/zerospeech2021_baseline/checkpoints/BERT_CPC_big_kmeans50 \
>     --keep-last-epochs 1 \
>     --train-subset train \
>     --arch roberta_base \
>     --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 --clip-norm 0.0 \
>     --lr-scheduler polynomial_decay --lr 0.0005 --total-num-update 250000 --warmup-updates 10000 \
>     --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
>     --mask-multiple-length 10 --mask-prob 0.5 --mask-stdev 10 \
>     --sample-break-mode eos --tokens-per-sample 3072 --max-positions 6144 \
>     --max-tokens 4096 --update-freq 32 --max-update 250000 \
>     --seed 5 --log-format simple --log-interval 10 --skip-invalid-size-inputs-valid-test
2021-02-15 23:01:17 | INFO | fairseq.distributed.utils | distributed init (rank 3): tcp://localhost:18899
2021-02-15 23:01:17 | INFO | fairseq.distributed.utils | distributed init (rank 2): tcp://localhost:18899
2021-02-15 23:01:17 | INFO | fairseq.distributed.utils | distributed init (rank 0): tcp://localhost:18899
2021-02-15 23:01:17 | INFO | fairseq.distributed.utils | distributed init (rank 1): tcp://localhost:18899
Traceback (most recent call last):
  File "train.py", line 14, in <module>
    cli_main()
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq_cli/train.py", line 449, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/distributed/utils.py", line 338, in call_main
    torch.multiprocessing.spawn(
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/distributed/utils.py", line 319, in distributed_main
    cfg.distributed_training.distributed_rank = distributed_init(cfg)
  File "/share/mini1/res/t/repr/com/unsup-en/zsc2021-eval/fairseq/fairseq/distributed/utils.py", line 258, in distributed_init
    dist.init_process_group(
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
    barrier()
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/zerospeech2021_baseline/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370131125/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, internal error, NCCL version 2.7.8
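The `internal error` message from NCCL rarely shows the actual cause (driver mismatch, peer-to-peer access, network interface selection, etc.). A common first diagnostic step, assuming the stock NCCL environment variables, is to enable NCCL's own logging in the launching shell and re-run the same `train.py` command:

```shell
# Standard NCCL environment variables (not fairseq-specific); set them in the
# shell that launches training, then re-run the same command to get a more
# informative failure message on stderr.
export NCCL_DEBUG=INFO          # per-rank NCCL diagnostics
export NCCL_DEBUG_SUBSYS=INIT   # focus logging on initialization
echo "NCCL_DEBUG=${NCCL_DEBUG} NCCL_DEBUG_SUBSYS=${NCCL_DEBUG_SUBSYS}"
```

Running with a single GPU first (e.g. `CUDA_VISIBLE_DEVICES=0` with `--distributed-world-size 1`) can also help rule out multi-process setup as the cause.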
bhaveshachhada commented 1 year ago

Did you try following the instructions on the last line?

I tried; the last line worked for me.