awslabs / sockeye

Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch
https://awslabs.github.io/sockeye/
Apache License 2.0

SockeyeError: Model(s) require 1 factors, but 2 given (through --input and --input-factors). #1026

Closed. RamoramaInteractive closed this issue 2 years ago.

RamoramaInteractive commented 2 years ago

I've trained a Sockeye model with source factoring. When I then tried to translate using the source-factor file:

sockeye-translate \
    --input term_001_001.en \
    --input-factors term_001_001_sf.en \
    --output term.out.bpe \
    --model term_constraint_model \
    --dtype float16 \
    --beam-size 5 \
    --batch-size 64

I received this output:

SockeyeError: Model(s) require 1 factors, but 2 given (through --input and --input-factors).

What did I do wrong, and why can't I use --input-factors?
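As far as I understand, the surface tokens from --input already count as the first factor, so --input-factors adds a second one that the loaded model was apparently not trained with. A quick way to check what the saved model actually recorded is to grep the factor-related keys in its args YAML (the full file is quoted further down); this is just a sketch, assuming the file sits at term_constraint_model/args.yaml:

    # Show the prepared data and source-factor settings the model was trained with
    grep -E '^(prepared_data|source_factors)' term_constraint_model/args.yaml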

I trained the model with this command:

sockeye-train -d prepared \
    -vs examples/translation/wmt17_en_de/x_valid.en \
    -vt examples/translation/wmt17_en_de/x_valid.de \
    --shared-vocab \
    -o term_constraint_model \
    --overwrite-output \
    --min-num-epochs 50 \
    --max-num-epochs 100 \
    --batch-size 560 \
    --transformer-attention-heads 8:8 \
    --transformer-activation-type 'relu':'relu' \
    --transformer-dropout-act 0.1:0.1 \
    --transformer-dropout-attention 0.1:0.1 \
    --transformer-dropout-prepost 0.1:0.1 \
    --transformer-feed-forward-num-hidden 2048:2048 \
    --transformer-model-size 512 \
    --num-layers 2:2 \
    --transformer-positional-embedding-type fixed \
    --transformer-preprocess n:n \
    --transformer-postprocess dr:dr \
    --target-factors-num-embed 1 \
    --source-factors-num-embed 1 \
    --dtype float32 \
    --max-seq-len 101:101 \
    --num-words 32302:32302 \
    --num-embed 512:512 \
    --label-smoothing 0.1 \
    --embed-dropout 0.0:0.0 \
    --loss cross-entropy \
    --keep-last-params 1 \
    --cache-last-best-params 1 \
    --validation-source-factors examples/translation/wmt17_en_de/y_valid.en &
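As a side check (not the cause of the error above), the validation factor file has to be line- and token-parallel with the validation source; a quick line count over the files from the command above:

    # Validation source factors must be parallel to the validation source
    wc -l examples/translation/wmt17_en_de/x_valid.en examples/translation/wmt17_en_de/y_valid.en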

This is how the sockeye-prepare-data call looked:

sockeye-prepare-data \
    -s examples/translation/wmt17_en_de/x_train.en \
    -t examples/translation/wmt17_en_de/x_train.de --shared-vocab \
    -sf examples/translation/wmt17_en_de/y_train.en \
    --source-factors-use-source-vocab true \
    --word-min-count 2 --pad-vocab-to-multiple-of 8 --max-seq-len 95 \
    --num-samples-per-shard 10000000 --output prepared --max-processes $(nproc)  
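Since the factor file is passed at preparation time, the prepared folder itself records it; a quick grep sketch, assuming the args YAML quoted below lives at prepared/args.yaml:

    # The prepared data directory records the factor files it was built from
    grep -A1 '^source_factors:' prepared/args.yaml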

The args YAML of the prepared folder says:

bucket_scaling: false
bucket_width: 8
config: null
loglevel: INFO
loglevel_secondary_workers: INFO
max_processes: 40
max_seq_len:
- 95
- 95
min_num_shards: 1
no_bucketing: false
no_logfile: false
num_samples_per_shard: 10000000
num_words:
- 0
- 0
output: prepared
pad_vocab_to_multiple_of: 8
quiet: false
quiet_secondary_workers: false
seed: 13
shared_vocab: true
source: examples/translation/wmt17_en_de/x_train.en
source_factor_vocabs: []
source_factors:
- examples/translation/wmt17_en_de/y_train.en
source_factors_use_source_vocab:
- true
source_vocab: null
target: examples/translation/wmt17_en_de/x_train.de
target_factor_vocabs: []
target_factors: []
target_factors_use_target_vocab: []
target_vocab: null
word_min_count:
- 2
- 2

RamoramaInteractive commented 2 years ago

I looked into the args YAML of the term_constraint_model folder. Strangely, it used the prepared data from the baseline model, which is not supposed to do source factoring:

allow_missing_params: false
amp: false
apex_amp: false
batch_sentences_multiple_of: 8
batch_size: 560
batch_type: word
bucket_scaling: false
bucket_width: 8
cache_last_best_params: 1
cache_metric: perplexity
cache_strategy: best
checkpoint_improvement_threshold: 0.0
checkpoint_interval: 4000
config: null
decode_and_evaluate: 500
decode_and_evaluate_device_id: null
decoder: transformer
device_id: 0
device_ids:
- -1
disable_device_locking: false
dist: false
dry_run: false
dtype: float32
embed_dropout:
- 0.0
- 0.0
encoder: transformer
env: null
fixed_param_names: []
fixed_param_strategy: null
gradient_clipping_threshold: 1.0
gradient_clipping_type: none
horovod: false
ignore_extra_params: false
initial_learning_rate: 0.0002
keep_initializations: false
keep_last_params: 1
kvstore: device
label_smoothing: 0.1
label_smoothing_impl: mxnet
learning_rate_reduce_factor: 0.9
learning_rate_reduce_num_not_improved: 8
learning_rate_scheduler_type: plateau-reduce
learning_rate_t_scale: 1.0
learning_rate_warmup: 0
length_task: null
length_task_layers: 1
length_task_weight: 1.0
lhuc: null
lock_dir: /tmp
loglevel: INFO
loglevel_secondary_workers: INFO
loss: cross-entropy
max_checkpoints: null
max_num_checkpoint_not_improved: null
max_num_epochs: 100
max_samples: null
max_seconds: null
max_seq_len:
- 101
- 101
max_updates: null
min_num_epochs: 50
min_samples: null
min_updates: null
momentum: 0.0
no_bucketing: false
no_hybridization: false
no_logfile: false
num_embed:
- 512
- 512
num_layers:
- 2
- 2
num_words:
- 32302
- 32302
optimized_metric: perplexity
optimizer: adam
optimizer_betas:
- 0.9
- 0.999
optimizer_eps: 1.0e-08
optimizer_params: null
output: term_constraint_model
overwrite_output: true
pad_vocab_to_multiple_of: 8
params: null
prepared_data: baseline_sockeye
quiet: false
quiet_secondary_workers: false
seed: 1
shared_vocab: true
source: null
source_factor_vocabs: []
source_factors: []
source_factors_combine: []
source_factors_num_embed:
- 1
source_factors_share_embedding: []
source_factors_use_source_vocab: []
source_vocab: null
stop_training_on_decoder_failure: false
target: null
target_factor_vocabs: []
target_factors: []
target_factors_combine: []
target_factors_num_embed:
- 1
target_factors_share_embedding: []
target_factors_use_target_vocab: []
target_factors_weight:
- 1.0
target_vocab: null
transformer_activation_type:
- relu
- relu
transformer_attention_heads:
- 8
- 8
transformer_dropout_act:
- 0.1
- 0.1
transformer_dropout_attention:
- 0.1
- 0.1
transformer_dropout_prepost:
- 0.1
- 0.1
transformer_feed_forward_num_hidden:
- 2048
- 2048
transformer_feed_forward_use_glu: false
transformer_model_size:
- 512
- 512
transformer_positional_embedding_type: fixed
transformer_postprocess:
- dr
- dr
transformer_preprocess:
- n
- n
update_interval: 1
use_cpu: false
validation_source: examples/translation/wmt17_en_de/valid.en
validation_source_factors: []
validation_target: examples/translation/wmt17_en_de/valid.de
validation_target_factors: []
weight_decay: 0.0
weight_init: xavier
weight_init_scale: 3.0
weight_init_xavier_factor_type: avg
weight_init_xavier_rand_type: uniform
weight_tying_type: src_trg_softmax
word_min_count:
- 1
- 1

I've trained the baseline and the source-factoring model at the same time.
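Since both runs were started in parallel, comparing the factor-related keys of their args YAML files shows which prepared data each output folder actually used; a sketch, with baseline_model as a placeholder for the baseline's output folder:

    # Compare the prepared data and factor settings each run recorded;
    # "baseline_model" is a placeholder for the baseline's output folder
    for m in term_constraint_model baseline_model; do
        echo "== $m =="
        grep -E -A1 '^(prepared_data|source_factors:|validation_source_factors:)' "$m/args.yaml"
    done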