awslabs / sockeye

Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch
https://awslabs.github.io/sockeye/
Apache License 2.0

SockeyeError: Model(s) require 1 factors, but 2 given (through --input and --input-factors). #1026

Closed. RamoramaInteractive closed this issue 2 years ago.

RamoramaInteractive commented 2 years ago

I've trained a Sockeye model with source factoring. When I then tried to translate using the source-factor file:

sockeye-translate \
    --input term_001_001.en \
    --input-factors term_001_001_sf.en \
    --output term.out.bpe \
    --model term_constraint_model \
    --dtype float16 \
    --beam-size 5 \
    --batch-size 64

I received this output:

SockeyeError: Model(s) require 1 factors, but 2 given (through --input and --input-factors).

What did I do wrong, and why can't I use --input-factors?
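As far as I understand, the surface tokens from --input already count as the first factor, so --input-factors adds a second one that the loaded model was apparently not trained with. A quick way to check what the saved model actually recorded is to grep the factor-related keys in its args YAML (the full file is quoted further down); this is just a sketch, assuming the file sits at term_constraint_model/args.yaml:

    # Show the prepared data and source-factor settings the model was trained with
    grep -E '^(prepared_data|source_factors)' term_constraint_model/args.yaml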

I trained the model with this command:

sockeye-train -d prepared \
    -vs examples/translation/wmt17_en_de/x_valid.en \
    -vt examples/translation/wmt17_en_de/x_valid.de \
    --shared-vocab \
    -o term_constraint_model \
    --overwrite-output \
    --min-num-epochs 50 \
    --max-num-epochs 100 \
    --batch-size 560 \
    --transformer-attention-heads 8:8 \
    --transformer-activation-type 'relu':'relu' \
    --transformer-dropout-act 0.1:0.1 \
    --transformer-dropout-attention 0.1:0.1 \
    --transformer-dropout-prepost 0.1:0.1 \
    --transformer-feed-forward-num-hidden 2048:2048 \
    --transformer-model-size 512 \
    --num-layers 2:2 \
    --transformer-positional-embedding-type fixed \
    --transformer-preprocess n:n \
    --transformer-postprocess dr:dr \
    --target-factors-num-embed 1 \
    --source-factors-num-embed 1 \
    --dtype float32 \
    --max-seq-len 101:101 \
    --num-words 32302:32302 \
    --num-embed 512:512 \
    --label-smoothing 0.1 \
    --embed-dropout 0.0:0.0 \
    --loss cross-entropy \
    --keep-last-params 1 \
    --cache-last-best-params 1 \
    --validation-source-factors examples/translation/wmt17_en_de/y_valid.en &
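As a side check (not the cause of the error above), the validation factor file has to be line- and token-parallel with the validation source; a quick line count over the files from the command above:

    # Validation source factors must be parallel to the validation source
    wc -l examples/translation/wmt17_en_de/x_valid.en examples/translation/wmt17_en_de/y_valid.en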

This is how the sockeye-prepare-data call looked:

sockeye-prepare-data \
    -s examples/translation/wmt17_en_de/x_train.en \
    -t examples/translation/wmt17_en_de/x_train.de --shared-vocab \
    -sf examples/translation/wmt17_en_de/y_train.en \
    --source-factors-use-source-vocab true \
    --word-min-count 2 --pad-vocab-to-multiple-of 8 --max-seq-len 95 \
    --num-samples-per-shard 10000000 --output prepared --max-processes $(nproc)  
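Since the factor file is passed at preparation time, the prepared folder itself records it; a quick grep sketch, assuming the args YAML quoted below lives at prepared/args.yaml:

    # The prepared data directory records the factor files it was built from
    grep -A1 '^source_factors:' prepared/args.yaml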

The args YAML of the prepared folder says:

bucket_scaling: false
bucket_width: 8
config: null
loglevel: INFO
loglevel_secondary_workers: INFO
max_processes: 40
max_seq_len:
- 95
- 95
min_num_shards: 1
no_bucketing: false
no_logfile: false
num_samples_per_shard: 10000000
num_words:
- 0
- 0
output: prepared
pad_vocab_to_multiple_of: 8
quiet: false
quiet_secondary_workers: false
seed: 13
shared_vocab: true
source: examples/translation/wmt17_en_de/x_train.en
source_factor_vocabs: []
source_factors:
- examples/translation/wmt17_en_de/y_train.en
source_factors_use_source_vocab:
- true
source_vocab: null
target: examples/translation/wmt17_en_de/x_train.de
target_factor_vocabs: []
target_factors: []
target_factors_use_target_vocab: []
target_vocab: null
word_min_count:
- 2
- 2

RamoramaInteractive commented 2 years ago

I looked into the args YAML of the term_constraint_model folder. Strangely, it used the prepared data from the baseline model, which is not supposed to do source factoring:

allow_missing_params: false
amp: false
apex_amp: false
batch_sentences_multiple_of: 8
batch_size: 560
batch_type: word
bucket_scaling: false
bucket_width: 8
cache_last_best_params: 1
cache_metric: perplexity
cache_strategy: best
checkpoint_improvement_threshold: 0.0
checkpoint_interval: 4000
config: null
decode_and_evaluate: 500
decode_and_evaluate_device_id: null
decoder: transformer
device_id: 0
device_ids:
- -1
disable_device_locking: false
dist: false
dry_run: false
dtype: float32
embed_dropout:
- 0.0
- 0.0
encoder: transformer
env: null
fixed_param_names: []
fixed_param_strategy: null
gradient_clipping_threshold: 1.0
gradient_clipping_type: none
horovod: false
ignore_extra_params: false
initial_learning_rate: 0.0002
keep_initializations: false
keep_last_params: 1
kvstore: device
label_smoothing: 0.1
label_smoothing_impl: mxnet
learning_rate_reduce_factor: 0.9
learning_rate_reduce_num_not_improved: 8
learning_rate_scheduler_type: plateau-reduce
learning_rate_t_scale: 1.0
learning_rate_warmup: 0
length_task: null
length_task_layers: 1
length_task_weight: 1.0
lhuc: null
lock_dir: /tmp
loglevel: INFO
loglevel_secondary_workers: INFO
loss: cross-entropy
max_checkpoints: null
max_num_checkpoint_not_improved: null
max_num_epochs: 100
max_samples: null
max_seconds: null
max_seq_len:
- 101
- 101
max_updates: null
min_num_epochs: 50
min_samples: null
min_updates: null
momentum: 0.0
no_bucketing: false
no_hybridization: false
no_logfile: false
num_embed:
- 512
- 512
num_layers:
- 2
- 2
num_words:
- 32302
- 32302
optimized_metric: perplexity
optimizer: adam
optimizer_betas:
- 0.9
- 0.999
optimizer_eps: 1.0e-08
optimizer_params: null
output: term_constraint_model
overwrite_output: true
pad_vocab_to_multiple_of: 8
params: null
prepared_data: baseline_sockeye
quiet: false
quiet_secondary_workers: false
seed: 1
shared_vocab: true
source: null
source_factor_vocabs: []
source_factors: []
source_factors_combine: []
source_factors_num_embed:
- 1
source_factors_share_embedding: []
source_factors_use_source_vocab: []
source_vocab: null
stop_training_on_decoder_failure: false
target: null
target_factor_vocabs: []
target_factors: []
target_factors_combine: []
target_factors_num_embed:
- 1
target_factors_share_embedding: []
target_factors_use_target_vocab: []
target_factors_weight:
- 1.0
target_vocab: null
transformer_activation_type:
- relu
- relu
transformer_attention_heads:
- 8
- 8
transformer_dropout_act:
- 0.1
- 0.1
transformer_dropout_attention:
- 0.1
- 0.1
transformer_dropout_prepost:
- 0.1
- 0.1
transformer_feed_forward_num_hidden:
- 2048
- 2048
transformer_feed_forward_use_glu: false
transformer_model_size:
- 512
- 512
transformer_positional_embedding_type: fixed
transformer_postprocess:
- dr
- dr
transformer_preprocess:
- n
- n
update_interval: 1
use_cpu: false
validation_source: examples/translation/wmt17_en_de/valid.en
validation_source_factors: []
validation_target: examples/translation/wmt17_en_de/valid.de
validation_target_factors: []
weight_decay: 0.0
weight_init: xavier
weight_init_scale: 3.0
weight_init_xavier_factor_type: avg
weight_init_xavier_rand_type: uniform
weight_tying_type: src_trg_softmax
word_min_count:
- 1
- 1

I've trained the baseline and the source-factoring model at the same time.
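Since both runs were started in parallel, comparing the factor-related keys of their args YAML files shows which prepared data each output folder actually used; a sketch, with baseline_model as a placeholder for the baseline's output folder:

    # Compare the prepared data and factor settings each run recorded;
    # "baseline_model" is a placeholder for the baseline's output folder
    for m in term_constraint_model baseline_model; do
        echo "== $m =="
        grep -E -A1 '^(prepared_data|source_factors:|validation_source_factors:)' "$m/args.yaml"
    done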