Helsinki-NLP / OpusFilter

OpusFilter - Parallel corpus processing toolkit
MIT License

Is it possible to generate score file during training alignment model? #34

Closed (BrightXiaoHan closed this 2 years ago)

svirpioj commented 2 years ago

Not at the moment; the forward and backward score files generated by eflomal are saved as temporary files that are removed afterwards. It should be easy to add an option to save them to a permanent file, but they would be in eflomal's original score format, not an OpusFilter score file. Which one are you looking for?

BrightXiaoHan commented 2 years ago

"OpusFilter score file" is what I want.

svirpioj commented 2 years ago

Implemented in https://github.com/Helsinki-NLP/OpusFilter/pull/36

BrightXiaoHan commented 2 years ago

Thanks

BrightXiaoHan commented 2 years ago

I tried it with the following config:

  - type: train_alignment
    parameters:
      src_data: zh.rules
      tgt_data: en.rules
      parameters:
        model: 3
        src_tokenizer: [jieba, zh]
        tgt_tokenizer: [moses, en]
        scores: align_score.jsonl
      output: align.priors

but the scores file align_score.jsonl was not generated.

svirpioj commented 2 years ago

This confused me for a while before I noticed that it's a problem in the documentation: The scores file is a top-level option, not under inner parameters. Thanks for noticing! I fixed the README.
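For illustration, here is a sketch of how the step above would presumably look with `scores` moved out of the inner `parameters` block and placed next to `output`. It reuses the file names from the config posted earlier. Where the key sits relative to `output` is an assumption, since YAML mapping order should not matter.

```yaml
  - type: train_alignment
    parameters:
      src_data: zh.rules
      tgt_data: en.rules
      parameters:
        # inner parameters: alignment model and tokenizer settings only
        model: 3
        src_tokenizer: [jieba, zh]
        tgt_tokenizer: [moses, en]
      output: align.priors
      # top-level option of the step, not part of the inner parameters
      scores: align_score.jsonl
```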

The issue also revealed that many OpusFilter methods do not warn about extra parameters... Something to be improved in the future.