The input file must now contain a header that describes each column, and an output header will be printed. Changes:
The expected headers are: src_url, trg_url, src_text, trg_text and src_translated. The field trg_translated is optional.
The output header fields will be src_url, trg_url, src_text, trg_text and bleualign_score. If the flag --paragraph-identification is set, the fields src_paragraph_id and trg_paragraph_id will be added as well. If the flag --print-sent-hash is set, the fields src_deferred_hash and trg_deferred_hash will be added as well.
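The header rules above can be sketched as follows. This is only an illustration, not the code in this branch: the function names are hypothetical, the input is assumed to be tab-separated, and the order in which the optional output fields are appended when both flags are set is an assumption.

```python
# Field names are taken from the description above; everything else is illustrative.
REQUIRED_INPUT_FIELDS = ["src_url", "trg_url", "src_text", "trg_text", "src_translated"]
OPTIONAL_INPUT_FIELDS = ["trg_translated"]


def check_input_header(header_line):
    """Map each header field to its column index, failing on missing required fields.

    Assumes a tab-separated header line (hypothetical helper).
    """
    fields = header_line.rstrip("\n").split("\t")
    missing = [name for name in REQUIRED_INPUT_FIELDS if name not in fields]
    if missing:
        raise ValueError("missing required header fields: " + ", ".join(missing))
    return {name: index for index, name in enumerate(fields)}


def build_output_header(paragraph_identification=False, print_sent_hash=False):
    """Return the output header fields for the given flag combination."""
    fields = ["src_url", "trg_url", "src_text", "trg_text", "bleualign_score"]
    if paragraph_identification:
        fields += ["src_paragraph_id", "trg_paragraph_id"]
    if print_sent_hash:
        fields += ["src_deferred_hash", "trg_deferred_hash"]
    return fields
```

Looking columns up by name rather than by position is what makes an optional field such as trg_translated easy to support.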
If the text data provided in the input file contains paragraph identification information in TSV format once base64-decoded, the flag --paragraph-identification should be set. Changes:
If the data provided in the fields src_text and trg_text contain paragraph identification information, it will be used to properly format the output fields.
If two or more sentences are joined by the gap filler feature, their paragraph identification information will be joined as well.
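A minimal sketch of how the paragraph identification information might be handled. It assumes that each line of the base64-decoded text has the form "sentence<TAB>paragraph id" and that joined ids are concatenated with "+". Both the layout and the separator are assumptions made for illustration, and the function names are hypothetical.

```python
import base64


def split_text_and_paragraph_ids(b64_text):
    """Decode a base64 field and split each line into (sentence, paragraph id).

    Assumed layout per decoded line: sentence, a tab, then the paragraph id.
    """
    decoded = base64.b64decode(b64_text).decode("utf-8")
    sentences, paragraph_ids = [], []
    for line in decoded.splitlines():
        if not line:
            continue
        sentence, _, paragraph_id = line.partition("\t")
        sentences.append(sentence)
        paragraph_ids.append(paragraph_id)
    return sentences, paragraph_ids


def join_segments(sentences, paragraph_ids, sent_sep=" ", id_sep="+"):
    """Join sentences merged by the gap filler, joining their ids alongside.

    The separators are illustrative defaults, not the tool's actual choice.
    """
    return sent_sep.join(sentences), id_sep.join(paragraph_ids)
```

Keeping the sentences and their ids in parallel lists means that any merge applied to the sentences can be applied to the ids with the same indices.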
These changes have been tested with the basic configuration of the tool and, as far as I can tell, do not add any unexpected behaviour beyond the input and output headers mentioned above and the proper formatting of the paragraph identification information.
PS: this branch contains, unintentionally, the changes of the branch paragraph_identification.