bitextor / bifixer

Tool to fix bitexts and tag near-duplicates for removal
GNU General Public License v3.0

Add headers to input and output files #11

Closed: cgr71ii closed this 2 years ago

cgr71ii commented 2 years ago

The input file provided to either Bifixer or Monofixer may now contain an optional header, and the tools can optionally print an output header. Changes:

These changes have been tested with a basic configuration of the tools and, apparently, do not introduce any unexpected behaviour beyond the input and output headers mentioned above.

Edit: I have removed the --output_header option because I don't think it is useful, and the output fields can be derived from the input header when one is provided. Now, if --header is set, the output header is printed using the field names given in the input header.
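The PR text does not show the implementation, so the following is only a hypothetical sketch of the header behaviour described above, assuming tab-separated input: when a header flag is set, the first line is read as field names, columns may be selected by name, and the output header is the input header plus the appended columns (so no separate output-header option is needed). The function names, the `hash` column and the whitespace-only "fix" are assumptions for illustration, not Bifixer's actual code.

```python
import hashlib
import io
import sys
from typing import List, Optional, TextIO, Union

# Hypothetical name of the column appended to every output row;
# the real tools may append different columns.
EXTRA_COLUMNS = ["hash"]


def resolve_column(fields: Optional[List[str]], column: Union[str, int]) -> int:
    """Map a column given by name (requires a header) or by 1-based position to a 0-based index."""
    if isinstance(column, int):
        return column - 1
    if fields is None:
        raise ValueError(f"column {column!r} given by name but no header was read")
    return fields.index(column)  # raises ValueError if the name is not in the header


def process(instream: TextIO, outstream: TextIO, header: bool,
            src_col: Union[str, int], trg_col: Union[str, int]) -> None:
    """Hypothetical fixer loop: optional input header in, derived output header out."""
    lines = (line.rstrip("\n").split("\t") for line in instream)
    fields = next(lines, None) if header else None
    if header and fields is None:
        return  # empty input: nothing to write
    if fields is not None:
        # The output header is derived from the input header plus the appended columns.
        outstream.write("\t".join(fields + EXTRA_COLUMNS) + "\n")
    src_idx = resolve_column(fields, src_col)
    trg_idx = resolve_column(fields, trg_col)
    for row in lines:
        # Stand-in for the real fixing logic: only strip surrounding whitespace.
        row[src_idx] = row[src_idx].strip()
        row[trg_idx] = row[trg_idx].strip()
        # Stand-in hash of the sentence pair (not the tools' real hashing scheme).
        pair = f"{row[src_idx]}\t{row[trg_idx]}".encode("utf-8")
        row.append(hashlib.sha1(pair).hexdigest()[:16])
        outstream.write("\t".join(row) + "\n")


if __name__ == "__main__":
    demo = io.StringIO("src_url\ttrg_url\tsrc_text\ttrg_text\n"
                       "http://a\thttp://b\t Hello \t Hola \n")
    process(demo, sys.stdout, header=True, src_col="src_text", trg_col="trg_text")
```

Without a header, the same function falls back to positional columns (e.g. `src_col=3, trg_col=4`) and prints no header line, which mirrors the "optional" part of the change.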

cgr71ii commented 2 years ago

If PR #12 is accepted, this PR will need further changes to apply the header logic to the new flags introduced in #12. Once that is done, this PR will be ready for review.

cgr71ii commented 2 years ago

I force-pushed because I had made a commit that uploaded only half of a file, so the commit history was lost. The truncated file was caused by running out of disk space, which I did not notice until I saw that half of the file was missing while reviewing the commit's diff.