bitextor / bicleaner

Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
GNU General Public License v3.0

added score_only function and made compatible with cache #20

Closed kirefu closed 5 years ago

kirefu commented 5 years ago

Updated file for Snakefile compatibility.

Best wishes,

Faheem

mbanon commented 5 years ago

Hey @kirefu, is it OK for you if I reject this pull request but add the feature in Bicleaner 0.12, which will replace the current version next week? https://github.com/bitextor/bicleaner/tree/bicleaner-0.12

kirefu commented 5 years ago

Sure. Let me know if you want me to submit another pull request after the current version is replaced, or you can integrate it yourself if that is easier.

mbanon commented 5 years ago

@kirefu I already added the --score_only flag to the 0.12 branch :) I also added a changelog (https://github.com/bitextor/bicleaner/blob/master/CHANGELOG.md). Please note that I did not add the "if only two columns, first is src and second is trg" behaviour; please use "--scol 1 --tcol 2" instead. Thanks!

mbanon commented 5 years ago

And now in the correct branch: https://github.com/bitextor/bicleaner/blob/bicleaner-0.12/CHANGELOG.md hehe

kirefu commented 5 years ago

Thanks, yes --scol and --tcol sounds like a better idea
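
For anyone reading this thread later, here is a minimal sketch of the column convention discussed above. It is not Bicleaner's actual code: `pick_columns` and `format_output` are hypothetical helpers, and the sketch assumes tab-separated input, column indices that start at 1 (as in "--scol 1 --tcol 2"), and a score-only mode that prints just the score instead of the full line with the score appended.

```python
# Illustrative only: hypothetical helpers, not taken from the Bicleaner source.

def pick_columns(line, scol=1, tcol=2):
    """Return (source, target) from a tab-separated line.

    scol and tcol are assumed to be 1-based column indices.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[scol - 1], fields[tcol - 1]


def format_output(line, score, score_only=False):
    """Build the output line for one sentence pair.

    Assumption: in score-only mode only the score is emitted;
    otherwise the score is appended as an extra tab-separated column.
    """
    if score_only:
        return f"{score}"
    return f"{line.rstrip(chr(10))}\t{score}"


if __name__ == "__main__":
    two_col = "Hello world\tHola mundo\n"
    print(pick_columns(two_col, scol=1, tcol=2))
    print(format_output(two_col, 0.5, score_only=True))
```

With a two-column file the explicit indices replace the "first is src, second is trg" guess from the original pull request; a four-column file with URLs first would pass 3 and 4 instead.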