Helsinki-NLP / OpusFilter

OpusFilter - Parallel corpus processing toolkit
MIT License

use latest release if not provided #5

Closed jbrry closed 3 years ago

jbrry commented 3 years ago

I found it a bit difficult to find out exactly what the latest versions of corpora on OPUS were. OpusTools uses latest as the default version (https://github.com/Helsinki-NLP/OpusTools/blob/master/opustools_pkg/README.md), but OpusFilter does not support this behaviour: if you do not pass release: to the opus_read step, a KeyError is raised.

This small change checks whether a release: was provided and, if not, sets the key's value to 'latest'. This may be helpful for incremental release changes, e.g. Paracrawl v7.1. The test file linked here already relies on this behaviour, but the main library does not support it yet: https://github.com/Helsinki-NLP/OpusFilter/blob/46b3663c049ece62cf319a75e9dd021c5198ccc0/tests/test_opus_filter.py#L24.
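
For illustration, the idea can be sketched as below. This is only a minimal sketch and not the code from this PR or from OpusFilter itself: the helper name `resolve_release` is hypothetical, and the parameter dictionary merely stands in for the parameters of an opus_read step.

```python
def resolve_release(parameters):
    """Return a copy of the step parameters with 'release' defaulting to 'latest'.

    Hypothetical helper for illustration; not part of OpusFilter.
    """
    resolved = dict(parameters)  # copy so the caller's dict is left untouched
    resolved.setdefault("release", "latest")  # only fills in a missing key
    return resolved


# Illustrative step parameters without a release entry.
step_parameters = {
    "corpus_name": "ParaCrawl",
    "source_language": "en",
    "target_language": "ga",
}

# Direct indexing fails when the key is absent.
try:
    step_parameters["release"]
except KeyError:
    print("release missing from step parameters")

# With the default applied, the lookup succeeds.
print(resolve_release(step_parameters)["release"])
```

An explicitly configured release is left as it is; only a missing key falls back to 'latest'.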

jbrry commented 3 years ago

Apologies for my oversight; one could easily just add the line below to their config file. Feel free to ignore the PR if you see no need for the change.

release: latest

svirpioj commented 3 years ago

Thanks for the suggestion; I think "latest" as a default release is useful. I added it with slightly different code, so I'll close this PR.

jbrry commented 3 years ago

Yes, this is much cleaner than what I had proposed. Thanks for adding it!