The main purpose of this PR is to download BC5CDR as part of the preprocessing step, rather than requiring the user to provide a local copy. This is a better user experience, and it also simplifies something I am working on now (computing stats on these corpora).
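For reviewers, here is a minimal sketch of the download-if-missing pattern described above. It is not the code in this PR: `fetch_corpus`, its parameters, and the cache layout are hypothetical names chosen for illustration, and the corpus URL is left to the caller.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_corpus(url: str, cache_dir: Path) -> Path:
    """Download `url` into `cache_dir` unless a copy is already there.

    Hypothetical helper: the name and signature are assumptions, not this repo's API.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Name the cached file after the last path component of the URL.
    target = cache_dir / Path(urlparse(url).path).name
    if target.exists():
        # A previous run already downloaded the corpus, so reuse it.
        return target
    # Write to a temporary file first so an interrupted download
    # never leaves a truncated file at the final location.
    with urllib.request.urlopen(url) as response:
        with tempfile.NamedTemporaryFile(dir=cache_dir, delete=False) as tmp:
            shutil.copyfileobj(response, tmp)
    Path(tmp.name).replace(target)
    return target
```

With something like this in the preprocessing step, the first run fetches the corpus and later runs reuse the cached copy, so the user never has to supply a local path.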
Other changes
:recycle: Moves the logic that converts `Dict[str, PubTatorAnnotation]` into a format seq2rel can use into its own function.
:label: Fixes a ton of type hints. Still not ready to turn mypy back on (#3), but it is getting there.