This PR changes the preprocessing of DocRED so that it is first converted to the PubTator format, which lets us reuse a lot of existing code. It also uses a new split of DocRED, slightly different from the original, that comes from this paper. The new split allows us to compare against the paper without having to run the model through CodaLab.
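For reviewers unfamiliar with the two formats, here is a rough sketch of what a DocRED-to-PubTator conversion can look like. It is not the code in this PR: the function name `docred_to_pubtator` and the `doc_id` argument are hypothetical, and it assumes the public DocRED JSON layout (`title`, `sents`, `vertexSet`, `labels`) with token spans whose end index is exclusive. As a simplification, offsets are counted from the start of the `|a|` text, whereas PubTator files typically count them over the title and abstract together.

```python
from typing import Any, Dict, List


def docred_to_pubtator(doc: Dict[str, Any], doc_id: str) -> str:
    """Convert one DocRED-style document to a PubTator-style string (illustrative only)."""
    # Record the character offset of every token once all sentences are
    # joined with single spaces.
    offsets: List[List[int]] = []
    tokens: List[str] = []
    cursor = 0
    for sent in doc["sents"]:
        sent_offsets = []
        for token in sent:
            sent_offsets.append(cursor)
            tokens.append(token)
            cursor += len(token) + 1  # +1 for the joining space
        offsets.append(sent_offsets)
    text = " ".join(tokens)

    lines = [f"{doc_id}|t|{doc['title']}", f"{doc_id}|a|{text}"]

    # One annotation line per mention; the entity index doubles as the entity ID
    # (an assumption made for this sketch).
    for ent_idx, mentions in enumerate(doc["vertexSet"]):
        for mention in mentions:
            sent_id = mention["sent_id"]
            start_tok, end_tok = mention["pos"]  # token span, end exclusive
            start = offsets[sent_id][start_tok]
            end = offsets[sent_id][end_tok - 1] + len(doc["sents"][sent_id][end_tok - 1])
            lines.append(
                f"{doc_id}\t{start}\t{end}\t{text[start:end]}\t{mention['type']}\t{ent_idx}"
            )

    # One relation line per labelled (head, tail) pair.
    for label in doc.get("labels", []):
        lines.append(f"{doc_id}\t{label['r']}\t{label['h']}\t{label['t']}")

    return "\n".join(lines)
```

Once documents are in this shape, any existing code that already consumes PubTator input can be pointed at DocRED without a separate code path.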
Other changes
:label: Update kwargs everywhere to have consistent type hints.
:label: Fix all type hints under the preprocess module.
:bug: Fix bug where DocRED coreferent mentions weren't sorted by order of first appearance by default.