Add DGM

This PR adds a command, dgm, that preprocesses the drug-gene-mutation (DGM) corpus from "Document-Level N-ary Relation Extraction with Multiscale Representation Learning".

We made a couple of simplifying decisions, outlined here:

The dataset is provided as sentence-length, paragraph-length, and full-text-length documents. We use the paragraph-length text, but this could be changed later.
The paragraph-level test set has no relation annotations, so we use the original validation set as the test set and hold out 10% of the train set as a new validation set.
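
For reviewers who want to see the re-split concretely, here is a minimal sketch of the idea, not the code in this PR. The function name `hold_out_validation`, the fixed seed, and the toy document IDs are all hypothetical; the actual command may shuffle or select the held-out examples differently.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def hold_out_validation(
    train: Sequence[T], fraction: float = 0.1, seed: int = 13
) -> Tuple[List[T], List[T]]:
    """Split `train` into (new_train, new_valid), holding out `fraction` of it.

    Hypothetical helper for illustration only; the seed is an assumption.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    examples = list(train)
    # Shuffle a copy with a seeded RNG so the split is reproducible.
    random.Random(seed).shuffle(examples)
    n_valid = max(1, round(len(examples) * fraction))
    return examples[n_valid:], examples[:n_valid]


# Toy stand-ins for the paragraph-level partitions.
original_train = [f"train-doc-{i}" for i in range(100)]
original_valid = [f"valid-doc-{i}" for i in range(20)]

# The original validation set becomes the test set;
# 10% of the original train set becomes the new validation set.
train, valid = hold_out_validation(original_train, fraction=0.1)
test = original_valid
```

Seeding the shuffle keeps the held-out examples the same across runs, which matters if results on the new validation set are to be compared between experiments.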

Both decisions are documented in the dgm command's help text, which is shown when you run seq2rel-ds preprocess dgm main --help.

JohnGiorgi / seq2rel-ds