JohnGiorgi / seq2rel-ds

This is a companion repository to seq2rel (https://github.com/JohnGiorgi/seq2rel) which aims to make it easy to generate training data.

Make pubtator annotations a list #39

Closed JohnGiorgi closed 3 years ago

JohnGiorgi commented 3 years ago

Overview

This is a large PR that changes how PubtatorAnnotation objects are stored: they are now held in a List instead of a Dict. This involved a couple of steps:

  1. Make pmid an attribute of PubtatorAnnotation.
  2. Update all the utils to respect this new schema.

The main benefit is that datasets that break documents into abstracts or sentences are now supported. Previously, because pmid was the key of a Dict, only one item of text per pmid was retained.
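
To illustrate the difference, here is a minimal sketch. It is not the actual seq2rel-ds code: the PubtatorAnnotation class below is a simplified, hypothetical stand-in with only text and pmid fields, and the pmid and sentence values are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PubtatorAnnotation:
    # Hypothetical, simplified stand-in for the real class (assumption).
    text: str
    pmid: str = ""


# Two sentences that come from the same document, so they share a pmid.
sentences = [("123", "First sentence."), ("123", "Second sentence.")]

# Old schema: pmid is the Dict key, so the second item overwrites the first.
as_dict: Dict[str, PubtatorAnnotation] = {}
for pmid, text in sentences:
    as_dict[pmid] = PubtatorAnnotation(text=text)

# New schema: pmid is an attribute, so both items are retained in the List.
as_list: List[PubtatorAnnotation] = [
    PubtatorAnnotation(text=text, pmid=pmid) for pmid, text in sentences
]

print(len(as_dict))  # 1
print(len(as_list))  # 2
```

With the Dict, only the last sentence for a given pmid survives; with the List, every item is kept and can still be grouped by its pmid attribute when needed.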

Other changes