This is a large PR that changes collections of `PubtatorAnnotation` objects from a `Dict` keyed by `pmid` to a `List`. This involved two steps:
1. Make `pmid` an attribute of `PubtatorAnnotation`.
2. Update all the utils to respect the new schema.
The main benefit is that datasets that split documents into abstracts or sentences are now supported. Previously, because `pmid` was the key of a `Dict`, only one piece of text per `pmid` was retained.
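To illustrate why the `Dict` lost data, here is a minimal Python sketch. The `PubtatorAnnotation` class below is a hypothetical, simplified stand-in (only `pmid` being an attribute comes from this PR; the `text` field is illustrative), and the PMIDs and sentences are made up.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical, simplified stand-in for PubtatorAnnotation. Only the fact that
# `pmid` is an attribute comes from this PR; `text` is illustrative.
@dataclass
class PubtatorAnnotation:
    pmid: str
    text: str

# Made-up records: one document split into two sentences, plus another document.
records = [
    ("12345", "First sentence of the abstract."),
    ("12345", "Second sentence of the abstract."),
    ("67890", "A different document."),
]

# Old layout (sketch): a Dict keyed by pmid. The second "12345" entry
# overwrites the first, so one sentence is silently dropped.
old_layout: Dict[str, str] = {}
for pmid, text in records:
    old_layout[pmid] = text

# New layout (sketch): a List of annotations, each carrying its own pmid.
new_layout: List[PubtatorAnnotation] = [
    PubtatorAnnotation(pmid=pmid, text=text) for pmid, text in records
]

print(len(old_layout))  # 2
print(len(new_layout))  # 3

# Callers that still need per-document access can group on the attribute.
by_pmid: Dict[str, List[PubtatorAnnotation]] = {}
for ann in new_layout:
    by_pmid.setdefault(ann.pmid, []).append(ann)
print([ann.text for ann in by_pmid["12345"]])
```

Grouping on the `pmid` attribute recovers per-document access when a caller needs it, without discarding any text.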
Other changes
- :sparkles: `parse_pubtator` now supports n-ary relations where n > 2.
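A minimal sketch of how a relation line with more than two participants can be parsed, assuming the tab-separated relation layout commonly seen in PubTator files (PMID, relation type, then entity IDs). Collecting the tail of the line, rather than unpacking exactly two IDs, lets the same code accept binary and n-ary relations. `Relation` and `parse_relation_line` are hypothetical names, not the actual internals of `parse_pubtator`, and the example lines and IDs are made up.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical container for a parsed relation; not the library's actual type.
@dataclass
class Relation:
    pmid: str
    rel_type: str
    entity_ids: Tuple[str, ...]  # two or more participants

def parse_relation_line(line: str) -> Relation:
    # Assumed layout: PMID<TAB>TYPE<TAB>ID_1<TAB>ID_2[<TAB>ID_3 ...]
    pmid, rel_type, *entity_ids = line.rstrip("\n").split("\t")
    if len(entity_ids) < 2:
        raise ValueError(f"relation needs at least two entities: {line!r}")
    return Relation(pmid, rel_type, tuple(entity_ids))

# Made-up example lines: a binary relation and a ternary one.
binary = parse_relation_line("12345\tCID\tD008750\tD007022")
ternary = parse_relation_line("12345\tDrugGeneDisease\tD000001\tG000002\tD000003")
print(binary.entity_ids)
print(len(ternary.entity_ids))
```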