Similar to TaskModules, DocumentMetrics require documents of a certain type as input. This PR lets DocumentMetrics declare which document type they need.
In detail:
- create RequiresDocumentTypeMixin, which defines the class variable DOCUMENT_TYPE and the property document_type (which returns DOCUMENT_TYPE by default). It also defines the method convert_dataset(), which checks for several edge cases before calling dataset.to_document_type(self.document_type)
- use RequiresDocumentTypeMixin for DocumentMetric and also for TaskModule
- add the parameter document_type to DocumentStatistic; its value is returned by DocumentStatistic.document_type (it overrides DOCUMENT_TYPE)
- adjust the logic of (Iterable)Dataset(Dict).to_document_type(): converters registered for subclasses of the requested document type are now accepted as well (e.g. if we have a converter for DocWithEntitiesAndRelations but only need DocWithEntities, we still use that converter)
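
The changes listed above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: ToyDataset, ToyStatistic, RawDoc and the Document stand-in classes are hypothetical simplifications, and the only edge case handled in convert_dataset() (no document type declared) is an assumption, since the actual checks are not spelled out here.

```python
from typing import Callable, Dict, List, Optional, Type


class Document:
    """Stand-in for the library's document base class (assumption)."""


class DocWithEntities(Document):
    pass


class DocWithEntitiesAndRelations(DocWithEntities):
    pass


class RawDoc(Document):
    """Hypothetical source document type used only in this example."""


Converter = Callable[[Document], Document]


class ToyDataset:
    """Hypothetical, heavily simplified dataset; not the real Dataset class."""

    def __init__(
        self,
        docs: List[Document],
        document_type: Type[Document],
        converters: Optional[Dict[Type[Document], Converter]] = None,
    ) -> None:
        self.docs = docs
        self.document_type = document_type
        self.converters = converters or {}

    def to_document_type(self, requested: Type[Document]) -> "ToyDataset":
        # Assumed shortcut: documents that already satisfy the request are kept.
        if issubclass(self.document_type, requested):
            return self
        # Accept a converter registered for the requested type or a subclass of it.
        for registered_type, converter in self.converters.items():
            if issubclass(registered_type, requested):
                converted = [converter(doc) for doc in self.docs]
                return ToyDataset(converted, registered_type, self.converters)
        raise ValueError(f"no converter registered for {requested.__name__}")


class RequiresDocumentTypeMixin:
    DOCUMENT_TYPE: Optional[Type[Document]] = None

    @property
    def document_type(self) -> Optional[Type[Document]]:
        # By default, the property simply exposes the class variable.
        return self.DOCUMENT_TYPE

    def convert_dataset(self, dataset: ToyDataset) -> ToyDataset:
        # Assumed edge case: nothing is converted if no type is declared.
        if self.document_type is None:
            return dataset
        return dataset.to_document_type(self.document_type)


class ToyStatistic(RequiresDocumentTypeMixin):
    DOCUMENT_TYPE = Document

    def __init__(self, document_type: Optional[Type[Document]] = None) -> None:
        self._document_type = document_type

    @property
    def document_type(self) -> Optional[Type[Document]]:
        # The constructor parameter takes precedence over DOCUMENT_TYPE.
        return self._document_type or self.DOCUMENT_TYPE


# Only a converter for the more specific type is registered.
dataset = ToyDataset(
    docs=[RawDoc()],
    document_type=RawDoc,
    converters={DocWithEntitiesAndRelations: lambda doc: DocWithEntitiesAndRelations()},
)
statistic = ToyStatistic(document_type=DocWithEntities)
converted = statistic.convert_dataset(dataset)
```

In this sketch, the statistic asks for DocWithEntities, and the converter registered for DocWithEntitiesAndRelations is used because that type is a subclass of the requested one.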