NorskRegnesentral / skweak

skweak: A software toolkit for weak supervision applied to NLP tasks
MIT License

Aggregation on docs with empty source does not set any `doc.spans` key #25

Closed Zatteliet closed 3 years ago

Zatteliet commented 3 years ago

Currently, running an aggregator on a doc whose sources are all empty lists does not set `doc.spans[aggregator_name]`. This can be seen here: https://github.com/NorskRegnesentral/skweak/blob/main/skweak/aggregation.py#L54

This caused me some difficult-to-diagnose errors. Intuitively, I would expect the aggregator to set an empty list in `doc.spans`, like the annotator functions do. If the current behaviour is intended, I would strongly recommend mentioning it in the wiki.
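
To make the failure mode concrete, here is a minimal sketch that deliberately avoids skweak and spaCy and uses a plain-Python stand-in instead. `FakeDoc`, `aggregate_skipping_empty`, `aggregate_always_setting` and the key `"agg"` are hypothetical names, and the "aggregation" is just a union of spans, not skweak's actual logic. The only assumption is that `doc.spans` behaves like a dict, so reading a missing key raises `KeyError`.

```python
class FakeDoc:
    """Hypothetical stand-in for a doc: it only carries a dict-like `spans`."""

    def __init__(self, spans=None):
        self.spans = dict(spans or {})


def aggregate_skipping_empty(doc, sources, name="agg"):
    """Mimics the reported behaviour: return early if every source is empty."""
    if all(len(doc.spans.get(s, [])) == 0 for s in sources):
        return doc  # the output key is never written
    merged = {span for s in sources for span in doc.spans.get(s, [])}
    doc.spans[name] = sorted(merged)
    return doc


def aggregate_always_setting(doc, sources, name="agg"):
    """Mimics the expected behaviour: the output key always exists."""
    merged = {span for s in sources for span in doc.spans.get(s, [])}
    doc.spans[name] = sorted(merged)
    return doc


empty_doc = FakeDoc({"lf1": [], "lf2": []})

aggregate_skipping_empty(empty_doc, ["lf1", "lf2"])
print("agg" in empty_doc.spans)  # False: doc.spans["agg"] would raise KeyError

aggregate_always_setting(empty_doc, ["lf1", "lf2"])
print(empty_doc.spans["agg"])  # []
```

Until the behaviour changes, downstream code could read the result with something like `doc.spans.get("agg", [])` instead of indexing directly, again assuming dict-like access.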

As a side note, I think it is a bit confusing that the voter both modifies the given doc and returns it. I would either make it clear that the operation happens in place, or return a modified copy of the doc, but not both. Either way, the wiki doesn't make this very clear.
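
The two conventions I have in mind look roughly like this. It is only a sketch with hypothetical names (`FakeDoc`, `annotate_in_place`, `annotated_copy`), not a proposal for the actual skweak signatures; it mirrors the `list.sort()` versus `sorted()` split in the standard library.

```python
import copy


class FakeDoc:
    """Hypothetical stand-in for a doc: it only carries a dict-like `spans`."""

    def __init__(self, spans=None):
        self.spans = dict(spans or {})


def annotate_in_place(doc, name="agg"):
    """Option 1: mutate `doc` and return None, like list.sort()."""
    doc.spans[name] = []


def annotated_copy(doc, name="agg"):
    """Option 2: leave `doc` untouched and return a modified copy, like sorted()."""
    new_doc = copy.deepcopy(doc)
    new_doc.spans[name] = []
    return new_doc


original = FakeDoc({"lf1": []})

result = annotated_copy(original)
print("agg" in original.spans, "agg" in result.spans)  # False True

annotate_in_place(original)
print("agg" in original.spans)  # True
```

With either option the caller can tell from the return value alone whether the input was changed, which is what the current "modify and return" behaviour obscures.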