TheScienceMuseum / heritage-connector

Heritage Connector: Transforming text into data to extract meaning and make connections
https://www.sciencemuseumgroup.org.uk/projects/heritage-connector
MIT License

NEL feature creation should not be metadata-dependent #294

Closed kdutia closed 3 years ago

kdutia commented 3 years ago

At the moment, when features are created for NEL, they rely on label and description columns already being present in the data. This is not ideal if you want to reuse the same pairs of entity mentions and records that have been annotated as linking after titles or descriptions have been updated.

This should likely take the form of a pipeline element that produces the dataframe accepted by NELFeatureGenerator from the triple (entity_mention, source_record, target_record).
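A minimal sketch of what such a pipeline element could look like, assuming the annotations are stored as bare (entity_mention, source_record, target_record) triples and that current labels and descriptions can be looked up by record ID. The names `AnnotatedPair`, `build_feature_rows`, the `metadata` lookup and the output column names are all hypothetical; the columns NELFeatureGenerator actually expects are not stated here and would need to be checked against the code. The record IDs and texts are made-up illustrative values.

```python
from typing import Dict, Iterable, List, NamedTuple


class AnnotatedPair(NamedTuple):
    """One annotated example, stored without any label/description metadata."""
    entity_mention: str
    source_record: str  # ID of the record the mention was found in
    target_record: str  # ID of the candidate record to link to
    is_link: bool       # the human annotation


def build_feature_rows(
    pairs: Iterable[AnnotatedPair],
    metadata: Dict[str, Dict[str, str]],
) -> List[Dict[str, object]]:
    """Join annotated pairs with the *current* metadata of each record.

    Column names are assumptions, not the real NELFeatureGenerator schema.
    Records missing from `metadata` get empty strings.
    """
    rows = []
    for pair in pairs:
        source = metadata.get(pair.source_record, {})
        target = metadata.get(pair.target_record, {})
        rows.append({
            "entity_mention": pair.entity_mention,
            "source_record": pair.source_record,
            "target_record": pair.target_record,
            "source_description": source.get("description", ""),
            "target_label": target.get("label", ""),
            "target_description": target.get("description", ""),
            "is_link": pair.is_link,
        })
    return rows


# Illustrative data only: IDs and texts are invented for this example.
pairs = [AnnotatedPair("Charles Babbage", "object/1", "person/1", True)]
metadata = {
    "object/1": {"label": "Difference Engine", "description": "Old description."},
    "person/1": {"label": "Charles Babbage", "description": "Mathematician."},
}
rows = build_feature_rows(pairs, metadata)

# After a metadata update, the same annotations yield refreshed rows.
metadata["object/1"]["description"] = "Updated description."
refreshed_rows = build_feature_rows(pairs, metadata)
```

Because the annotations hold only IDs and the link decision, updating a title or description does not invalidate them; the rows are simply rebuilt. The list of dicts can be passed to `pandas.DataFrame(rows)` if a dataframe is needed.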

kdutia commented 3 years ago

Added a notebook that updates descriptions in an old training data file. This solves the issue, although not neatly.