DFKI-NLP / sherlock

State-of-the-art Information Extraction
3 stars 1 forks source link

`get_additional_tokens` is in a questionable place. #41

Closed GabrielKP closed 2 years ago

GabrielKP commented 2 years ago

The function get_additional_tokens extracts additional tokens dependent on the task. It currently is implemented in DatasetReader, but DatasetReader should be task-agnostic. It would make more sense to implement get_additional_tokens in the initialization of the feature_converter, where each feature_converter is made for exactly one task.

GabrielKP commented 2 years ago

@leonhardhennig 3 Options:

  1. Not important enough: ignore;
  2. Not a valid point;
  3. Valid point, it should be changed.
leonhardhennig commented 2 years ago

you're right, and I'm beginning to think Christoph has redone all this in his newish repo (that he and Arne may publish soon) - there is a "Task" class which kind of replaces the feature_converter which does all this. (also the handling of special tokens as in issue #42).

I'd tend to option 1 currently since I think this is lower prio. I'd rather have the Rel Classification tested and running so that we can pre-label the Businesswire/PLASS corpus as soon as possible

GabrielKP commented 2 years ago

I agree, although not optimal, this is workable, I will leave it as it is.