apmoore1 / target-extraction

Target based extraction
https://apmoore1.github.io/target-extraction/
Apache License 2.0

To CONLL format #18

Open apmoore1 opened 4 years ago

apmoore1 commented 4 years ago

Add a function to convert the TargetTextCollection into a CONLL formatted file. This function can have two options:

  1. Just targets, where the format will be BIO with no labels, i.e. B, I, and O.
  2. Labels, where the format will be BIO and include labels, e.g. B-POS, I-POS, B-NEG, I-NEG, and O.
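To make the two options concrete, here is a minimal sketch of how BIO tags could be produced for one sentence. The helper `bio_tags`, its arguments, and the token-index span representation are assumptions made for illustration; they are not part of the target-extraction API.

```python
from typing import List, Optional, Tuple


def bio_tags(num_tokens: int,
             target_spans: List[Tuple[int, int]],
             labels: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: build BIO tags for one sentence.

    target_spans holds (start, end) token indices with end exclusive.
    If labels is given, labels[i] is attached to the i-th span.
    """
    tags = ["O"] * num_tokens
    for i, (start, end) in enumerate(target_spans):
        suffix = f"-{labels[i]}" if labels is not None else ""
        tags[start] = f"B{suffix}"       # first token of the target
        for j in range(start + 1, end):  # remaining tokens of the target
            tags[j] = f"I{suffix}"
    return tags


tokens = ["The", "battery", "life", "is", "great", "but", "screen", "dim"]
spans = [(1, 3), (6, 7)]

# Option 1: just targets
print(bio_tags(len(tokens), spans))
# ['O', 'B', 'I', 'O', 'O', 'O', 'B', 'O']

# Option 2: targets with labels
print(bio_tags(len(tokens), spans, ["POS", "NEG"]))
# ['O', 'B-POS', 'I-POS', 'O', 'O', 'O', 'B-NEG', 'O']
```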

Furthermore, for both options it would be good to be able to include predictions, and any number of them, e.g. if you ran the same type of model multiple times to take random seeds into account.

The format of the CONLL file will be the following:

TOKEN#GOLD LABEL#PREDICTION 1#PREDICTION 2

Where the number of predictions can go up to N.

The signature of the function will be the following:

to_conll(self, conll_fp: Path, gold_label_key: str, prediction_keys: Optional[List[str]] = None) -> None

Defining the gold_label_key in effect allows the user to choose whether the task is targets, labels, or any other sequence labelling task, as this is determined by the value stored under gold_label_key in each TargetText within the TargetTextCollection.
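A rough sketch of the export as a standalone function, not the eventual method on TargetTextCollection. It assumes each sample is a plain dict of equal-length lists instead of a TargetText, reads the `#` in the format line literally as the column separator, and separates sentences with a blank line. The `samples` and `token_key` arguments and the keys `gold`, `run_0` and `run_1` are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def to_conll(samples: List[Dict[str, List[str]]], conll_fp: Path,
             gold_label_key: str,
             prediction_keys: Optional[List[str]] = None,
             token_key: str = "tokens") -> None:
    """Write one TOKEN#GOLD LABEL#PREDICTION 1#...#PREDICTION N line per token."""
    prediction_keys = prediction_keys or []
    with conll_fp.open("w", encoding="utf-8") as conll_file:
        for sample in samples:
            columns = [sample[token_key], sample[gold_label_key]]
            columns += [sample[key] for key in prediction_keys]
            # Every label sequence must align one-to-one with the tokens.
            if len({len(column) for column in columns}) != 1:
                raise ValueError("Tokens, gold labels and predictions "
                                 "must all have the same length")
            for row in zip(*columns):
                conll_file.write("#".join(row) + "\n")
            conll_file.write("\n")  # assumed: blank line between sentences


samples = [{"tokens": ["The", "battery", "life"],
            "gold": ["O", "B-POS", "I-POS"],
            "run_0": ["O", "B-POS", "I-POS"],
            "run_1": ["O", "B-NEG", "I-NEG"]}]

with tempfile.TemporaryDirectory() as tmp_dir:
    conll_fp = Path(tmp_dir, "example.conll")
    to_conll(samples, conll_fp, "gold", ["run_0", "run_1"])
    written = conll_fp.read_text(encoding="utf-8")
print(written)
```

One caveat with this layout: a token that itself contains `#` would make the line ambiguous, so a real implementation would need to escape it or use a different separator.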

apmoore1 commented 4 years ago

It would be good to finish this issue off with an example of how to use the CONLL formatting, covering the following points:

  1. Exporting
  2. Importing

Another separate notebook perhaps on how to:

  1. Train a model
  2. Export the predicted data
  3. Import the predicted data
  4. Evaluate the data.