Hello,
As far as I know, this is not a subject that academia has worked on.
BaaL should be able to handle it, as we do not impose any format.
I will post an example tomorrow.
If you have any questions, we would be happy to help!
Hey @biro-mark, there is nothing in the literature, but this is very similar to the multi-label case, and the way we handle it in BaaL is essentially the same as in our segmentation example. I didn't see a great improvement on this myself, but that is because NLP models are already very strong and it is hard to get a noticeable gain. In any case, once @Dref360 posts the example you can try the same approach and see whether you find better ways to deal with this. I am very interested to know the result. :)
This is a quick example of how to use our heuristics (BALD, Entropy) with NER, in the same way as for segmentation. I hope this helps; if I missed anything, please tell me.
from baal.active.heuristics import BALD
import torch
NUM_TOKENS = 128
NUM_ITERATIONS = 20
DATASET_LEN = 1000
NUM_CLASSES = 10
# The result of your MC iterations (see [1] below)
mc_sampling = torch.randn([DATASET_LEN, NUM_CLASSES, NUM_TOKENS, NUM_ITERATIONS])
bald = BALD()
uncertainty = bald.get_uncertainties(mc_sampling)
# Uncertainty has shape [DATASET_LEN, NUM_TOKENS] (see [2] below)
uncertainty.shape
[1] We provide ModelWrapper to do this for you, or you can compute the value yourself. More info here.
[2] This gives us the uncertainty per token. If we want the overall uncertainty of a sentence, we can use a reduction like this:
bald = BALD(reduction='mean')
uncertainty = bald.get_uncertainties(mc_sampling)
# Uncertainty has shape [DATASET_LEN]
uncertainty.shape
You can also supply your own reduction function.
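For example, a minimal sketch of a custom reduction, assuming (as in the current heuristics module) that the callable receives the per-token scores as a NumPy array of shape [DATASET_LEN, NUM_TOKENS] and must return one score per sample:
import numpy as np
from baal.active.heuristics import BALD

def most_uncertain_token(scores):
    # scores: [DATASET_LEN, NUM_TOKENS]; score each sentence by its single
    # most uncertain token instead of the average.
    return np.max(scores, axis=-1)

bald = BALD(reduction=most_uncertain_token)
uncertainty = bald.get_uncertainties(mc_sampling)
# uncertainty has shape [DATASET_LEN]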
Hello,
Just to mention that there is some work on active learning for sequence labelling tasks such as NER (not much, though). For example, here is a recent paper: Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates. It would be very nice if BaaL could provide direct support for this type of task.
Hello again!
I'm trying to adapt the example in nlp_bert_mcdropout.py for the NER task. If I understand correctly, the change to be made would be to pass hyperparams["reduction"] to the heuristic (if we want to be flexible and expose that as an argument), or we could directly set reduction="mean" (see the sketch below). Would that be enough? I'm just familiarising myself with BaaL and I think it's great, so any pointer on how to correctly use it for a sequence labelling task would be welcome.
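For concreteness, the change I have in mind is roughly the following (just an illustration, not a final patch; it mirrors how the heuristic is built in the existing example):
from baal.active import get_heuristic

# Option 1: expose the reduction as a hyperparameter
heuristic = get_heuristic(
    name=hyperparams["heuristic"],
    shuffle_prop=hyperparams["shuffle_prop"],
    reduction=hyperparams["reduction"],
)
# Option 2: hard-code a sensible default
# heuristic = get_heuristic(name="bald", shuffle_prop=0.05, reduction="mean")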
Reopening for visibility
Yes I think that would be it. @parmidaatg worked on multilabel in the past and might give more insight.
@feralvam yes, that should be enough to have a running AL loop. You might want to change the metrics accordingly as well, but that is all. It would be great if you could submit a PR with your example script to BaaL. We are trying to expand our support and that would help the community a lot :) Let us know how your experiment goes.
I'd be happy to submit an example for NER after I manage to make it work.
There seem to be other parts of the code that need to be changed. For instance, the HuggingFaceDatasets class: its _tokenize function needs to be adapted in a similar way to here, to handle the case where the texts are provided already tokenized and the labels for each token need to be aligned accordingly. In addition, I believe __getitem__ also needs to change when returning label, since it assumes a single label per instance (unless I am mistaken).
Perhaps taking a quick look at the conll2003 dataset in the Datasets library could help get a better idea of other changes that might be needed and that I haven't found yet.
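For reference, this is roughly what the data looks like (a quick check with the datasets library; the label names are the usual CoNLL-2003 BIO tags):
from datasets import load_dataset

ds = load_dataset("conll2003")
print(ds["train"].features["ner_tags"])  # Sequence(ClassLabel(names=['O', 'B-PER', 'I-PER', ...]))
example = ds["train"][0]
print(example["tokens"])    # the words of one sentence, already tokenized
print(example["ner_tags"])  # one integer label per word, so a variable number of labels per sample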
Yes, the example that I made with the HF wrapper only supports classification. Normally you shouldn't need that wrapper anyway: if you handle your dataset yourself, you should be able to use the ActiveLearningDataset wrapper directly. The HF wrapper is only a means for people who are not that familiar with NLP to run a quick experiment. I'd be happy to see a PR from you if you want to adapt the HF wrapper; otherwise, we will work on it eventually.
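Roughly, the "use ActiveLearningDataset directly" route looks like this (a sketch; my_ner_dataset is a placeholder for whatever map-style dataset you prepare yourself):
from baal.active import ActiveLearningDataset

# my_ner_dataset: any map-style dataset returning input_ids, attention_mask
# and per-token labels for one sentence (placeholder name).
active_set = ActiveLearningDataset(my_ner_dataset)
active_set.label_randomly(1000)  # seed the initial labelled pool
pool = active_set.pool           # unlabelled pool used for MC-Dropout predictions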
Perhaps we can work together on your example and upgrade BaaL with any changes necessary to support NER. Out of the box, I'd say it should work using just ActiveLearningDataset and our Trainer wrapper, which changes predict in the HF Trainer. But since I haven't run NER myself, I'll stay on this subject, and if you report any bug, I can try to fix it ASAP. What do you think?
Thanks! Really appreciate all your quick replies. I just started to work on this yesterday, so hopefully I should be submitting something soon for you to take a look.
So, here's a first attempt at merging the NER example from HuggingFace with the BaaL example for sequence classification:
import argparse
import numpy as np
import random
from copy import deepcopy
import torch
import torch.backends
from tqdm import tqdm
# These packages are optional and not needed for BaaL main package.
# You can have access to `datasets` and `transformers` if you install
# BaaL with --dev setup.
from datasets import load_dataset, load_metric
from transformers import AutoConfig, AutoTokenizer, TrainingArguments
from transformers import AutoModelForTokenClassification, DataCollatorForTokenClassification
from baal.active import get_heuristic
from baal.active import ActiveLearningDataset
from baal.active.active_loop import ActiveLearningLoop
from baal.bayesian.dropout import patch_module
from baal.transformers_trainer_wrapper import BaalTransformersTrainer
"""
Minimal example of using BaaL for NLP token classification (e.g. Named Entity Recognition).
"""
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument("--epoch", default=100, type=int)
parser.add_argument("--batch_size", default=32, type=int)
parser.add_argument("--initial_pool", default=1000, type=int)
parser.add_argument("--model", default="bert-base-uncased", type=str)
parser.add_argument("--n_data_to_label", default=100, type=int)
parser.add_argument("--heuristic", default="bald", type=str)
parser.add_argument("--iterations", default=20, type=int)
parser.add_argument("--shuffle_prop", default=0.05, type=float)
parser.add_argument("--reduction", default="mean", type=str)
parser.add_argument("--learning_epoch", default=20, type=int)
return parser.parse_args()
def get_datasets(initial_pool, tokenizer):
raw_datasets = load_dataset("conll2003")
features = raw_datasets["train"].features
# In the conll2003 dataset, the labels are a `Sequence[ClassLabel]`
label_list = features["ner_tags"].feature.names
# No need to convert the labels since they are already ints.
label_to_id = {i: i for i in range(len(label_list))}
# Map that sends B-Xxx label to its I-Xxx counterpart
b_to_i_label = []
for idx, label in enumerate(label_list):
if label.startswith("B-") and label.replace("B-", "I-") in label_list:
b_to_i_label.append(label_list.index(label.replace("B-", "I-")))
else:
b_to_i_label.append(idx)
# Tokenize all texts and align the labels with them.
def tokenize_and_align_labels(examples):
tokenized_inputs = tokenizer(
examples["tokens"],
padding="max_length",
truncation=True,
max_length=128,
# We use this argument because the texts in our dataset are lists of words (with a label for each word).
is_split_into_words=True,
)
labels = []
for i, label in enumerate(examples["ner_tags"]):
word_ids = tokenized_inputs.word_ids(batch_index=i)
previous_word_idx = None
label_ids = []
for word_idx in word_ids:
# Special tokens have a word id that is None. We set the label to -100 so they are automatically
# ignored in the loss function.
if word_idx is None:
label_ids.append(-100)
# We set the label for the first token of each word.
elif word_idx != previous_word_idx:
label_ids.append(label_to_id[label[word_idx]])
# For the other tokens in a word, we set the label to -100
else:
label_ids.append(-100)
previous_word_idx = word_idx
labels.append(label_ids)
tokenized_inputs["labels"] = labels
return tokenized_inputs
# Active Training set
train_dataset = raw_datasets["train"].map(
tokenize_and_align_labels,
batched=True,
num_proc=4,
load_from_cache_file=True,
desc="Running tokenizer on train dataset",
)
active_set = ActiveLearningDataset(train_dataset)
valid_set = raw_datasets["validation"].map(
tokenize_and_align_labels,
batched=True,
num_proc=4,
load_from_cache_file=True,
desc="Running tokenizer on validation dataset",
)
# We start labeling randomly.
active_set.label_randomly(initial_pool)
return active_set, valid_set, label_list, label_to_id
def main():
args = parse_args()
use_cuda = torch.cuda.is_available()
torch.backends.cudnn.benchmark = True
random.seed(1337)
torch.manual_seed(1337)
if not use_cuda:
print("warning, the experiments would take ages to run on cpu")
hyperparams = vars(args)
heuristic = get_heuristic(name=hyperparams["heuristic"], shuffle_prop=hyperparams["shuffle_prop"], reduction=hyperparams["reduction"])
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=hyperparams["model"], use_fast=True)
active_set, test_set, label_list, label_to_id = get_datasets(hyperparams["initial_pool"], tokenizer)
config = AutoConfig.from_pretrained(
hyperparams["model"],
num_labels=len(label_list),
label2id=label_to_id,
id2label={i: l for l, i in label_to_id.items()},
finetuning_task="ner",
)
model = AutoModelForTokenClassification.from_pretrained(pretrained_model_name_or_path=hyperparams["model"], config=config)
# change dropout layer to MCDropout
model = patch_module(model)
if use_cuda:
model.cuda()
init_weights = deepcopy(model.state_dict())
training_args = TrainingArguments(
output_dir="./results", # output directory
num_train_epochs=hyperparams["learning_epoch"], # total # of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=8, # batch size for evaluation
weight_decay=0.01, # strength of weight decay
logging_dir="./logs", # directory for storing logs
)
# Data collator
data_collator = DataCollatorForTokenClassification(tokenizer, pad_to_multiple_of=None)
# Metrics
metric = load_metric("seqeval")
def compute_metrics(p):
predictions, labels = p
predictions = np.argmax(predictions, axis=2)
# Remove ignored index (special tokens)
true_predictions = [[label_list[p] for (p, l) in zip(prediction, label) if l != -100] for prediction, label in zip(predictions, labels)]
true_labels = [[label_list[l] for (p, l) in zip(prediction, label) if l != -100] for prediction, label in zip(predictions, labels)]
results = metric.compute(predictions=true_predictions, references=true_labels)
return {
"precision": results["overall_precision"],
"recall": results["overall_recall"],
"f1": results["overall_f1"],
"accuracy": results["overall_accuracy"],
}
# We wrap the huggingface Trainer to create an Active Learning Trainer
model = BaalTransformersTrainer(
model=model,
args=training_args,
train_dataset=active_set,
eval_dataset=test_set,
tokenizer=None,
data_collator=data_collator,
compute_metrics=compute_metrics,
)
logs = {}
logs["epoch"] = 0
# In this case, NLP data is fast to process and we do not need to use a smaller batch_size
active_loop = ActiveLearningLoop(
active_set,
model.predict_on_dataset,
heuristic,
hyperparams.get("n_data_to_label", 1),
iterations=hyperparams["iterations"],
)
for epoch in tqdm(range(args.epoch)):
# We use the default setup of HuggingFace for training (e.g. epoch=1).
# The setup is adjustable when the BaalTransformersTrainer is defined.
model.train()
# Validation!
eval_metrics = model.evaluate()
# We reorder the unlabelled pool at the frequency of learning_epoch
# This helps with speed while not changing the quality of uncertainty estimation.
should_continue = active_loop.step()
# We reset the model weights to relearn from the new trainset.
model.load_state_dict(init_weights)
model.lr_scheduler = None
if not should_continue:
break
active_logs = {
"epoch": epoch,
"labeled_data": active_set._labelled,
"Next Training set size": len(active_set),
}
logs = {**eval_metrics, **active_logs}
print(logs)
if __name__ == "__main__":
main()
I'm getting an error from the data_collator (I had to use one for token classification). I still need to debug it to find the real issue, but I thought that getting some feedback on the general structure of the example could help. I can submit it as a WIP PR if that helps too.
I found the source of the error. It was due to which columns of the dataset actually reach the collator. The problem was in this line of DataCollatorForTokenClassification:
batch = {k: torch.tensor(v, dtype=torch.int64) for k, v in batch.items()}
Apparently, the collator expects all columns in the batch to have been padded properly. Only the columns generated by the tokenizer have that property; the rest should therefore be removed before the data is sent to the collator. HuggingFace's Trainer removes these unused columns internally. However, it has a line that checks that the dataset is an instance of datasets.Dataset:
if is_datasets_available() and isinstance(train_dataset, datasets.Dataset):
train_dataset = self._remove_unused_columns(train_dataset, description="training")
Our training dataset is actually an instance of ActiveLearningDataset, which does not inherit from datasets.Dataset. As a result, that condition is False and the columns are never removed.
The easy solution for the example was to remove the unused columns "manually". In this case, that meant basically all of the original columns, since the tokenizer creates the ones the model actually needs.
features = raw_datasets["train"].features
# Active Training Set
train_dataset = raw_datasets["train"].map(
tokenize_and_align_labels,
batched=True,
num_proc=hyperparams["preprocessing_num_workers"],
load_from_cache_file=True,
remove_columns=list(features.keys()),
desc="Running tokenizer on train dataset",
)
active_set = ActiveLearningDataset(train_dataset)
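Another option that might also work, though I haven't tried it here, is to restrict the columns on the datasets side before wrapping the dataset:
# Untested alternative: keep only the tokenizer-produced columns
# (add "token_type_ids" if your tokenizer returns them).
train_dataset.set_format(columns=["input_ids", "attention_mask", "labels"])
active_set = ActiveLearningDataset(train_dataset)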
Here's a new version of the code:
import argparse
import numpy as np
import random
from copy import deepcopy
import torch
import torch.backends
from tqdm import tqdm
# These packages are optional and not needed for BaaL main package.
# You can have access to `datasets` and `transformers` if you install
# BaaL with --dev setup.
from datasets import load_dataset, load_metric
from transformers import AutoConfig, AutoTokenizer, TrainingArguments
from transformers import AutoModelForTokenClassification, DataCollatorForTokenClassification
from baal.active import get_heuristic
from baal.active import ActiveLearningDataset
from baal.active.active_loop import ActiveLearningLoop
from baal.bayesian.dropout import patch_module
from baal.transformers_trainer_wrapper import BaalTransformersTrainer
"""
Minimal example of using BaaL for NLP token classification (e.g. Named Entity Recognition).
"""
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument("--epoch", default=100, type=int)
parser.add_argument("--batch_size", default=32, type=int)
parser.add_argument("--initial_pool", default=1000, type=int)
parser.add_argument("--model", default="bert-base-uncased", type=str)
parser.add_argument("--n_data_to_label", default=100, type=int)
parser.add_argument("--heuristic", default="bald", type=str)
parser.add_argument("--iterations", default=20, type=int)
parser.add_argument("--shuffle_prop", default=0.05, type=float)
parser.add_argument("--reduction", default="mean", type=str)
parser.add_argument("--learning_epoch", default=20, type=int)
parser.add_argument("--preprocessing_num_workers", default=None, type=int)
parser.add_argument("--pad_to_max_length", default=False, type=bool)
parser.add_argument("--max_seq_length", default=None, type=int)
return parser.parse_args()
def get_datasets(hyperparams, tokenizer):
raw_datasets = load_dataset("conll2003")
features = raw_datasets["train"].features
# In the conll2003 dataset, the labels are a `Sequence[ClassLabel]`
label_list = features["ner_tags"].feature.names
# No need to convert the labels since they are already ints.
label_to_id = {i: i for i in range(len(label_list))}
# Map that sends B-Xxx label to its I-Xxx counterpart
b_to_i_label = []
for idx, label in enumerate(label_list):
if label.startswith("B-") and label.replace("B-", "I-") in label_list:
b_to_i_label.append(label_list.index(label.replace("B-", "I-")))
else:
b_to_i_label.append(idx)
# Preprocessing the dataset
# Padding strategy
padding = "max_length" if hyperparams["pad_to_max_length"] else False
# Tokenize all texts and align the labels with them.
def tokenize_and_align_labels(examples):
tokenized_inputs = tokenizer(
examples["tokens"],
padding=padding,
truncation=True,
max_length=hyperparams["max_seq_length"],
add_special_tokens=True,
return_token_type_ids=False,
# We use this argument because the texts in our dataset are lists of words (with a label for each word).
is_split_into_words=True,
)
labels = []
for i, label in enumerate(examples["ner_tags"]):
word_ids = tokenized_inputs.word_ids(batch_index=i)
previous_word_idx = None
label_ids = []
for word_idx in word_ids:
# Special tokens have a word id that is None. We set the label to -100 so they are automatically
# ignored in the loss function.
if word_idx is None:
label_ids.append(-100)
# We set the label for the first token of each word.
elif word_idx != previous_word_idx:
label_ids.append(label_to_id[label[word_idx]])
# For the other tokens in a word, we set the label to -100
else:
label_ids.append(-100)
previous_word_idx = word_idx
labels.append(label_ids)
tokenized_inputs["labels"] = labels
return tokenized_inputs
# Active Training Set
train_dataset = raw_datasets["train"].map(
tokenize_and_align_labels,
batched=True,
num_proc=hyperparams["preprocessing_num_workers"],
load_from_cache_file=True,
remove_columns=list(features.keys()),
desc="Running tokenizer on train dataset",
)
active_set = ActiveLearningDataset(train_dataset)
# Validation Set
valid_set = raw_datasets["validation"].map(
tokenize_and_align_labels,
batched=True,
num_proc=hyperparams["preprocessing_num_workers"],
load_from_cache_file=True,
remove_columns=list(features.keys()),
desc="Running tokenizer on validation dataset",
)
# We start labeling randomly.
active_set.label_randomly(hyperparams["initial_pool"])
return active_set, valid_set, label_list, label_to_id
def main():
args = parse_args()
use_cuda = torch.cuda.is_available()
torch.backends.cudnn.benchmark = True
random.seed(1337)
torch.manual_seed(1337)
if not use_cuda:
print("warning, the experiments would take ages to run on cpu")
hyperparams = vars(args)
heuristic = get_heuristic(name=hyperparams["heuristic"], shuffle_prop=hyperparams["shuffle_prop"], reduction=hyperparams["reduction"])
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=hyperparams["model"], use_fast=True)
active_set, test_set, label_list, label_to_id = get_datasets(hyperparams, tokenizer)
config = AutoConfig.from_pretrained(
hyperparams["model"],
num_labels=len(label_list),
label2id=label_to_id,
id2label={i: l for l, i in label_to_id.items()},
finetuning_task="ner",
)
model = AutoModelForTokenClassification.from_pretrained(pretrained_model_name_or_path=hyperparams["model"], config=config)
# change dropout layer to MCDropout
model = patch_module(model)
if use_cuda:
model.cuda()
init_weights = deepcopy(model.state_dict())
training_args = TrainingArguments(
output_dir="./results", # output directory
num_train_epochs=hyperparams["learning_epoch"], # total # of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=8, # batch size for evaluation
weight_decay=0.01, # strength of weight decay
logging_dir="./logs", # directory for storing logs
)
# Data collator
data_collator = DataCollatorForTokenClassification(tokenizer, pad_to_multiple_of=None)
# Metrics
metric = load_metric("seqeval")
def compute_metrics(p):
predictions, labels = p
predictions = np.argmax(predictions, axis=2)
# Remove ignored index (special tokens)
true_predictions = [[label_list[p] for (p, l) in zip(prediction, label) if l != -100] for prediction, label in zip(predictions, labels)]
true_labels = [[label_list[l] for (p, l) in zip(prediction, label) if l != -100] for prediction, label in zip(predictions, labels)]
results = metric.compute(predictions=true_predictions, references=true_labels)
return {
"precision": results["overall_precision"],
"recall": results["overall_recall"],
"f1": results["overall_f1"],
"accuracy": results["overall_accuracy"],
}
# We wrap the huggingface Trainer to create an Active Learning Trainer
model = BaalTransformersTrainer(
model=model,
args=training_args,
train_dataset=active_set,
eval_dataset=test_set,
tokenizer=tokenizer,
data_collator=data_collator,
compute_metrics=compute_metrics,
)
logs = {}
logs["epoch"] = 0
# In this case, NLP data is fast to process and we do not need to use a smaller batch_size
active_loop = ActiveLearningLoop(
active_set,
model.predict_on_dataset,
heuristic,
hyperparams.get("n_data_to_label", 1),
iterations=hyperparams["iterations"],
)
for epoch in tqdm(range(args.epoch)):
# We use the default setup of HuggingFace for training (e.g. epoch=1).
# The setup is adjustable when the BaalTransformersTrainer is defined.
model.train()
# Validation!
eval_metrics = model.evaluate()
# We reorder the unlabelled pool at the frequency of learning_epoch
# This helps with speed while not changing the quality of uncertainty estimation.
should_continue = active_loop.step()
# We reset the model weights to relearn from the new trainset.
model.load_state_dict(init_weights)
model.lr_scheduler = None
if not should_continue:
break
active_logs = {
"epoch": epoch,
"labeled_data": active_set._labelled,
"Next Training set size": len(active_set),
}
logs = {**eval_metrics, **active_logs}
print(logs)
if __name__ == "__main__":
main()
However, now I get another error when executing should_continue = active_loop.step():
Traceback (most recent call last):
File "nlp_ner_bert_mcdropout.py", line 245, in <module>
main()
File "nlp_ner_bert_mcdropout.py", line 227, in main
should_continue = active_loop.step()
File "/experiments/falva/tools/baal/baal/active/active_loop.py", line 72, in step
probs = self.get_probabilities(pool, **self.kwargs)
File "/experiments/falva/tools/baal/baal/transformers_trainer_wrapper.py", line 119, in predict_on_dataset
return np.vstack(preds)
File "<__array_function__ internals>", line 6, in vstack
File "/home/falva/miniconda3/envs/baal/lib/python3.7/site-packages/numpy/core/shape_base.py", line 282, in vstack
return _nx.concatenate(arrs, 0)
File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 49 and the array at index 1 has size 51
I'll keep investigating and let you know what I discover. As always, any pointers are welcome.
So, I don't get the error if I manually set max_seq_length (e.g. to 128). According to the HuggingFace documentation, if you don't specify this for the tokenizer, each batch is padded to the length of the longest sequence in that batch. However, I think BaaL assumes that all batches share the same max_seq_length?
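Concretely, with the flags already exposed in parse_args above (--pad_to_max_length and --max_seq_length), the tokenization then pads every sample to a fixed length:
tokenized_inputs = tokenizer(
    examples["tokens"],
    padding="max_length",   # --pad_to_max_length True
    truncation=True,
    max_length=128,         # --max_seq_length 128
    is_split_into_words=True,
)
# All samples now share one sequence length, so predict_on_dataset can
# np.vstack the per-batch prediction arrays without a shape mismatch.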
While making that change avoids the exception, there's something I wanted to ask. In lines 117-118 of transformers_trainer_wrapper.py we have:
if len(preds) > 0 and not isinstance(preds[0], Sequence):
# Is an Array or a Tensor
return np.vstack(preds)
According to the documentation, the returned value should be [n_samples, n_outputs, ..., n_iterations], or as @Dref360 mentioned before, [DATASET_LEN, NUM_CLASSES, NUM_TOKENS, NUM_ITERATIONS]. However, it is actually [n_samples, max_seq_length, n_classes, n_iterations]. Would the fact that dimensions 1 and 2 are in a different order have an effect on the rest of the active learning process? For instance, on how the scores/uncertainties are computed or aggregated? Thanks!
Hey @feralvam,
I think it is great if you submit a WIP PR.
To clarify a bit (since I had a similar problem recently): for now, I'd suggest making sure that after tokenization all the samples have the same length (i.e. using padding and max_length). You are correct about what happens if you do not specify max_length, but this also creates a bit of inconsistency in the uncertainty calculation, which we prefer to avoid. Not to mention that my experiments in a similar context did not produce reliable results. So this is something I don't think we should encourage, especially in the context of uncertainty. I am absolutely open to your thoughts on this if you have counterexamples.
As for your question: we provide a reduction argument in the heuristics to get rid of the extra dimension(s) that different tasks introduce. Generally speaking, for any heuristic we want the outcome probability distribution per sample per iteration, and any other dimension is not used in the heuristic calculation. Depending on the task, you can decide which type of reduction best gives you the final shape needed by the heuristics. For example, in segmentation we take the average over the pixels. I'd suggest starting with mean and then trying other reduction types to see which one works best.
This is where reductions are defined, and it can be set when you define your heuristic:
https://github.com/ElementAI/baal/blob/master/baal/active/heuristics/heuristics.py#L15
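For instance, a small sketch using the named reductions defined there:
from baal.active.heuristics import BALD

bald_mean = BALD(reduction="mean")  # average the per-token uncertainties of a sentence
bald_max = BALD(reduction="max")    # let the single most uncertain token drive the score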
@feralvam did you have any result on this?
Hi! I managed to make the code work, but I didn't run any full/extensive experiments to verify whether I could reproduce results of previous work on NER with it. I'm working on another sequence labelling task at the moment, so I'll most likely get back to this in a few weeks. Thanks for all your help so far!
Hey, what did you do in the end regarding lines 117-118 of transformers_trainer_wrapper.py?
Hi @shaked571. I didn't change it in the end, since the first dimension was the one I was expecting, and everything else gets aggregated. Having said that, I stopped working on this problem (for the foreseeable future, at least), so I didn't fully verify whether this actually affects the final result or not.
Is this example available now as part of BaaL?
I would be interested in BaaL for NER as well.
We've just moved to a new documentation system (mkdocs) that should help us better structure our tutorials.
If someone could run some experiments showing that BALD performs at least better than random sampling on NER, I would include it on our website.
Cheers,
I think we made good progress on this in #262. I'll close this one, but for now the code should work for y'all to run experiments with.
I'm seeking guidance on how to use the BaaL framework for a named entity recognition NLP task. In this task, every training sample is made up of a sequence of tokens and each token has a label. So each sample has not one, but many labels. And the number of labels per sample is not fixed.
Can the BaaL framework deal with this use case? I'm asking because most places in the documentation seem to assume there is one label per training sample.