Open Hironsan opened 4 years ago
Hi! I have started working on this one. I am also going to use the label metadata to get label names. Would that be all right?
I am also going to use the label metadata to get label names. Would that be all right?
I agree with you. Where and how would the label metadata be passed in?
I have a couple of ideas, but here's what comes to mind:
Personally, as a user, I would prefer to call a class method on each Dataset directly. So instead of using
dataset = read_jsonl(filepath='example.jsonl', dataset=NERDataset, encoding='utf-8')
I would suggest using this directly:
dataset = NERDataset.from_jsonl(filepath='example.jsonl', encoding='utf-8')
and when it comes to TextClassificationDataset (working name), I would just add another optional argument (via **kwargs):
dataset = TextClassificationDataset.from_jsonl(annotations_filepath='example.jsonl', labels_filepath='project_1_labels.jsonl', encoding='utf-8')
It is optional because, without the label metadata filepath, annotations could still be converted by appending the label id instead of the label name (with a warning for information), like this: __label__1, although I am not sure this is a valid fastText label (I have to check that).
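To make the proposal concrete, here is a minimal sketch of what such a class method could look like. It is not the library's actual implementation: the class body, the `_read_jsonl` helper, and both JSONL schemas (annotations as `{"text": ..., "annotations": [{"label": <id>}]}`, label metadata as `{"id": ..., "text": ...}`) are assumptions made for illustration only.

```python
import json
import warnings


class TextClassificationDataset:
    """Hypothetical sketch of the proposed API, not the real class."""

    def __init__(self, examples, labels=None):
        self.examples = examples
        self.labels = labels  # dict of label id -> label name, or None

    @staticmethod
    def _read_jsonl(filepath, encoding):
        # Hypothetical helper: one JSON object per non-empty line.
        with open(filepath, encoding=encoding) as f:
            return [json.loads(line) for line in f if line.strip()]

    @classmethod
    def from_jsonl(cls, annotations_filepath, labels_filepath=None, encoding='utf-8'):
        examples = cls._read_jsonl(annotations_filepath, encoding)
        labels = None
        if labels_filepath is not None:
            # Assumed label metadata schema: {"id": 1, "text": "positive"}
            rows = cls._read_jsonl(labels_filepath, encoding)
            labels = {row['id']: row['text'] for row in rows}
        return cls(examples, labels)

    def to_fasttext(self):
        if self.labels is None:
            warnings.warn('No label metadata given; falling back to label ids.')
        lines = []
        for example in self.examples:
            # Assumed annotation schema: {"text": "...", "annotations": [{"label": 1}]}
            names = [
                self.labels[a['label']] if self.labels is not None else str(a['label'])
                for a in example['annotations']
            ]
            prefix = ' '.join(f'__label__{name}' for name in names)
            lines.append(f'{prefix} {example["text"]}')
        return lines
```

With this shape the label metadata is read once, at load time, and `to_fasttext` needs no extra arguments. Without `labels_filepath`, the conversion still works but emits a warning and uses the raw label ids.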
If you decide to stay with the current implementation, the labels path could either be passed as **kwargs to the read_jsonl function and forwarded to the Dataset constructor, or be passed directly to the TextClassificationDataset.to_fasttext method (this requires reading the label metadata every time you want to perform a conversion, so I am not a fan of this solution).
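For comparison, the first alternative (keeping read_jsonl and forwarding **kwargs to the Dataset constructor) might look roughly like the sketch below. The function body and both constructor signatures are assumptions for illustration; only the `filepath`, `dataset`, and `encoding` parameter names come from the call shown earlier.

```python
import json


def read_jsonl(filepath, dataset, encoding='utf-8', **kwargs):
    # Read one JSON object per non-empty line.
    with open(filepath, encoding=encoding) as f:
        examples = [json.loads(line) for line in f if line.strip()]
    # Extra keyword arguments (e.g. labels_filepath) are forwarded
    # untouched to the dataset constructor.
    return dataset(examples, **kwargs)


class NERDataset:
    # Hypothetical constructor: takes no extra arguments.
    def __init__(self, examples):
        self.examples = examples


class TextClassificationDataset:
    # Hypothetical constructor: accepts an optional labels path.
    def __init__(self, examples, labels_filepath=None):
        self.examples = examples
        self.labels_filepath = labels_filepath
```

One trade-off of this design is that an argument a given dataset does not accept (for example `labels_filepath` passed with `NERDataset`) only fails inside the constructor, as a `TypeError`.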
Let me know what you think.
Example: