buerokratt / Data-Anonymizer

MIT License

Label existing new Bürokratt corpus #70

Open kunnark opened 1 year ago

kunnark commented 1 year ago

AS A Data Scientist I WANT TO use a newly labelled dataset IN ORDER TO start training new BERT models on it.

Acceptance Criteria:

Additional: Labeling classes:

PER - person names
GPE - geopolitical entities
LOC - geographical locations
ORG - organizations
PROD - products, things, works of art
EVENT - events
DATE - dates
TIME - times
TITLE - titles and professions
MONEY - monetary expressions
PERCENT - percentages
DOC_ORG - organisation document ID
CARD - banking or similar card number
IBAN - IBAN account number
DOC_PER - personal document number
IDCODE - personal ID code
EMAIL - email address
TEL - phone number
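
For training a BERT token classifier, the classes above are usually expanded into per-token tags. The following is a minimal sketch that assumes a BIO tagging scheme ("O" plus B-/I- tags per class); the issue itself does not state which scheme the corpus uses, and the names `ENTITY_CLASSES`, `build_bio_labels`, `LABEL2ID` and `ID2LABEL` are illustrative, not taken from the repository.

```python
# Label classes exactly as listed in this issue.
ENTITY_CLASSES = [
    "PER", "GPE", "LOC", "ORG", "PROD", "EVENT", "DATE", "TIME", "TITLE",
    "MONEY", "PERCENT", "DOC_ORG", "CARD", "IBAN", "DOC_PER", "IDCODE",
    "EMAIL", "TEL",
]


def build_bio_labels(classes):
    """Return the BIO tag list: "O" followed by B-/I- tags for each class.

    Assumption: the corpus uses BIO tagging; adjust if it uses another scheme.
    """
    labels = ["O"]
    for name in classes:
        labels.append(f"B-{name}")  # first token of an entity
        labels.append(f"I-{name}")  # continuation token of an entity
    return labels


LABELS = build_bio_labels(ENTITY_CLASSES)

# Mappings in the shape that token-classification models typically expect.
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}
```

With 18 classes this yields 37 tags in total. Keeping the class list as the single source of truth means that adding a class later only requires one change.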

Task covered by the functionality to add additional corpora and to prelabel new corpora #41

turnerrainer commented 1 year ago

@alimuhammadahmer @kunnark

  1. This issue is marked as "Done", but I cannot see any code commits;
  2. ACs must contain all REST endpoints used to get the result;
  3. All technical ACs are missing.

kunnark commented 1 year ago

@turnerrainer

  1. Status changed.
  2. No REST endpoints are in use; this is DS work that is done to get the corpus from the training.
  3. The technical task for the labeler was to label a dataset.

turnerrainer commented 1 year ago

@vmugra please verify whether the ACs of this issue are met.

vmugra commented 1 year ago

Task covered by the functionality to add additional corpora and to prelabel new corpora: https://github.com/buerokratt/Data-Anonymizer/issues/41