Closed katerinkus closed 5 months ago
Thank you @katerinkus. This is pretty clear; even as a non-dev I can understand what needs to be done in this task.
We can discuss more in refinement to see if the devs have any questions.
Hey team! Please add your planning poker estimate with Zenhub @emi-hi @tim738745
Describe the task
Fix formatting issues in Applicant Name and Manufacturer column values in the GER/SUVI dataset. When "BC" is used in the context of a name, it should be "BC", e.g. "BC Cider". But when it is next to the word "Ltd.", it should be "B.C. Ltd.". "And" should be "and", "Of" should be "of", and "The" should be "the" (same for "A" and "An").
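The "BC" rule above could be sketched with two regex substitutions. This is only a minimal illustration, not the code from the Jupyter file: the function name `fix_bc` and the exact patterns are assumptions, and it assumes the abbreviation always appears as a separate word.

```python
import re


def fix_bc(name: str) -> str:
    """Hypothetical helper: normalise "Bc" in a company name."""
    # Step 1: "Bc" directly followed by "Ltd." becomes "B.C.".
    # The lookahead leaves " Ltd." itself untouched.
    name = re.sub(r"\bbc(?=\s+Ltd\.)", "B.C.", name, flags=re.IGNORECASE)
    # Step 2: any remaining standalone "Bc" becomes "BC".
    # "B.C." from step 1 no longer matches, because "B" is followed by ".".
    return re.sub(r"\bbc\b", "BC", name, flags=re.IGNORECASE)


print(fix_bc("Bc Cider"))
print(fix_bc("The Best Company Bc Ltd."))
```

The order matters in this sketch: the "Ltd." case has to run first, otherwise the general rule would turn "Bc Ltd." into "BC Ltd.".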
Purpose
This task fixes formatting inconsistencies that result from changing column values to sentence case or upper case. For instance, "Bc Cider And Juice" would become "BC Cider and Juice", and "The Best Company Of 2020 Bc Ltd." would become "The Best Company of 2020 B.C. Ltd.".
Acceptance Criteria
This code is in the Jupyter file:
- "Bc Ltd." becomes "B.C. Ltd."
- "Bc" becomes "BC"
- "Of" becomes "of" if it is not the first word in the string.
- "And" becomes "and" if it is not the first word in the string.
Write our own code for these ones:
- "The" becomes "the" if it is not the first word in the string.
- "A" becomes "a" if it is not the first word in the string.
- "An" becomes "an" if it is not the first word in the string.
Additional context
Please see GER_notes_for Agile_team, section 1.3 for some regex ideas.
There are ~150 prepositions (e.g. under, at), so it would be too difficult to handle all of them without knowing the context. That is why handling the preposition "of", the articles, and "and" is enough.
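As a rough sketch of the "not the first word" rule for "of", "and" and the articles, one option is a single regex that only matches these words when they have whitespace on both sides. The names `SMALL_WORD_RE` and `lowercase_small_words` are hypothetical, and this is not the code from the notes or the Jupyter file.

```python
import re

# Assumption: values are already title-cased, so only these exact
# spellings need to be handled.
# (?<=\s) requires whitespace before the word, so the first word of the
# string is never matched; (?=\s) requires another word to follow.
SMALL_WORD_RE = re.compile(r"(?<=\s)(?:Of|And|The|An|A)(?=\s)")


def lowercase_small_words(name: str) -> str:
    """Hypothetical helper: lowercase of/and/the/a/an unless first."""
    return SMALL_WORD_RE.sub(lambda m: m.group(0).lower(), name)


print(lowercase_small_words("The Best Company Of 2020"))
print(lowercase_small_words("A Taste Of The Valley"))
```

Because the pattern has no knowledge of context, a name such as "Class A Motors" would also have its "A" lowercased. That is the same ambiguity that makes the full list of prepositions impractical, so results for "A" in particular may need a manual check.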