bcgov / cthub

Apache License 2.0

CTHUB - GER/SUVI Consistent prepositions and articles #316

Closed katerinkus closed 5 months ago

katerinkus commented 5 months ago

Describe the task Fix formatting issues in Applicant Name and Manufacturer column values in the GER/SUVI dataset.

When "BC" is used as part of a name, it should be written "BC", e.g. "BC Cider". But when it is next to the word "Ltd.", it should be "B.C. Ltd.". "And" should be "and", "Of" should be "of", and "The" should be "the" (the same applies to "A" and "An").

Purpose This task fixes formatting inconsistencies that result from changing column values to sentence case or upper case. For instance, "Bc Cider And Juice" would become "BC Cider and Juice", and "The Best Company Of 2020 Bc Ltd." would become "The Best Company of 2020 B.C. Ltd.".
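As a rough illustration of the rules above, here is a minimal Python sketch. It is not the code from the Jupyter file or from the notes document. The helper name `fix_name` and the `SMALL_WORDS` set are hypothetical, and the sketch assumes (based on the second example, where the leading "The" stays capitalized) that the first word of a name is never lowercased.

```python
import re

# Words to lowercase when they are NOT the first word of the name.
# Assumption taken from the example "The Best Company of 2020 B.C. Ltd.",
# where the leading "The" keeps its capital letter.
SMALL_WORDS = {"and", "of", "the", "a", "an"}


def fix_name(name: str) -> str:
    """Hypothetical helper: normalize 'BC', 'B.C. Ltd.' and small words."""
    # "Bc", "BC" or "B.C." directly before "Ltd" becomes "B.C. Ltd."
    # (this also adds the trailing period if "Ltd" had none).
    name = re.sub(r"\bB\.?C\.?\s+Ltd\b\.?", "B.C. Ltd.", name, flags=re.IGNORECASE)

    # Any remaining standalone "Bc"/"bc" becomes "BC".
    name = re.sub(r"\bbc\b", "BC", name, flags=re.IGNORECASE)

    # Lowercase "and", "of" and the articles, except in first position.
    # Note: split() collapses repeated whitespace into single spaces.
    words = name.split()
    fixed = [
        word.lower() if i > 0 and word.lower() in SMALL_WORDS else word
        for i, word in enumerate(words)
    ]
    return " ".join(fixed)


print(fix_name("Bc Cider And Juice"))
print(fix_name("The Best Company Of 2020 Bc Ltd."))
```

With a pandas DataFrame, something like `df["Applicant Name"].map(fix_name)` could apply this to a whole column. One caveat: a standalone "A" used as an initial (e.g. in a person's name) would also be lowercased, so that rule may need review against the real data.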

Acceptance Criteria This code is in the Jupyter file:

Write our own code for these ones:

Additional context Please see GER_notes_for Agile_team, section 1.3 for some regex ideas.

There are ~150 prepositions (e.g. under, at), so it would be too difficult to handle all of them without knowing the context. That is why handling the preposition "of", the articles, and "and" is enough.

ArawuSamuel1 commented 5 months ago

Thank you @katerinkus. This is pretty clear; even as a non-dev I can understand what you want done in this task.

We will discuss this more in refinement to see if the devs have any questions.

ArawuSamuel1 commented 5 months ago

Hey team! Please add your planning poker estimate with Zenhub @emi-hi @tim738745