OpenBudget / BudgetKey

Opening the Israeli Budget!
https://next.obudget.org

Maya Parser: Data Cleansing: Academic Records #471

Open odedsh opened 4 years ago

odedsh commented 4 years ago

project: budgetkey-data-pipelines
pipeline: maya/maya-reported-academic-degrees

The existing pipeline reads the Maya stock exchange notifications that are filed when new board members and other company officers are appointed. The academic record reported in these forms is free text, stored in the columns 'Degree', 'Field' and 'Institution'.

Task:

We want to build rules that clean up typos and standardize the values in these fields.

For instance, in 'Institution', merge variants of the same name such as 'אוניברסיטת תל אביב' (Tel Aviv University) and 'אוני תל אביב' (an abbreviated form of the same name). There can also be typos, which we want to correct.
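One possible way to express such a rule is sketched below: expand known abbreviations, then snap the result to the closest entry in a curated list of canonical names using fuzzy matching from Python's standard `difflib`. The names `CANONICAL_INSTITUTIONS`, `ABBREVIATIONS` and `normalize_institution`, the sample entries and the 0.8 cutoff are all assumptions made for illustration; none of them exist in the pipeline.

```python
import difflib
import re

# Hypothetical canonical list; a real one would be curated from the data.
CANONICAL_INSTITUTIONS = [
    "אוניברסיטת תל אביב",   # Tel Aviv University
    "האוניברסיטה העברית",   # The Hebrew University
]

# Hypothetical abbreviation expansions, applied to whole words only.
ABBREVIATIONS = {
    "אוני": "אוניברסיטת",
}


def normalize_institution(raw, cutoff=0.8):
    """Return the canonical institution name, or the cleaned input if none is close."""
    # Replace quotes, dots, commas and hyphens with spaces.
    cleaned = re.sub(r"[\"'.,\-]", " ", raw)
    # split() also collapses repeated whitespace.
    words = [ABBREVIATIONS.get(word, word) for word in cleaned.split()]
    candidate = " ".join(words)
    # Fuzzy match to absorb small typos (assumed cutoff, tune on real data).
    matches = difflib.get_close_matches(
        candidate, CANONICAL_INSTITUTIONS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else candidate
```

With this sketch, both 'אוני תל אביב' and a misspelling such as 'אוניברסיטת תל אבב' map to 'אוניברסיטת תל אביב', while a value with no close canonical match is returned cleaned but otherwise unchanged so it can be reviewed manually.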

The same should be done for 'Degree' and 'Field'.

Note that users sometimes put the degree type in the 'Field' column, e.g. 'מנהל ומדיניות ציבורית ( M.A )' (Public Administration and Policy (M.A)).

The M.A is not part of the field and should be removed.
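A minimal sketch of this rule, assuming degree abbreviations appear in Latin letters and optionally inside parentheses: a regular expression finds the degree token, removes it from the field, and returns it separately so it can be used to fill or verify 'Degree'. The pattern list is illustrative and not exhaustive, and `DEGREE_PATTERN` and `split_degree_from_field` are hypothetical names.

```python
import re

# Hypothetical, non-exhaustive list of degree abbreviations, with optional
# dots and optional surrounding parentheses.
DEGREE_PATTERN = re.compile(
    r"\(?\s*\b(B\.?A|M\.?A|B\.?Sc|M\.?Sc|MBA|LL\.?B|LL\.?M|Ph\.?D)\b\.?\s*\)?",
    re.IGNORECASE,
)


def split_degree_from_field(field):
    """Return (field without degree token, degree token or None)."""
    match = DEGREE_PATTERN.search(field)
    if not match:
        # No degree token found: only normalize whitespace.
        return " ".join(field.split()), None
    # Remove every degree token, then collapse leftover whitespace.
    cleaned = " ".join(DEGREE_PATTERN.sub(" ", field).split())
    return cleaned, match.group(1)
```

For the example above, `split_degree_from_field("מנהל ומדיניות ציבורית ( M.A )")` returns the field 'מנהל ומדיניות ציבורית' together with the extracted token 'M.A'. Degree types written in Hebrew would need additional patterns.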