code4hk / legcohk

some notebooks
http://bit.ly/1k58I8b

Name normalization #1

Closed: hupili closed this issue 10 years ago

hupili commented 10 years ago

Question: Why does "郭偉强" appear twice in the plot?

Reported for commit: 76b7826cabac4ce9a9f6233f57c92184cdd76d4c
Fixed in commit: 8b95798aa51ac512c4eb8d0ed98cbee12e5257c4

Kenneth Chen Wei On: The case of Hon KWOK Wai-Keung's Chinese name is a good example of why data maintenance matters. When he changed his name from 郭偉強 to 郭偉强 last year, we should have considered what that change meant for our data records. Good catch!
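The two spellings differ in one character (強 vs 强). These are distinct Unicode code points, and Unicode normalization such as NFKC does not merge them, so any grouping by raw name string counts them as two members. A minimal sketch of one way to fold them, assuming a hand-maintained alias table. The `ALIASES` dict and `canonical_name` function are hypothetical and not taken from the notebook:

```python
from collections import Counter

# Hypothetical alias table: each known variant spelling maps to one canonical form.
# Choosing the member's current spelling (郭偉强) as canonical is an assumption.
ALIASES = {
    "郭偉強": "郭偉强",
}


def canonical_name(name: str) -> str:
    """Return the canonical spelling for a member name, ignoring surrounding whitespace."""
    name = name.strip()
    return ALIASES.get(name, name)


# Vote records as they might look before and after the name change.
records = ["郭偉強", "郭偉强", "郭偉强"]

print(Counter(records))                              # two separate keys
print(Counter(canonical_name(r) for r in records))   # a single key with count 3
```

An alias table only works for variants someone has already noticed, which is why a stable member ID is the more robust fix.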

Pili Hu: Kenneth Chen Wei On, the same thing happens with 'Dr Joseph LEE' and 'Prof Joseph LEE'. I have this cleansing step in the notebook (http://bit.ly/1k58I8b), but I think it would be better to officially give each member an ID. Then we would not have the data normalization issue at all. The same goes for motion IDs. Motions are easier because we can use date + order as the ID. Names can have some subtle issues.
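A rough sketch of the two ideas in this comment, under stated assumptions. `strip_titles` stands in for a title-cleansing step; the honorific list is an assumption, and the notebook's actual rules may differ. `motion_id` builds a motion ID from the meeting date plus the motion's order within that meeting, using a hypothetical `YYYY-MM-DD-NNN` layout:

```python
import re
from datetime import date

# Hypothetical honorific list; the notebook's real cleansing rules may differ.
_TITLE = re.compile(r"^(?:Hon|Dr|Prof|Mr|Mrs|Ms|Miss)\.?\s+", re.IGNORECASE)


def strip_titles(name: str) -> str:
    """Drop leading honorifics one at a time, e.g. 'Hon Dr Joseph LEE' -> 'Joseph LEE'."""
    name = " ".join(name.split())  # collapse repeated whitespace
    while True:
        shorter = _TITLE.sub("", name, count=1)
        if shorter == name:
            return name
        name = shorter


def motion_id(vote_date: date, order: int) -> str:
    """Combine the meeting date with the motion's 1-based order in that meeting.

    The 'YYYY-MM-DD-NNN' layout is an assumption for illustration.
    """
    if order < 1:
        raise ValueError("order is 1-based")
    return f"{vote_date.isoformat()}-{order:03d}"


print(strip_titles("Dr Joseph LEE"), strip_titles("Prof Joseph LEE"))  # Joseph LEE Joseph LEE
print(motion_id(date(2013, 10, 16), 2))                                # 2013-10-16-002
```

Stripping titles can also merge two different people who share an English name. That is one of the "subtle issues" with names, and an official member ID avoids it. Motions have no such ambiguity, because the date and order together already identify each one uniquely.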

Refs:

hupili commented 10 years ago

This is a demo. The ingredients I'm looking for: