CanCLID / rime-cantonese-upstream

rime-cantonese 上游詞表倉庫
Creative Commons Attribution 4.0 International
27 stars 9 forks source link

Create find_duplicates.py #15

Closed AlienKevin closed 1 year ago

AlienKevin commented 2 years ago

Finds all duplicated lines within the CSV files. Aims to close issue 159 in rime-cantonese

AlienKevin commented 2 years ago

@ayaka14732 I added find_duplicates to the workflows. Though I'm not sure about how the workflow actually runs. Currently, I changed the script to output error messages to stdout, like validate.py. There are 859 duplicated lines right now so my console is overflown when I run the script. You might need to redirect the messages to some log file for inspection.

ayaka14732 commented 2 years ago

@AlienKevin If there exists duplicate entries in the dictionary, the script should terminate with a non-zero exit code, so that we will get a notification saying that the workflow run has failed. @laubonghaudoi We should remove the duplicate entries first, then merge the changed dictionaries back to this branch.

laubonghaudoi commented 2 years ago

@AlienKevin @ayaka14732 啱先刪咗重複詞條,你哋可以繼續嘞 https://github.com/CanCLID/rime-cantonese-upstream/commit/a80ee507406660331e6e7f9d0c79b0ef4db36ed5

AlienKevin commented 2 years ago

@ayaka14732 I updated the script to exit(1) if there are duplicates and print all error messages to stderr.

ayaka14732 commented 2 years ago

Why has the check passed? I thought https://github.com/CanCLID/rime-cantonese-upstream/commit/a80ee507406660331e6e7f9d0c79b0ef4db36ed5 has not been merged to this branch yet.

laubonghaudoi commented 2 years ago

係唔係 @AlienKevin 已經將最新版 pull 咗落呢個 PR

AlienKevin commented 2 years ago

呢個都幾奇怪,我冇pull過。會唔會checkout個action係自動checkout main分支嘅最新版本? See https://github.com/actions/checkout#usage

graphemecluster commented 2 years ago

@ayaka14732 綾香你會唔會偏好用 defaultdict

ayaka14732 commented 2 years ago

@ayaka14732 綾香你會唔會偏好用 defaultdict

Oh, I didn't notice. Should use defaultdict instead.

AlienKevin commented 2 years ago

Didn't know defaultdict exists. Seems like a neat feature. Updated to use defaultdict instead of dict.

graphemecluster commented 2 years ago

@AlienKevin You can pass the list constructor directly to the defaultdict constructor, I think it is the more common practice

AlienKevin commented 2 years ago

@graphemecluster Got it. @ayaka14732 Just applied your suggestions as a new commit.