Open cristan opened 1 month ago
ChatGPT suggests simply dropping duplicates :D, will see after the other PRs are discussed (@gradedSystem)
# Collect alias rows in a list
alias_list = []
for index, row in unlocode_df.iterrows():
    # Alias rows have no Location code and are flagged with '=' in Change
    if pd.isna(row['Location']) or row['Location'] == '':
        if row['Change'] == '=':  # alias row
            alias_list.append(row[['Country', 'Name', 'NameWoDiacritics']])
# Create alias_df from the list
alias_df = pd.DataFrame(alias_list, columns=['Country', 'Name', 'NameWoDiacritics'])
alias_df.drop_duplicates(inplace=True)
# Save the alias DataFrame to CSV
alias_df.to_csv("data/alias.csv", index=False)
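For comparison, the same filter could probably be written without the row loop. This is only a sketch: `extract_aliases` is a hypothetical helper name, and the sample rows are made up for illustration rather than taken from the real UN/LOCODE data.

```python
import pandas as pd


def extract_aliases(unlocode_df: pd.DataFrame) -> pd.DataFrame:
    """Return de-duplicated alias rows (empty Location and Change == '=')."""
    # Same two conditions as the loop, expressed as boolean masks
    no_location = unlocode_df["Location"].isna() | (unlocode_df["Location"] == "")
    is_alias = unlocode_df["Change"] == "="
    cols = ["Country", "Name", "NameWoDiacritics"]
    return (
        unlocode_df.loc[no_location & is_alias, cols]
        .drop_duplicates()
        .reset_index(drop=True)
    )


# Made-up sample rows, only to exercise the function
sample = pd.DataFrame(
    {
        "Change": ["=", "=", "", "="],
        "Country": ["GL", "GL", "GL", "GL"],
        "Location": [None, "", "JCH", None],
        "Name": ["Old A = New A", "Old A = New A", "New A", "Old B = New B"],
        "NameWoDiacritics": ["Old A = New A", "Old A = New A", "New A", "Old B = New B"],
    }
)
aliases = extract_aliases(sample)
```

The result should be equivalent to the loop version, since `drop_duplicates` is still what removes the repeated rows.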
@sabas what if we just do something like this (using a simple regex operation):
GL,Christianshaab, Qasigiannguit
wdyt?
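A rough sketch of what that split could look like, assuming the alias rows carry names of the form `Old name = New name` (that format is an assumption here, and `split_alias` is a hypothetical helper, not something already in the repo):

```python
import re

# Assumption: alias names look like "Old name = New name"
ALIAS_SEP = re.compile(r"\s*=\s*")


def split_alias(country: str, name: str) -> tuple[str, str, str]:
    """Split an alias name into (country, alias, target)."""
    parts = ALIAS_SEP.split(name, maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"not an alias name: {name!r}")
    alias, target = parts
    return country, alias, target


row = split_alias("GL", "Christianshaab = Qasigiannguit")
```

Returning a tuple leaves the exact CSV layout (column names, spacing) open for discussion.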
Check out https://github.com/datasets/un-locode/blob/main/data/alias.csv
Let's take the first line:
That's there twice (also at line 88). This applies to all the lines I've checked.
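To check this more systematically than by eye, something like the following could list every row that appears more than once. It is a minimal sketch: `find_repeated_rows` is a hypothetical name, and the sample frame below uses made-up rows instead of the real file.

```python
import pandas as pd


def find_repeated_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return all rows that occur more than once, keeping every copy."""
    return df[df.duplicated(keep=False)]


# Made-up rows standing in for the CSV contents
sample = pd.DataFrame(
    {
        "Country": ["AA", "BB", "AA"],
        "Name": ["Old A = New A", "Old B = New B", "Old A = New A"],
        "NameWoDiacritics": ["Old A = New A", "Old B = New B", "Old A = New A"],
    }
)
repeated = find_repeated_rows(sample)
```

Against the actual data this would be `find_repeated_rows(pd.read_csv("data/alias.csv"))`; the index of the result gives the positions of the repeated rows.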