dugzino opened this issue 3 months ago
Hey @dugzino! I'm here to help you with any bugs, questions, or contributions you have for the repository. Let me know how I can assist you!
The issue you've identified highlights significant inconsistencies in the data structure of the countries-states-cities-database. Specifically, cities and towns are misclassified in the states and cities tables, and the links between them are incorrect.
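Misclassification: the states table contains entries that are really cities (for example, "Antwerp" in Belgium and "Kavajë District" in Albania), and the cities table contains entries that are really towns (for example, "Borgerhout").
Incorrect Links: some entries point to the wrong parent. For example, the town "Borgerhout" is linked directly to the state "Flanders" instead of to the city "Antwerp".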
Proposed solution: a comprehensive refactor of the database schema that introduces more granular tables and corrects the hierarchical relationships. The proposed schema is:
continents(
name: string,
);
regions(
name: string,
continent_id: Continent,
);
countries(
name: string,
region_id: Region,
);
states(
name: string,
country_id: Country,
);
cities(
name: string,
country_id: Country,
state_id: State | null,
);
towns(
name: string,
city_id: City,
);
places(
name: string,
town_id: Town,
);
This schema would ensure a clear and accurate representation of geographical entities and their relationships, making the data easier to manage and use.
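To make the proposed hierarchy concrete, here is a minimal sketch of it as SQLite tables using Python's built-in sqlite3 module. The integer id primary keys, the SQL column types, and the "Western Europe" region row are assumptions added for illustration; the proposal itself lists only names and parent references. With foreign keys enabled, SQLite rejects a town that points at a non-existent city, and the join in the hypothetical town_lineage helper walks a town back up to its continent.

```python
import sqlite3

# Assumption: every table gets an INTEGER PRIMARY KEY "id"; the proposal only
# lists each entity's name and its parent reference.
SCHEMA = """
CREATE TABLE continents (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE regions (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    continent_id INTEGER NOT NULL REFERENCES continents(id)
);
CREATE TABLE countries (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    region_id INTEGER NOT NULL REFERENCES regions(id)
);
CREATE TABLE states (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES countries(id)
);
-- A city always has a country; state_id stays NULL where there is no state level.
CREATE TABLE cities (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES countries(id),
    state_id   INTEGER REFERENCES states(id)
);
CREATE TABLE towns (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    city_id INTEGER NOT NULL REFERENCES cities(id)
);
CREATE TABLE places (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    town_id INTEGER NOT NULL REFERENCES towns(id)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create the schema in memory and load the Belgium example from the issue."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces REFERENCES clauses when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO continents VALUES (1, 'Europe')")
    conn.execute("INSERT INTO regions VALUES (1, 'Western Europe', 1)")  # illustrative region
    conn.execute("INSERT INTO countries VALUES (1, 'Belgium', 1)")
    conn.execute("INSERT INTO states VALUES (1, 'Flanders', 1)")
    conn.execute("INSERT INTO cities VALUES (1, 'Antwerp', 1, 1)")
    conn.execute("INSERT INTO towns VALUES (1, 'Borgerhout', 1)")
    conn.commit()
    return conn


def town_lineage(conn: sqlite3.Connection, town_name: str):
    """Walk a town up to its continent; the state is None for cities without one."""
    return conn.execute(
        """
        SELECT t.name, ci.name, s.name, co.name, r.name, ct.name
        FROM towns t
        JOIN cities ci ON ci.id = t.city_id
        LEFT JOIN states s ON s.id = ci.state_id
        JOIN countries co ON co.id = ci.country_id
        JOIN regions r ON r.id = co.region_id
        JOIN continents ct ON ct.id = r.continent_id
        WHERE t.name = ?
        """,
        (town_name,),
    ).fetchone()


if __name__ == "__main__":
    conn = build_demo()
    print(town_lineage(conn, "Borgerhout"))
    # ('Borgerhout', 'Antwerp', 'Flanders', 'Belgium', 'Western Europe', 'Europe')
```

Because "Borgerhout" can only reference a row in cities, it can no longer be attached straight to "Flanders", which is the kind of wrong link the issue describes.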
This refactor would address the current issues and provide a more robust structure for future data additions and modifications. If you need further assistance or a detailed plan for the refactor, feel free to ask!
Issue
There's a big issue with the data. The cities table contains both city and town data. The states table contains both state and city data. And the links between them are wrong.
I found issues in Belgium, Albania, and Japan, so there are probably similar issues in other countries.
Examples
Example 1
For instance, in Belgium:
In states, you can find "Antwerp". This is incorrect, as "Antwerp" is a city.
In cities, you can find "Antwerp" linked to its parent "Flanders". This is correct.
In cities, you can find "Borgerhout", which is linked to the state (region) "Flanders". This is incorrect: "Borgerhout" is a town in the city of "Antwerp".
So, the data for Belgium is completely messed up, pretty much 90% of it, and even fixing the bad IDs won't fix the other issue, which is the lack of tables and of the data to fill them.
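With the current flat tables, one cheap way to surface candidates like "Antwerp" is to look for names that appear both in states and in cities of the same country. This is a minimal sketch that assumes only a small subset of columns (states with id, name, country_id; cities with id, name, country_id, state_id); the function names and demo rows are hypothetical. A hit is only a candidate for manual review, and the check cannot catch the "Borgerhout" case, because nothing in the flat tables marks it as a town.

```python
import sqlite3


def find_state_city_clashes(conn: sqlite3.Connection):
    """Return (country_id, name) pairs that appear both as a state and as a city."""
    return conn.execute(
        """
        SELECT DISTINCT s.country_id, s.name
        FROM states s
        JOIN cities c ON c.country_id = s.country_id AND c.name = s.name
        ORDER BY s.country_id, s.name
        """
    ).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Load the Belgium rows described above into minimal, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE states (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
        CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT,
                             country_id INTEGER, state_id INTEGER);
        -- Belgium is country 1 in this toy data.
        INSERT INTO states VALUES (1, 'Flanders', 1), (2, 'Antwerp', 1);
        -- Both city rows point at Flanders, as the issue describes.
        INSERT INTO cities VALUES (1, 'Antwerp', 1, 1), (2, 'Borgerhout', 1, 1);
        """
    )
    return conn


if __name__ == "__main__":
    print(find_state_city_clashes(demo_connection()))  # [(1, 'Antwerp')]
```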
Example 2
This time in Albania. We have duplicates in states like "Tirana County" and "Tirana District". "Tirana" (or "Tirana District", but it's really just "Tirana") is a city in the state of "Tirana County". Even though we never make use of the state "Tirana District", it's still there. We then have "Kavajë District" in states, when it's actually the city "Kavajë" and should be in cities. Also, "Bashkia Kavajë" is not even a town. It's even lower level: it's a place in a town.
Solution
The solution would be a whole refactor of the tables and data, which is huge but necessary. It should be like:
Doing this would make the data a lot simpler and cleaner to add, edit, and, more importantly, use. I wouldn't mind doing a POC, but I've got my hands full right now. This should become the next version of this repo.