dr5hn / countries-states-cities-database

🌍 Discover our global repository of countries, states, and cities! 🏙️ Get comprehensive data in JSON, SQL, PSQL, XML, YAML, and CSV formats. Access ISO2, ISO3 codes, country code, capital, native language, timezones (for countries), and more. #countries #states #cities
https://dr5hn.github.io/countries-states-cities-database/
Open Data Commons Open Database License v1.0

Incoherence between the data across tables #822

Open dugzino opened 3 months ago

dugzino commented 3 months ago

Issue

There's a big issue with the data. The table cities contains city and town data. The table states contains state and city data. And the links between them are wrong.

I found issues in Belgium, Albania, and Japan, so there are probably issues in other countries too.
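
To make the "wrong links" part concrete, here is a minimal sketch of a consistency check. The column names (id, country_id, state_id) and the sample rows are assumptions made up for illustration, not taken from the repo's actual dump.

```python
# Hypothetical sample rows; ids and names are invented for illustration.
states = [
    {"id": 1, "name": "State X", "country_id": 10},
    {"id": 2, "name": "State Y", "country_id": 20},
]
cities = [
    {"id": 100, "name": "City A", "country_id": 10, "state_id": 1},   # consistent
    {"id": 101, "name": "City B", "country_id": 10, "state_id": 2},   # state is in another country
    {"id": 102, "name": "City C", "country_id": 20, "state_id": 99},  # state does not exist
]


def find_broken_links(cities, states):
    """Return (city_id, reason) for every city whose state link is inconsistent."""
    state_country = {s["id"]: s["country_id"] for s in states}
    problems = []
    for city in cities:
        state_id = city["state_id"]
        if state_id is None:
            continue  # a city without a state is allowed
        if state_id not in state_country:
            problems.append((city["id"], "missing state"))
        elif state_country[state_id] != city["country_id"]:
            problems.append((city["id"], "country mismatch"))
    return problems


broken = find_broken_links(cities, states)
print(broken)
```

A check like this only catches structural errors. It can't tell that a row in states is really a city, which is the bigger problem described below.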

Examples

Example 1

For instance in Belgium:

So the data for Belgium is completely messed up, pretty much 90% of it, and even fixing the bad IDs won't fix the other issue, which is the lack of tables and fed data.

Example 2

This time in Albania. We have duplicates in states like "Tirana County" and "Tirana District". "Tirana" (or "Tirana district", but it's really just "Tirana") is a city in the state of "Tirana County". Even though we never make use of the state "Tirana District", it's still there. We then have "Kavajë District" in states when it's actually the city "Kavajë" and should be in cities. Also, "Bashkia Kavajë" is not even a town. It's an even lower level: a place within a town.
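
A rough way to surface candidates like the "Tirana County" / "Tirana District" pair is to group state names by their base name. This is only a sketch: the suffix list is an assumption, and every hit would still need a manual review.

```python
from collections import defaultdict

# Assumption: only these two administrative suffixes are stripped.
SUFFIXES = (" County", " District")


def base_name(name):
    """Strip a known administrative suffix, if present."""
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def duplicate_candidates(state_names):
    """Group state names that share a base name; keep groups with more than one entry."""
    groups = defaultdict(list)
    for name in state_names:
        groups[base_name(name)].append(name)
    return {base: names for base, names in groups.items() if len(names) > 1}


names = ["Tirana County", "Tirana District", "Kavajë District"]
candidates = duplicate_candidates(names)
print(candidates)
```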

Solution

The solution would be a whole refactor of the tables and data, which is huge but necessary. It should look like this:

continents(
  name: string,
);

regions(
  name: string,
  continent_id: Continent,
);

countries(
  name: string,
  region_id: Region,
);

states(
  name: string,
  country_id: Country,
);

cities(
  name: string,
  country_id: Country,
  state_id: State | null,
);

towns(
  name: string,
  city_id: City,
);

places(
  name: string,
  town_id: Town,
);

Doing this would make things a lot simpler and cleaner to add to and edit and, more importantly, to use. I wouldn't mind doing a POC, but I've got my hands full right now. This should become the next version of this repo.
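
For anyone who wants to try the idea, here is a minimal sketch of the proposed hierarchy as an in-memory SQLite database. The id columns, the NOT NULL constraints, and the foreign key declarations are my assumptions; the proposal above only lists names and parent references.

```python
import sqlite3

# Sketch of the proposed tables. Primary keys and constraints are assumptions.
SCHEMA = """
CREATE TABLE continents (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE regions (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    continent_id INTEGER NOT NULL REFERENCES continents(id));
CREATE TABLE countries (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    region_id INTEGER NOT NULL REFERENCES regions(id));
CREATE TABLE states (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES countries(id));
CREATE TABLE cities (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES countries(id),
    state_id INTEGER REFERENCES states(id));  -- nullable, as in the proposal
CREATE TABLE towns (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    city_id INTEGER NOT NULL REFERENCES cities(id));
CREATE TABLE places (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    town_id INTEGER NOT NULL REFERENCES towns(id));
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript(SCHEMA)

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]


def rejects_orphan_town(conn):
    """Return True if a town pointing at a non-existent city is refused."""
    try:
        conn.execute("INSERT INTO towns (name, city_id) VALUES (?, ?)", ("Orphan", 999))
    except sqlite3.IntegrityError:
        return True
    return False


print(tables)
print(rejects_orphan_town(conn))
```

With the foreign keys enforced, a town can't be attached to a city that doesn't exist. Note that this sketch still doesn't stop a city's state_id from pointing at a state in a different country; that would need a composite key or a separate check.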

dosubot[bot] commented 3 months ago

Hey @dugzino! I'm here to help you with any bugs, questions, or contributions you have for the repository. Let me know how I can assist you!

The issue you've identified highlights significant inconsistencies in the data structure of the countries-states-cities-database. Specifically, cities and towns are misclassified in the states and cities tables, and the links between them are incorrect.

Current Issues:

  1. Misclassification:

    • Belgium: "Antwerp" is incorrectly listed as a state instead of a city. "Borgerhout" is a town in the city of "Antwerp" but is incorrectly linked to the state "Flanders".
    • Albania: Duplicates like "Tirana County" and "Tirana District". "Kavajë District" should be a city, not a state. "Bashkia Kavajë" is a place within a town, not a town itself.
  2. Incorrect Links:

    • Cities and towns are linked to incorrect parent states or regions.

Proposed Solution:

A comprehensive refactor of the database schema to introduce more granular tables and correct the hierarchical relationships. The proposed schema is:

continents(
  name: string,
);

regions(
  name: string,
  continent_id: Continent,
);

countries(
  name: string,
  region_id: Region,
);

states(
  name: string,
  country_id: Country,
);

cities(
  name: string,
  country_id: Country,
  state_id: State | null,
);

towns(
  name: string,
  city_id: City,
);

places(
  name: string,
  town_id: Town,
);

This schema would ensure a clear and accurate representation of geographical entities and their relationships, making the data easier to manage and use.

Next Steps:

This refactor would address the current issues and provide a more robust structure for future data additions and modifications. If you need further assistance or a detailed plan for the refactor, feel free to ask!
