coderholic / django-cities

Countries and cities of the world for Django projects
MIT License
920 stars 374 forks source link

PostalCode city FK #144

Open caullla opened 7 years ago

caullla commented 7 years ago

Hi,

Are you sure you FK is enough? For example in Germany, cities with same postal code, can exist. I think 2m2 will be better fit. Thanks.

https://github.com/coderholic/django-cities/blob/master/cities/models.py#L181

blag commented 7 years ago

Can you give me an example of two cities with the same postal code?

And assuming that's true, then yes, a M2M relationship would be a better fit. That's not difficult to code, but that would be...involved...to transition all of our users to. Is that something you could look into creating a PR for?

caullla commented 7 years ago

I made inspection, and found cities which are present as separate instances in City model and have same postal_code.

[<City: Westerland>, <City: Sylt-Ost>] # postal_code = 25980
[<City: Klausdorf>, <City: Kiel>] # postal_code = 24147

But now I'm not sure about m2m. Yea, postal_code the same, but those PostalCode records isn't equal for 100%. And until PostalCode.code is not unique in database schema, FK could be enough. So we can have duplicated code with different location or district_name.

What do you think about?

blag commented 7 years ago

I would rather have M2M models between the appropriate models than duplicate data. How far up the hierarchy do we need to switch to M2M (or, to ask it a different way: are those two cities in different Subregions or Regions?

If we really wanted to go wild with it, we would basically model the entire hierarchy as a graph instead of a tree, and then model the entire history of all places (eg: cities in the Ottoman Empire or Austro-Hungary, Berlin belonging to both East and West Germany).

All of that is possible, the real issue is the cities import command. That just gets more convoluted the more data model changes we make. 😖

caullla commented 7 years ago

m2m is not a silver bullet. I would say, best fit looks like this

City <- PostalCode -> Code
class City(models.Model):
     postal_codes = models.ManyToMany(Code, through=PostalCode)

One more question/option. Do we have a polygons for region, subregion, city. We can use geometry instead of graph. Make sense?

blag commented 7 years ago

What data goes in the PostalCode model and what data goes in the Code model?

Right now we import points for cities, districts, and postal codes.

Polygons exist. They are distributed in per-country zip files from Geonames, and we do nothing with them because they would presumably take forever to import and few users have requested them. There was an issue (#125) where somebody was going to add importing polygons.

caullla commented 7 years ago

Code will have only code(currently in PostalCode.code). That's why I don't see reason to add m2m. Right now PostalCode can have duplicated value in code field, but in same time values of other fields isn't equal.

If we find a way to import polygons, we can use them to fill up any relations. Using polygons we can query all PostalCode in Region or all PostalCode in SubRegion. We also can use geometry on import, I don't know how you build relations right now, but with geometry you don't need to check anything, except Point in Polygon. You can build, City -> Region, City-> Country, PostalCode-> City, etc.. And import order doesn't matter.

ps. Assuming that Point is in the middle of Polygon(Point if far enough of Polygon border), it should works even if geometry have not 100% accuracy.

blag commented 7 years ago

Using polygons we can query all PostalCode in Region or all PostalCode in SubRegion

I suspect that would be incredibly slow but I'm willing to try it. Could you code up a PR that implements that?

caullla commented 7 years ago

I'll start with polygon import.