geocoder-php / Geocoder

The most featured Geocoder library written in PHP.
https://geocoder-php.org
MIT License

Nominatim geocoder - returns different results for postalcode before vs after city name #1114

Open webprogrammierer opened 3 years ago

webprogrammierer commented 3 years ago

Maybe this is not an issue of geocoder-php/nominatim-provider, it may be an issue of Nominatim itself:

https://github.com/osm-search/Nominatim/issues/2167

But Nominatim has closed the issue instead of solving the problem.

Please check the problem and example on https://github.com/osm-search/Nominatim/issues/2167 and help to solve this problem.

Thank you.

mtmail commented 3 years ago

Nominatim hasn't closed the issue, we locked further comments.

webprogrammierer commented 3 years ago

It is not good to lock further comments. So people see you are not interested in solving it.

mtmail commented 3 years ago

We have received all information needed to investigate the issue. We will update the issue when there is progress.

jbelien commented 3 years ago

Hello @webprogrammierer, thanks for your "issue" but we do not tweak or change results returned by external APIs; we just process the result returned by the API.

You will need to wait for this "issue" to be "fixed" in the Nominatim API itself.

jbelien commented 3 years ago

That being said, address parsing is indeed REALLY difficult.

In "9400 Wolfsberg AT" or "Wolfsberg 9400 AT", the 9400 could be a postal code or a house number; I'm guessing that's (part of) the reason the result is different.

webprogrammierer commented 3 years ago

House number 9400? Very funny.

9400 is the right postal code for Wolfsberg. So there is no reason to doubt, no reason for the software to do a second query.

webprogrammierer commented 3 years ago

> Hello @webprogrammierer, thanks for your "issue" but we do not tweak or change results returned by external APIs; we just process the result returned by the API.
>
> You will need to wait for this "issue" to be "fixed" in Nominatim API itself.

I think you did not understand the problem and how it could be solved. There is NO NEED to change results! The only thing to be done is to make correct queries:

You can use the right syntax for the query. The right syntax is: postalcode after city.

Instead of the free-form queries you can also use "Alternative query string format split into several parameters for structured requests", see here: https://nominatim.org/release-docs/develop/api/Search/

So you have 2 ways to solve this. That is done in a few minutes and thousands of users will have a solution for a huge problem!

Otherwise if you do not solve the problem, users will use Google API geocoder or other geocoders instead.

So PLEASE rewrite the code and use the correct syntax for Nominatim search queries, so that it works in all cases and not only in some cases as it does now!

jbelien commented 3 years ago

> House number 9400? Very funny.

Maybe not in Austria but the goal of Nominatim is to work globally ...

> 9400 is the right postal code for Wolfsberg. So there is no reason to doubt, no reason for the software to do a second query.

You know that because you're a human with knowledge of postal codes in Austria. Nominatim has to parse a completely random string and "guess" that a number is (or is not) a postal code; definitely not the same process.

> The right syntax is: postalcode after city.

Where does this assessment come from? Is there anywhere in the Nominatim documentation stating that the postal code should go after the city?

> Instead of the free-form queries you can also use "Alternative query string format split into several parameters for structured requests", see here: https://nominatim.org/release-docs/develop/api/Search/

I'm well aware of the structured requests. I don't have time to improve the Nominatim provider at the moment but you're more than welcome to submit a Pull Request.

> Otherwise if you do not solve the problem, users will use Google API geocoder or other geocoders instead.

Even though at a personal level (as OSM contributor) I do understand what you're saying, the goal of this project is to provide an easy way to use ANY geocoding API (including Google's API).

webprogrammierer commented 3 years ago

> Where does this assessment come from?

https://nominatim.org/release-docs/develop/api/Search/

street=<housenumber> <streetname>
city=<city>
county=<county>
state=<state>
country=<country>
postalcode=<postalcode>

Postalcode is the last value in the list! You have to keep the order if using a simple query with ?q=. It takes less than 5 minutes to find that out (order is important if the search is used for geocoding, or use a structured query). Test it out and you will be convinced.

> ... provide an easy way to use ANY geocoding API

The changes in code should be done only here: https://github.com/geocoder-php/nominatim-provider where it is not possible to create issues. Therefore the issue was created here.

If you make the changes in the Nominatim provider this will not concern the Google API. Why do you write such things if everyone can see how it is?

If there is a problem that concerns millions of people in the world you should be able to find somebody to do these adaptations.

mtmail commented 3 years ago

> Postalcode is the last value in the list

In the Nominatim API documentation those are names of URL parameters. They can be in any order. Postalcode being last in the list has no significance. In the value "housenumber streetname" the order can be "streetname housenumber", too.
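
To illustrate the point about parameter order, here is a minimal PHP sketch that builds two structured search URLs from the same address parts given in a different order. It is not code from the Nominatim provider: `buildStructuredSearchUrl()` and `queryParams()` are hypothetical helpers, and the endpoint `https://nominatim.openstreetmap.org/search` and `format=jsonv2` are assumptions about the target server.

```php
<?php
// Hypothetical helper (not part of geocoder-php): builds a structured
// Nominatim search URL from named address parts.
function buildStructuredSearchUrl(
    array $parts,
    string $base = 'https://nominatim.openstreetmap.org/search' // assumed endpoint
): string {
    // Parameter names as listed in the Nominatim search documentation.
    $allowed = ['street', 'city', 'county', 'state', 'country', 'postalcode'];
    // Keep only known parts; the caller's order is preserved.
    $params = array_intersect_key($parts, array_flip($allowed));
    $params['format'] = 'jsonv2'; // assumed output format
    return $base . '?' . http_build_query($params);
}

// Hypothetical helper: decodes the query string of a URL into an array.
function queryParams(string $url): array
{
    parse_str((string) parse_url($url, PHP_URL_QUERY), $params);
    return $params;
}

$postcodeFirst = buildStructuredSearchUrl(
    ['postalcode' => '9400', 'city' => 'Wolfsberg', 'country' => 'Austria']
);
$cityFirst = buildStructuredSearchUrl(
    ['city' => 'Wolfsberg', 'postalcode' => '9400', 'country' => 'Austria']
);

// The URL strings differ, but the decoded parameter sets are equal.
var_dump($postcodeFirst === $cityFirst);                           // bool(false)
var_dump(queryParams($postcodeFirst) == queryParams($cityFirst)); // bool(true)
```

Both URLs carry the same key/value pairs, so a server that reads them as named parameters has nothing to guess, unlike with a free-form `q=` string.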

> Nominatim geocoder - returns different results for postalcode before vs after city name - please use correct search syntax

The issue is Nominatim software not interpreting the free form text (address) correctly. I think the git issue here can be closed, it's filed in https://github.com/osm-search/Nominatim/issues/2167 and it will be solved eventually.

But not within the 3 days you mentioned you need a solution. If you see another geocoder API interpreting the address better, every software uses different approaches, complexity and data, it's a good approach to use that geocoder API instead in the meantime.

jbelien commented 3 years ago

> In the Nominatim API documentation those are names of URL parameters. They can be in any order. Postalcode being last in the list has no significance. In the value "housenumber streetname" the order can be "streetname housenumber", too.

Thanks @mtmail for confirming what I thought. 👍


@webprogrammierer Please remember that people working on Nominatim or on this project are volunteers. They usually do the work in their free time. If your problem is not solved quickly enough, I suggest you submit a PR here to add support for structured requests.

You can also use any other provider; if you need to use OSM data, there are other providers based on OSM data (Geocode Earth, GraphHopper, Mapbox, OpenCage, Photon, ...).

On a more personal note, using a tone in your message that is a bit less aggressive will probably make people more eager to help you.

webprogrammierer commented 3 years ago

> In the Nominatim API documentation those are names of URL parameters. They can be in any order. Postalcode being last in the list has no significance. In the value "housenumber streetname" the order can be "streetname housenumber", too.

It would be great if YOU could tell the programmers of the Nominatim API to adapt the documentation, because we all (you and I) found out that the order of postalcode and city must be city -> postalcode.

I do not understand why you do not tell everyone here that you will solve the problem within 20 days instead of 3 days. This would be a great beginning and a great outlook for all who want correct coordinates using Nominatim.

webprogrammierer commented 3 years ago

There have been about a dozen reactions and comments on this problem so far, across two issues. The only thing that has been written is rejection.

Please let us look further and tell us where and when this huge problem will be solved OR who from all the programmers will feel responsible to pursue the problem until it is resolved.

Rejection is not what should be posted here in such a case. Both Nominatim and the Nominatim geocoder are not usable until this problem is solved. There are a lot of countries in the world where the postal code is written before the city name in any address! Millions and millions of people are used to writing any (!) address in this order .... postalcode city ... This means all these millions of people get wrong results if they use their common and normal order.

If that is known, a solution is indispensable.

A quick solution that always works could be to use the structured search. Can you tell us (all who need the solution and all who read this issue, and so on ...) in which file of geocoder-php/nominatim we have to change a few lines of code to use structured search? Then we can write a quick patch for the problem and continue to use Nominatim while waiting for the official solution to be implemented.

Please answer positively. Thank you very much.

jbelien commented 3 years ago

> Can you tell us (all who need the solution and all who read this issue, and so on ...) in which file of geocoder-php/nominatim we have to change a few lines of code to use structured search? Then we can write a quick patch for the problem and continue to use Nominatim while waiting for the official solution to be implemented.

The code of the Nominatim provider is located here: https://github.com/geocoder-php/Geocoder/tree/master/src/Provider/Nominatim and the main file is Nominatim.php.

webprogrammierer commented 3 years ago

Thank you.

On line 84: public function geocodeQuery(GeocodeQuery $query): Collection {

In the $query object the private $data value is empty (checked with $query->__toString() or ->getAllData()). So it is not possible to get the values of postalcode, city and the other address parts. $query only contains the search string in the private $text variable (->getText()).

But we need all the parts of the address in separate variables to rewrite the search and use the structured search.

Why do these address parts not exist in the query object? So we need to look for places in the code where we have access to all parts of address in separate variables.

jbelien commented 3 years ago

Please dive into our code a little bit more (or check how it's done in other providers); we have the GeocodeQuery::withData() function exactly for that purpose: https://github.com/geocoder-php/Geocoder/blob/master/src/Common/Query/GeocodeQuery.php#L131
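
A rough sketch of how a provider could choose between the two request styles once address parts are attached to the query. It works on a plain array so that it runs on its own; in the library that array would presumably come from the query's data (for example via `getAllData()`). The key names (`city`, `postalcode`, ...) are an assumed convention, not something the provider defines, and `searchParams()` is a hypothetical name.

```php
<?php
// Hypothetical helper (not part of geocoder-php): picks structured
// parameters when address parts are available, else the free-form query.
function searchParams(string $text, array $data): array
{
    // Structured parameter names from the Nominatim search documentation.
    $keys = ['street', 'city', 'county', 'state', 'country', 'postalcode'];

    $structured = [];
    foreach ($keys as $key) {
        // Skip parts that are missing or empty.
        if (isset($data[$key]) && $data[$key] !== '') {
            $structured[$key] = (string) $data[$key];
        }
    }

    // Nominatim does not allow mixing structured parameters with q=,
    // so exactly one of the two styles is returned.
    return $structured !== [] ? $structured : ['q' => $text];
}

// No address parts attached: fall back to the free-form string.
print_r(searchParams('9400 Wolfsberg AT', []));

// Address parts attached: the free-form text is ignored.
print_r(searchParams('9400 Wolfsberg AT', [
    'postalcode' => '9400',
    'city'       => 'Wolfsberg',
    'country'    => 'Austria',
]));
```

Falling back to `q=` keeps existing callers working unchanged, while callers that supply separate address parts no longer depend on how the free-form string is interpreted.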

webprogrammierer commented 3 years ago

Maybe you can write the patch for structured Nominatim search. You know the code very well.