chubin / wttr.in

:partly_sunny: The right way to check the weather
https://wttr.in
Apache License 2.0

JSON `nearest_area` is one city over #523

Open pragma- opened 4 years ago

pragma- commented 4 years ago

Todo

Details

A while ago I submitted a PR to add includeLocation to the weather query in order to get nearest_area in the JSON result. This had been working flawlessly and fantastically up until about a month ago or so.

Now the areaName field is consistently one or two cities away instead of the city being queried. For example, los angeles gives results for hillgrove, california. new york city gives oakland gardens, new york. london, uk gives cubitt town, tower hamlets, greater london, united kingdom. A month ago all of these used to return the expected city.

This is disconcerting because I cannot tell if WWO (wttr's weather backend) simply no longer has access to weather stations in those cities or if its database look-up is broken in some way or if the nearest_area has been changed to do something else now.

Do you know why this is happening?

By the way, I was looking over the PRs and I saw that somebody had recently submitted a PR to use an IP-location service because the WWO location results have been very inaccurate lately. Instead, I hope we can resolve the WWO location result issue to be accurate once again.

ScientiaEtVeritas commented 3 years ago

I can confirm this behaviour. Searching for some larger German cities like München returns Strasslach-Dingharting, some village in the vague surrounding area, Karlsruhe returns Neuburg Am Rhein, same thing.

chubin commented 3 years ago

Yes, I can confirm this too. I think the problem lies in how WWO handles it. I see that it is very inaccurate now. I should report this problem to them, and if it is not fixed or mitigated, we should probably look for a new data provider.

Internal wttr.in location resolution works perfectly well:

$ curl http://localhost:8004/Nuremberg
{"latitude": 49.453872, "timezone": "Europe/Berlin", "longitude": 11.077298, "address": "Nürnberg, Mittelfranken, Bayern, Deutschland"}
$ curl http://localhost:8004/49.45,11.08
{"address": "11, Theatergasse, Altstadt, St. Lorenz, Nürnberg, Bayern, 90402, Deutschland", "latitude": 49.4498919, "longitude": 11.0801459, "timezone": "Europe/Berlin"}

But at the same time:

$ curl wttr.in/Nuremberg?format=j1 | jq .nearest_area[0]
{
  "areaName": [
    {
      "value": "Aurau"
    }
  ],
  "country": [
    {
      "value": "Germany"
    }
  ],
  "latitude": "49.250",
  "longitude": "11.017",
  "population": "0",
  "region": [
    {
      "value": "Bayern"
    }
  ],
  "weatherUrl": [
    {
      "value": ""
    }
  ]
}

We can override WWO's nearest_area field with the wttr.in data, but the real question is whether WWO returns the weather data for the nearest_area rather than for the area in the query (which would be really bad).
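To put a number on the discrepancy shown above, here is a minimal Python sketch that reads the coordinates from a `nearest_area` entry and measures how far they are from the coordinates the internal resolver returned for Nuremberg. The sample values are copied from the two outputs above; the helper names (`nearest_area_coords`, `haversine_km`, `offset_km`) are made up for illustration and are not part of wttr.in.

```python
import json
import math

# Trimmed copy of the `nearest_area[0]` object from the j1 output above.
SAMPLE = json.loads("""
{
  "areaName": [{"value": "Aurau"}],
  "latitude": "49.250",
  "longitude": "11.017"
}
""")

# Coordinates the internal resolver returned for "Nuremberg" (see above).
QUERIED = (49.453872, 11.077298)


def nearest_area_coords(area):
    """Return (lat, lon) as floats; the j1 output stores them as strings."""
    return float(area["latitude"]), float(area["longitude"])


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def offset_km(area, queried=QUERIED):
    """Distance between the queried point and the reported nearest_area."""
    return haversine_km(queried, nearest_area_coords(area))


if __name__ == "__main__":
    name = SAMPLE["areaName"][0]["value"]
    print(f"{name} is {offset_km(SAMPLE):.1f} km from the queried location")
```

For this sample the offset comes out at a little over 20 km, which fits the "one city over" description in the issue title.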

pragma- commented 3 years ago

> We can override WWO's nearest_area field with the wttr.in data, but the real question is whether WWO returns the weather data for the nearest_area rather than for the area in the query (which would be really bad).

It does look like the weather data is indeed "accurate" for the nearest_area. The problem is that it's not the location we searched for.

chubin commented 3 years ago

@pragma- I think the only real solution to this problem is to add support for other upstream data sources. We have initial support for a new data source in #532, and I believe more will follow. Then we will have a robust solution; until then, we will always depend on a single data source.

Danfro commented 3 years ago

Could this be also/additionally due to a service rounding coordinates?

When I use http://wttr.in/51.4976,%20-0.1181 (central London), I get the following search result: Ort: Lambeth Palace Garden, Lambeth Palace Road, Lambeth, London Borough of Lambeth, London, Greater London, England, SE1 7JU, United Kingdom [51.49704725,-0.11875235545073382]

If I do the same search with JSON format like this http://wttr.in/51.4976,%20-0.1181?format=j1, I do get a different output:

[image: screenshot of the output]

Note that the request coordinates have only two decimal places.
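As a rough check of the rounding idea, the Python sketch below estimates the largest shift that rounding a coordinate pair to two decimals can cause. It assumes about 111.32 km per degree of latitude and scales longitude by the cosine of the latitude; both are approximations, and the function name is hypothetical.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def max_rounding_shift_km(lat_deg, decimals=2):
    """Upper bound on the displacement caused by rounding lat and lon.

    Rounding to `decimals` places moves each coordinate by at most half of
    the last kept digit, e.g. 0.005 degrees for two decimals.
    """
    half_step = 0.5 * 10 ** -decimals
    north_south = half_step * KM_PER_DEGREE
    east_west = half_step * KM_PER_DEGREE * math.cos(math.radians(lat_deg))
    return math.hypot(north_south, east_west)


if __name__ == "__main__":
    # Latitude taken from the central London query above.
    print(f"{max_rounding_shift_km(51.4976):.2f} km")
```

At London's latitude the bound stays under one kilometre, so two-decimal rounding could add to an offset but would probably not, by itself, move a result a whole town away.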

enigma9o7 commented 3 years ago

I use the forecast module with Bodhi Linux which uses this as a backend, and have the same issue. If I set it to San Jose, California it comes up with Coyote, someplace in the remote surrounding area. I tried entering other various city names around me and Cupertino came up with Austin which is a little bit closer, but no way to get it to actual San Jose that I have found.
Ideally I would enter a postal/zip code... but even if I had to enter latitude/longitude that would be fine... but the city name is not working quite right.

Both of those locations (Coyote or Austin) are small obscure places I had not heard of, and had to use google maps to even find them. I reported to Bodhi developers but they pointed me here as an upstream problem, and seems it is affecting others in similar manner, when I read "village in remote surrounding area" for the user near Munich I thought to myself "yep, exactly!".

chubin commented 3 years ago

I've narrowed the bug down pretty well now. As I wrote before, it is in the data source. I hope they will fix it, because it is a real bug that affects all their (commercial) customers. If they do not fix it, I have an idea for a workaround, and if that does not help either, the only solution will be to change the data source.

Just for clarity: it is not a bug in wttr.in!

chubin commented 3 years ago

@pragma- @ScientiaEtVeritas @Danfro @enigma9o7

I believe it is fixed now. Could you please check if it works for you?

ScientiaEtVeritas commented 3 years ago

@chubin It doesn't seem fixed for me. I'm using this endpoint: http://wttr.in/Karlsruhe?format=j1. Thank you for looking into this issue!

chubin commented 3 years ago

@ScientiaEtVeritas It is, though;

at least it seems to work for me (with Karlsruhe too):

$ curl -ks wttr.in/Karlsruhe?format=j1\&nonce=$RANDOM | jq -r .nearest_area[0].areaName[0].value
Carlsruhe

I added nonce= here to bypass the caching layer. This shouldn't normally be done, because it generates additional useless load, but it is OK in this case; once the cached entries expire, it will no longer be needed.
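For scripted checks, the same trick can be sketched in Python. This only builds the URL and sends nothing; the helper name `j1_url` and its `bypass_cache` flag are hypothetical, and `nonce` is simply the throwaway query parameter used in the command above.

```python
import random
from urllib.parse import quote, urlencode


def j1_url(location, bypass_cache=False, base="https://wttr.in"):
    """Build a wttr.in JSON (format=j1) URL for `location`."""
    params = {"format": "j1"}
    if bypass_cache:
        # A changing value gives a URL the cache has not stored yet.
        # randint(0, 32767) mirrors the range of bash's $RANDOM.
        params["nonce"] = str(random.randint(0, 32767))
    return f"{base}/{quote(location)}?{urlencode(params)}"


if __name__ == "__main__":
    print(j1_url("Karlsruhe"))
    print(j1_url("Karlsruhe", bypass_cache=True))
```

Keeping `bypass_cache` off by default reflects the point above: the nonce is for one-off verification, not for regular use.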

enigma9o7 commented 3 years ago

> @pragma- @ScientiaEtVeritas @Danfro @enigma9o7
>
> I believe it is fixed now. Could you please check if it works for you?

It does! This is excellent! Thank you so much!

chubin commented 3 years ago

I think the bug is fixed; let's wait for at least one additional acknowledgment (@ScientiaEtVeritas from Fabian maybe?) and close it

Danfro commented 3 years ago

Please ignore me if I just don't remember a detail of how different result formats work. But searching for say Leipzig using general search returns Leipzig as result. Fine. But using json format does return Stunz, a part of Leipzig. Is that intended? Should both return the same result = Leipzig?

Please compare these two queries:

http://wttr.in/leipzig?format=j1

http://wttr.in/leipzig

Doing the same for München returns München and Gern (a part of München).

chubin commented 3 years ago

Yes, that's true, but the discrepancy shouldn't be too big (if at all). There are some locations indeed (Leipzig is one of them) where reverse GPS resolution (GPS -> Name) returns a little bit different result than the direct resolution (Name -> GPS). As far as I can understand, this comes from the caching mechanisms that are used on the data source side; we can't influence it directly.

As long as it is only slightly off, I think the error can be ignored. If it influences the forecast results, we will need to look for a solution.

pragma- commented 3 years ago

@chubin The nearest_area field does now appear to be populated with more accurate values, for the most part.

Previously, I was always getting city names that were one city away or so (e.g. "los angeles, california" would display "hillgrove, california"). Every time, consistently. Now I get the expected city information most of the time.

There are still some queries that do not have the expected city name; i.e. "Manhattan, New York" gives "Clason Point, New York" -- which seems to be just slightly outside of Manhattan, according to Google Maps. "Bronx, New York" gives "West Farms, New York".

It is my understanding that the data source gets information about the nearest weather station to a query. It may not always be possible to have a weather station in the exact location. That could explain why it says "Clason Point" and "West Farms" instead of the queried city name.

As long as the nearest_area field is accurately representing the correct weather station, I am fine with discrepancy between the queried location name and the weather results location name. As far as I can tell, the nearest_area field is much less broken now. The New York results make me hesitate on saying that it is 100% fixed.

pragma- commented 3 years ago

Noticed something weird.

If I query for "Bronx" I get "Baychester, New York" with a Lat/Long of 40.86 and -73.84.

If I query for "Bronx, New York" I get "West Farms, New York" with a Lat/Long of 40.85 and -73.88.

Do you know why this happens? I would expect "Bronx" and "Bronx, New York" to both use the same weather station.

chubin commented 3 years ago

Yes, it happens because that is how the location resolution procedure works:

You can query any other location, and check how it will be resolved, like this:

$ curl wttr.in/~Bronx,New+York | grep ^Location:

This problem (if it is a problem) is not related to the original one, and it is not related to weather data; it happens one step earlier. That's just how geolocation systems work, and I don't see a big problem here. The same could happen if you searched for a location in Google Maps, Apple Maps, or wherever.

The original problem was a real problem though. It is not really because of weather station locations, because the data of the stations is getting postprocessed, interpolated etc, but it is still a bug (or caching issue) on the data source level. We can't influence it directly, but as I said, if the problem (at its older scale) reoccurs, we will search for some solution

pragma- commented 3 years ago

> You can query any other location, and check how it will be resolved, like this: $ curl wttr.in/~Bronx,New+York | grep ^Location:

This indeed does say "The Bronx, Bronx County, New York" as expected! This is what I was expecting the nearest_area field to accomplish.

Instead, today, using curl wttr.in/~Bronx,New+York?format=j1, we have yet another new location name for "Bronx, New York"! It is now saying "Morrisania, New York". I cannot use the nearest_area field to display the names of the locations because they seem to be confusing and inconsistent locations: Baychester, West Farms, Morrisania.

The nearest_area field does seem to be much more accurate now, but it does not give a consistent location name for some locations. Would it be possible to add a location field to the JSON (format=j1) results that will use the Location: data from the "normal" results (curl wttr.in/~Bronx,New+York | grep ^Location:)?

chubin commented 3 years ago

Yes, it is a good idea; we should probably just add something like queried_location to the JSON response. Keep in mind, though, that the data is provided for the lat/long pair in the response, not the lat/long pair in the query! I understand that it sounds strange, but that's how the caching of our data provider works, and it does not look like they are going to fix it. And as I said, the shift is not so big now, much better than before.
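A minimal Python sketch of what such a field could look like. Only the name `queried_location` comes from this discussion; the field layout, the helper name, and the placeholder coordinates are assumptions for illustration, not the actual implementation.

```python
import copy


def add_queried_location(response, name, latitude, longitude):
    """Return a copy of a j1 response with a hypothetical `queried_location`.

    `nearest_area` is left untouched, because the weather data belongs to
    the coordinates reported there, not to the queried ones.
    """
    result = copy.deepcopy(response)
    result["queried_location"] = {
        "name": name,
        "latitude": f"{latitude:.3f}",   # strings, like in nearest_area
        "longitude": f"{longitude:.3f}",
    }
    return result


if __name__ == "__main__":
    # nearest_area values as reported earlier in this thread.
    upstream = {
        "nearest_area": [
            {"areaName": [{"value": "West Farms"}],
             "latitude": "40.85", "longitude": "-73.88"}
        ]
    }
    # Placeholder coordinates, for illustration only.
    print(add_queried_location(
        upstream, "The Bronx, Bronx County, New York", 40.0, -73.0))
```

Keeping both fields side by side lets a client show the name the user asked for while still seeing which coordinates the weather data actually refers to.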

pragma- commented 3 years ago

queried_location sounds great. Should I go ahead and close this issue and open a new issue for queried_location or do you want to keep this one open?

chubin commented 3 years ago

No, you shouldn't; I am going to work on it as part of this issue. I have already extended the original description with this step.