Excessive amount of error 500

pnbruckner commented 6 years ago

I'm using this library (v2.0.11) via Home Assistant. My log sees a LOT of error messages that look like this:

Unable to parse response :<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 500 (Server Error)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>500.</b> <ins>That’s an error.</ins>
  <p>The server encountered an error and could not complete your request.<p>If the problem persists, please <A HREF="http://www.google.com/support/">report</A> your problem and mention this error message and the query that caused it.  <ins>That’s all we know.</ins>
Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.5/site-packages/locationsharinglib/locationsharinglib.py", line 389, in _get_data
    data = json.loads(response.text.split("'", 1)[1])
IndexError: list index out of range

They seem to come in bunches. It would be really nice to at least more gracefully handle this exception, like write a small error message to the log that says something like the server could not satisfy the query or something. As it is it looks like the code doesn't expect this to happen.

It would be even nicer if it could filter out some of the consecutive messages. I did this with one of my HA components. I.e., if the same error happens twice in a row, say in the error message that further errors will not be shown until the errors stop. Then when the query works again, write a message saying things are ok again (maybe even giving a count of how many consecutive errors happened.)

Bottom line, though, is my log gets flooded with these messages and which makes it hard to see other problems that might be happening.

reaper7 commented 6 years ago

I have the same problem with v2.0.11 on HA :( https://github.com/home-assistant/home-assistant/issues/15076

pnbruckner commented 6 years ago

As an example, between 21:17:14 yesterday and 07:52:39 this morning (about 10.6 hours) there were 51 errors like this in my log, for an average of about one error every 12.5 minutes. Looking more closely only once did they happen twice in a row.

So maybe a better idea to avoid flooding the log is to 1) make this exact error a warning instead. And then keep statistics about how often it's happening, and if the frequency becomes too great, then write an error to the log. This way if someone really cares (and by default) all the warnings will show. But if the logging level is set to error for locationsharinglib.Service then only when the situation gets worse than usual will an error show up in the log.

Of course, it might be best to figure out why these errors are happening in the first place and fix them. But something like I describe above might be helpful in the meantime. I'm going to try implementing something along those lines.

ve6rah commented 6 years ago

I think this really requires more than simply hiding these errors.

These errors are causing me massive delays in updates to the location. If I use Google Maps and check the location, it is a maximum of maybe 5-10 minutes old, however using locationsharinglib locations are often an hour old or more. This is while my log fills up with this error message. Previously (before my log starting filling with this error) the update time on locationsharinglib was quite reasonable.

Without a fix for the underlying issue, the usability of this module is severely compromised. Hiding the error message won't fix that. (Though I will admit that the library should ALSO handle the error more gracefully, a single line in the log is more than enough)

pnbruckner commented 6 years ago

First, I agree that if there's something that can be done to make locationsharinglib retrieve data from the server more consistently and reliably, then that should be the priority.

But, still, I think there are two separate issues. 1) locationsharinglib seeing a lot more "Error 500" responses than it used to. And 2) the data on Google's server being more stale than it used to.

The first, I suppose, can lead to lagging updates. But in the short time I've been monitoring this (a week or so), although sometimes there are bursts of these errors, they don't happen every time. The HA device tracker platform code calls this library once every 30 seconds, yet I don't see two errors every minute. So there are plenty of times where the communication works, even during a period where there are a lot of these messages. So that, in itself, cannot explain less frequent updates than previously.

And this is where the second issue comes up. Is there something locationsharinglib can do about this? I tend to doubt it, but I really don't know.

FWIW, I have two devices I can track via both Google Maps and Life360. Right now they both are showing long last update times on Google Maps: a half hour on one, and two hours on the other. Yet Life360 updated their locations multiple times in that period, with the latest being only a few minutes old. (And I'm talking about viewing this data via the Google Maps and Life360 websites - not HA. And, FYI, one is an iPhone and the other is an Android phone.)

I actually use both Google Maps and Life360 to track one of these devices in HA. (See this HA community topic for more details.) I've found that sometimes Life360 updates better, and sometimes Google Maps updates better, and sometimes they both update pretty well, filling in small gaps for each other.

So, the bottom line for me is, yes, if there's a bug that needs fixing, but all means, that should come first. But, even so, it seems the Google Maps server will quite often respond with Error 500, even while other queries during the same period work just fine. And sometimes these errors happen a lot. So regardless, better handling of the errors, which seem to be inevitable, needs to be implemented.

costastf commented 6 years ago

Hi all,I am currently on vacation and will not be able to work on this until after the 24th. Handling this error better is something that will be done. The library was written in such a way as to force this kind of feedback since it is quite exploratory. The 500 errors could be made into warnings but I don't think that we should go implementing a state machine for the errors so we rate limit them. I think that specifically the 500 errors sound be made into a warning so it does not spam HA logs but everything else should still break since this is a reverse engineered project and has not been made on top of an official api. There are many possible responses and I think that Google has different services depending on your geo location so I would like to catch everything in time and cover all cases, and masking errors will make it much more difficult.

costastf commented 6 years ago

And a quick comment on the retrieval part. The code for the retrieval part has not been changed since the first version of the library. The only thing being changed is the evolution of the handling of the authentication. Seeing as the errors are 500 and as their message clearly state they don't know what is wrong I would argue that there is some degradation of service from Google's side. Maybe it would be beneficial if people with update issues and 500 errors shared their general location so we can correlate.

ve6rah commented 6 years ago

If it helps, I'm in Calgary, Canada. I understand this is a reverse engineered module, I'm wondering if the reverse engineering needs to be revisited. Perhaps over time Google has changed how they do things, and theses queries no longer line up well enough with the official ones. It seems unlikely that the official ones are getting this error that frequently. (I know Google made some changes recently as they now report battery level along with location, who knows what else they may have changed)

Another behavioral quirk I've noticed though that may be a hint on the delays; When I open maps, it will often show a location that is about 10-20 minutes old, and then perform a refresh to get a current location. I can't help but think we're missing that refresh, and only showing the original version. That said, this still would only account for about 20 minutes of staleness, not over an hour (as far as I can tell)

pnbruckner commented 6 years ago

I also understand this is a reverse engineered project. (I'm working on one just like that for Life360.) And FWIW, I played around quite a bit with how to minimize the number of error messages. After trying various approaches, I settled on making the messages warnings, unless they happen four times in a row, in which case they would be an error. In my mind, unless they happen multiple times in a row, they're not significantly delaying updates, since between the errors the queries are still working. But if they happen four times in a row (and since HA calls the method every 30 seconds), then they're delaying the update by two minutes. So, if they don't happen consecutively, then make them a warning. If a user wants to see them all, then just set the level to warning or below. But if they happen enough times in a row, then they're causing a real problem so they should be an error.

Whether or not you ultimately decide to do something like this, that's up to you. But I find it, at least for now, much less annoying.

costastf commented 6 years ago

@ve6rah although conceptually it could make sense that there are changes I don't think that that is the case now since there is no pattern on the error messages and they are quite far apart for any rate limiting to take place. I am off course look into any possible changes after my vacation.

costastf commented 6 years ago

v3.0.2 turns broken responses from the server to log warnings. Lets take it from there. If this requires more attention lets open the issue again.

costastf / locationsharinglib

Excessive amount of error 500 #37