OneBusAway / onebusaway-iphone

OBA development has moved!
https://github.com/OneBusAway/OBAKit
Apache License 2.0

Small number of users occasionally lose ability to connect to API #14

Closed maxgano closed 8 years ago

maxgano commented 11 years ago

We have a pattern of a few users simply losing the ability to connect to OBA following a data bundle update. For any given update, we hear from a half dozen or more users. This may also be occurring in relation to other changes, such as DNS updates. But the clearest pattern to date is emerging in relation to data bundle updates, after which we consistently hear from a small number of users who can't connect.

We provide the users with three possible actions to correct:

It's a bit difficult to get follow-up on what eventually works for users, and we are attempting to gather better stats. At this point, it seems that it is fairly evenly spread across the three actions listed above.

I am entering this bug to both initiate an investigation of root causes if practical, and to also see if other OBA Operators are experiencing similar user impact.

For background, we use a rolling update on a pair of TDS servers operating behind a load balancer for fault tolerance. To update data bundles, the first server is disabled, updated, tested, then re-enabled. Then we repeat the process for the second server. This works great, allowing four nines availability with no effective downtime for users other than the few mentioned above. Validation conducted directly against the API has never resulted in a failure to connect to the TDS server through the API, so this seems to be related to some condition that the native app is experiencing but not coping with consistently. We have also not had similar reports from users of either Android or Windows phones. Or at least not that I have noticed.

Happy to provide any further information / support needed. It's a gnarly one, probs, but definitely helpful to users going forward.

maxgano commented 11 years ago

BTW: Scott Rose has a long history of experience with this issue. He can both validate (or correct me) on the data bundle update scenario, and provide additional background of other changes that may also be triggering this same behavior.

barbeau commented 11 years ago

@maxgano I don't think we've heard complaints about this from iPhone users in Tampa yet, but we're still in pilot phase with only around 400 users total, and iPhone penetration among smartphone users is only around 27% (i.e., about 62% are Android users). I'm checking with Candace Brakewood from GA Tech to see if any Tampa users reported this issue to her after our updates, since she was the first line of customer service defense during our study.

iPhone app is currently hard-coded to URL api.onebusaway.org: https://github.com/OneBusAway/onebusaway-iphone/blob/a868d6f2ca7d72252acb268f5d8dc98581d92807/Classes/OBAApplicationContext.m#L50

This will obviously change once multiregion is implemented (https://github.com/OneBusAway/onebusaway-iphone/issues/11), but all regions are using similar URL structures (e.g., api.tampa.onebusaway.org), so I would imagine this will affect everyone, given that the DNS is all managed through Puget Sound.

With the caveat that I'm not an iPhone developer, off the top of my head I'd say it's probably related to how iPhone handles DNS.

Here's a post that outlines some of the iPhone DNS issues (I'm not sure of date of article): http://www.saurik.com/id/3

It does include a possible solution that can be implemented in the iPhone app:

The Solution

It turns out that, in fact, Apple's resolver library has a serious bug in it. However, it is one that we can work around using a small amount of code added to the very beginning of main in any program that needs to use networking functions. This fix is almost automatable, universally applicable, and trivial to apply.

    #include <mach-o/nlist.h>
    ...
    int main() {
        struct nlist nl[2];
        memset(nl, 0, sizeof(nl));
        nl[0].n_un.n_name = (char *) "_useMDNSResponder";
        nlist("/usr/lib/libc.dylib", nl);
        if (nl[0].n_type != N_UNDF)
            * (int *) nl[0].n_value = 0;

Unfortunately, Apple is not very good about making their headers safe from C++ name mangling. If you attempt this fix with a C++ project and get a link error for nlist(char const*, nlist*), you should make the following change:

    extern "C" {
    #include <mach-o/nlist.h>
    }

Article also mentions why this occurs on some DNS host names and not others:

an iPhone hacker named core managed to provide some clarity to this mess and isolated what fails: CNAMEs (DNS records which do not directly have an address, but instead refer to other records) don't work, everything else does.

    iphone:~ root# host -t a yahoo.com
    yahoo.com has address 216.109.112.135
    yahoo.com has address 66.94.234.13
    iphone:~ root# host -t a www.yahoo.com
    www.yahoo.com is an alias for www.yahoo-ht3.akadns.net.
    www.yahoo-ht3.akadns.net has address 69.147.114.210

I don't know enough about the DNS behind the scenes for OBA to know if this could possibly lead to a server-side solution too.
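
For anyone who wants to check whether a given API host name is served through a CNAME, here is a minimal sketch in C. It assumes a POSIX system with `getaddrinfo`, and it treats "the resolver's canonical name differs from the name we asked for" as a sign that an alias was followed. The function names `is_alias` and `lookup_canonical` are made up for illustration and are not part of the OBA app.

```c
#define _POSIX_C_SOURCE 200112L  /* expose getaddrinfo() on strict compilers */

#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Length of a host name, ignoring one trailing dot
 * ("example.org." and "example.org" name the same host). */
static size_t host_len(const char *name)
{
    size_t n = strlen(name);
    return (n > 0 && name[n - 1] == '.') ? n - 1 : n;
}

/* Returns 1 if the canonical name differs from the queried name (the
 * resolver followed at least one alias), 0 if they match.
 * The comparison is case-insensitive, as DNS names are. */
int is_alias(const char *queried, const char *canonical)
{
    assert(queried != NULL && canonical != NULL);
    size_t qn = host_len(queried);
    size_t cn = host_len(canonical);
    if (qn != cn)
        return 1;
    for (size_t i = 0; i < qn; i++) {
        if (tolower((unsigned char)queried[i]) !=
            tolower((unsigned char)canonical[i]))
            return 1;
    }
    return 0;
}

/* Resolves `host` and copies the canonical name reported by the resolver
 * into `out`. Returns 0 on success, or a getaddrinfo() error code that
 * can be passed to gai_strerror(). */
int lookup_canonical(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_CANONNAME;    /* ask for the canonical name */

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    /* ai_canonname is only filled in on the first result and may be NULL. */
    snprintf(out, outlen, "%s", res->ai_canonname ? res->ai_canonname : host);
    freeaddrinfo(res);
    return 0;
}
```

A caller would run `lookup_canonical("api.onebusaway.org", buf, sizeof buf)` and, if it returns 0, pass the original name and `buf` to `is_alias`. What the resolver reports as canonical can depend on local configuration, so treat the result as a hint rather than proof.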

maxgano commented 11 years ago

Yup, like I thought, gnarly :)

But it looks like it's fixable too. Alternative definition of gnarly = cool !!!

Thx for the quick follow up, Sean.

smrose commented 11 years ago

Sean says

iPhone app is currently hard-coded to URL api.onebusaway.org

which implies that the URL of the API server is fixed. In fact, that's just the default value, and there is UI for changing it. Maybe everybody understood that already, but the way it was worded could possibly mislead somebody.
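
To make the distinction concrete, here is a tiny sketch in C of "default value with a user override". It is not the app's actual Objective-C code. The names `OBA_DEFAULT_API_HOST`, `oba_effective_api_host` and `oba_is_default_host` are hypothetical, and the override parameter stands in for whatever the settings UI stores.

```c
#include <assert.h>
#include <string.h>

/* Default host mentioned in this thread. */
#define OBA_DEFAULT_API_HOST "api.onebusaway.org"

/* Returns the user's override if one is set and non-empty,
 * otherwise the built-in default. */
const char *oba_effective_api_host(const char *user_override)
{
    if (user_override != NULL && strlen(user_override) > 0)
        return user_override;
    return OBA_DEFAULT_API_HOST;
}

/* Returns 1 if `host` is the built-in default, 0 otherwise. */
int oba_is_default_host(const char *host)
{
    assert(host != NULL);
    return strcmp(host, OBA_DEFAULT_API_HOST) == 0;
}
```

With this shape, an empty or missing setting falls back to the default, so only users who deliberately changed the value would be pointed at a different server.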

As for the association between the changing of a data bundle and the iPhone "cannot connect" issue, I've never made that association and I still don't make that association. User reports seem to be more or less uniformly distributed over time.

As for the issue with CNAMEs, it might be worth noting that neither api.onebusaway.org nor api.pugetsound.onebusaway.org has been a CNAME record since the transition to Sound Transit several weeks ago, yet the issue seems to have endured.

Aren't I just full of good news today?

S. Morris Rose technical staff University of Washington CSE

maxgano commented 11 years ago

I will defer to Scott with regard to the causal relationship. My observation is anecdotal and I have only recently been tracking customer feedback. Perhaps I'm just watching more intently when we do a data bundle update. But still … time will tell. We're getting some tools in place to track customer feedback and that will give us stats we can use for better analysis.

I'm a data guy after all, right? Where's my data ???

Max Gano, Solution Architect Sound Transit - Research and Technology mobile: 206-618-2606 Skype: max.gano

Theory: Information self-organizes to remain memorable and relevant

barbeau commented 11 years ago

I'm changing the title of this issue slightly to better reflect our lack of knowledge on precise causal relationships.

So are these the same set of users repeatedly reporting this issue from time to time? Or does it seem to affect the community in general, without necessarily repeating with the same users?

If it's the same set of users repeatedly encountering the issue, it would be good to get exact platform/device info from them so we can see if it's specific to certain iPhone/iPad/iPod versions.

smrose commented 11 years ago

I get a steady stream of users reporting the issue, including myself. I've never noticed the same user report it multiple times. In my case, it was a relatively elderly 3S phone on which I encountered it. According to Brian, the issue has been present since the app was released.

I'm not doing email support for Puget Sound any longer, though I'm still on the mailing list for the purpose of monitoring the traffic for posts that are relevant to the project. I could ask that those continuing to provide email support seek platform/OS information from users that report the issue. I suspect it's a dead end, but we gotta get a handle on this somehow, and starting with assumptions won't be helpful.

maxgano commented 11 years ago

Agreed on the title change. I have been reviewing the recent emails and it does seem coincidental to data bundle changes rather than causal.

BTW: It happened to me yesterday morning for the first time. iPhone 5 running iOS 6.1.4 (10B350) on the AT&T network. The app was already running and had connected successfully the prior evening. I first tried restarting the app, then the phone. Finally I deleted the app and reinstalled it. That cleared it.

As Scott noted, we have eliminated CNAMEs and are now using simple DNS A records to point from the sub-domain api.onebusaway.org to the API services load balancer. Nothing fancy there. Same for the api.pugetsound.onebusaway.org sub-domain.

One idea would be to enhance the app code to capture more of the connection status details. This could then be used to show connection status to the user, and could also be forwarded to us somehow if they choose to do so. We would probs want to make that optional to the user for privacy purposes. And we would want to avoid anything that increases latency, of course.
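
As a rough sketch of what such a capture could look like (in C, with every name below hypothetical and not taken from the app), the idea is to record the stage at which a request failed plus a numeric code, format that as one short line for display, and only forward it when the user has opted in:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stages at which a request to the API server can end. */
typedef enum {
    OBA_STAGE_OK,       /* request completed normally */
    OBA_STAGE_DNS,      /* host name could not be resolved */
    OBA_STAGE_CONNECT,  /* connection was refused or timed out */
    OBA_STAGE_HTTP      /* server answered with a non-success status */
} oba_stage;

/* One captured connection attempt. */
typedef struct {
    oba_stage stage;
    int code;           /* resolver error, errno, or HTTP status, by stage */
    char host[256];     /* host the app tried to reach */
} oba_conn_report;

/* Short label for a stage, suitable for logs or a status bar. */
const char *oba_stage_name(oba_stage s)
{
    switch (s) {
    case OBA_STAGE_OK:      return "ok";
    case OBA_STAGE_DNS:     return "dns";
    case OBA_STAGE_CONNECT: return "connect";
    case OBA_STAGE_HTTP:    return "http";
    }
    return "unknown";
}

/* Writes "stage=<name> code=<n> host=<host>" into buf.
 * Returns what snprintf() returns: the length the full line needs. */
int oba_format_report(const oba_conn_report *r, char *buf, size_t len)
{
    assert(r != NULL && buf != NULL);
    return snprintf(buf, len, "stage=%s code=%d host=%s",
                    oba_stage_name(r->stage), r->code, r->host);
}

/* A report is forwarded only for failures, and only with user consent. */
int oba_should_send(const oba_conn_report *r, int user_opted_in)
{
    assert(r != NULL);
    return user_opted_in && r->stage != OBA_STAGE_OK;
}
```

Formatting happens only after a request has already failed, so it stays off the normal request path and should not add latency to successful calls.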

Just thinking out loud.

Cheers Max

barbeau commented 11 years ago

Tagging @cbrakewood so she can follow this issue, since she's been getting feedback from Tampa riders as part of our pilot deployment. From her: "I went back through the post-wave survey results and specifically looked at the comments from the iPhone app users. None of the iPhone users wrote-in comments about the problem that you described above, but six (6) of them did select "OneBusAway appeared to be down" in the question about problem experiences. So they may have experienced it in some way."

caitbonnar commented 11 years ago

Setting this issue to "ready" because I think we should continue looking into this problem. I don't know whether it can be fixed by the next release. Does anyone have any more reports of this happening since the last update?

bbodenmiller commented 11 years ago

Can someone post a screenshot of the exact error they receive? Additionally, when you say data bundle update, do you mean schedule data is updated on the OBA servers?

maxgano commented 11 years ago

Sorry, didn't capture a screen at the time. I've also just updated to the new version of the app. And we haven't updated the data bundle since I opened the ticket. We have a schedule change in late August, so that will be the first time I would expect this to happen again, assuming there's an actual causal relationship.

But if I remember correctly, the user experience is that "Error connecting" appears in the green menu bar at the top of the screen. I am simulating what this would look like with the new version of the app by setting the API url to an invalid string.

(Screenshot: oba_error_connecting)

barbeau commented 11 years ago

@maxgano I think something went wrong with your previous post and the issue was accidentally closed, so I'm reopening.

smrose commented 10 years ago

Can the Contact info in the app be updated? The e-mail link doesn't work for my iPhone (I deleted the iPhone e-mail app; it was too buggy).

S. Morris Rose technical staff University of Washington CSE Building web sites for less than $6e8 since 1994

bbodenmiller commented 10 years ago

The email contact link is working for me. Can you explain how you deleted the iPhone mail app? Didn't even know that was possible.

aaronbrethorst commented 8 years ago

Given that we haven't seen any movement on this in 2 years (and that I've never heard about it), I'm going to go out on a limb and speculate it's no longer an issue. If we haven't heard anything more about this in 3 months (March 1, 2016), I'm going to close it.

barbeau commented 8 years ago

Agreed, I haven't heard anything in Tampa on this in quite some time.

aaronbrethorst commented 8 years ago

Closing per my last comment.