Open PatGendre opened 4 years ago
Hm, this is super interesting and weird. First, some comments about the architecture:
Second, if the user has already logged in, but the user mapping in the uuid_db
is removed, then we don't currently create a new user. Instead, the getUUID(request)
call - e.g. in
https://github.com/e-mission/e-mission-server/blob/e05cf31486f74342d2749a672c27cdd3b75090e3/emission/net/api/cfc_webapp.py#L238
will return None
and the call will fail. This is not specific to the OpenID
auth case; it happens for any auth method if the underlying DB is deleted.
A workaround should be to logout and login again. As Yann points out, logging out will not delete any of the store information, logging in will restore the entry to the uuid db and then all the accumulated data should slowly make its way to the server.
It should also be possible to change /usercache/put
so that it creates a user if one does not exist. However, given that this is a corner case (underlying values deleted from database), it seems like requiring a login before accepting data is a reasonable restriction.
Thanks for the explanation, I understand why the app should even if no connection is available.
A workaround should be to logout and login again. Unfortunately the user has tried to logout and login again and it seems it didn't work. Do you see why it could be? Anyhow I have ask her to try again, I'll tell you if this works... Otherwise she will create new logins and lose the accumulated tracks, it is not so big a problem: more annoying is the fact that she lost time with an app not working as it should.
Unfortunately the user has tried to logout and login again and it seems it didn't work. Do you see why it could be?
I can't figure that out without logs. Some potential investigation avenues are:
/usercache/put
call that errors out with a None response from getUUID(request)
/profile/create
and /profile/update
calls from around the time that the user tried to logout and login
Ok thank you!
UPDATE : I now see the test user in the webserver logs, so she has tried again to logout and login, and this has worked :-))
@shankari I was too optimistic :
I see several lines like this in the logs
2019-09-20 18:32:21,386:DEBUG:139931868665600:decoded_json = {'jti': 'bfa0795a-a78b-47db-b717-f3ccd203065c', 'exp': 1568997434, 'nbf': 0, 'iat': 1568997134, 'iss': 'https://id.indie.host/auth/realms/fabmob-io', 'aud': 'traceur', 'sub': 'f3abc6a4-9e7d-4156-970a-21a8f362df03', 'typ': 'ID', 'azp': 'traceur', 'auth_time': 1563884632, 'session_state': 'fa4cd62b-b0a6-45d2-a70d-fc1de4e441ac', 'acr': '1', 'email_verified': False, 'name': 'Cerema Teltest05', 'preferred_username': 'teltest05', 'given_name': 'Cerema', 'family_name': 'Teltest05', 'email': 'teltestcerema05@gmail.com'} 2019-09-20 18:32:21,386:DEBUG:139931868665600:After verifying claim, returning valid email = teltestcerema05@gmail.com 2019-09-20 00:21:38,357:DEBUG:139931877058304:retUUID = None 2019-09-20 00:21:38,357:DEBUG:139931877058304:END POST /usercache/put None 1.2417240142822266
So I can suppose that she did NOT logout ?
And actually I don't see any profile/creata
Yes, I don't think she logged out. This log snippet indicates that the token was decoded correctly and mapped to the teltestcerema05@gmail.com
email address. But then that email address was not mapped to a UUID (retUUID = None
).
Are there entries for teltestcerema05@gmail.com
in the uuid db?
Also, do you see any recent calls to /usercache/put
or /usercache/get
?
Are there entries for teltestcerema05@gmail.com in the uuid db?
No
do you see any recent calls to /usercache/put or /usercache/get?
no, not really. But I noticed that another user had a None retuuid :-( So I contacted him and asked him to logout/login...
no, not really. But I noticed that another user had a None retuuid :-( So I contacted him and asked him to logout/login...
So then she really hasn't logged out/logged in. Maybe she just killed the app? That doesn't really change anything.
Hi, I called the test user again this morning, we tested together both on android and ios. Reinstalling the app (or updating it on iOS), deleting all cache+user data on android both in the OS parameters and in the app in the Profile, but when she reconnects it still does not work. Her actions are not logged by the webserver, no new user is created in mongodb. We tested that her credentials are correct on the openid auth web client (in firefox). So the error is mysterious for me, I suppose we need to log more info to understand what happens, I will see with Yann what we can do.
Reinstalling the app (or updating it on iOS), deleting all cache+user data on android both in the OS parameters and in the app in the Profile, but when she reconnects it still does not work.
Wait, so you uninstalled and reinstalled the app and there are still no connections to the webserver? That seems wrong. Can you check which host she is connected to? (Open Developer Zone and scroll to the bottom)
Good idea, thanks, I will ask her to check that. It must come from the phone conf because if we try to connect from another phone, i works (and creates a new user in mongodb).
@shankari she verified that the server URL in the profile was correct. She connected again this morning but still there are no logs in webserver.log I attached her logs (there is a first connection by me on my phone which created the user on oct 1, then her connection on oct 2 morning).
It seems that the connection is Ok? However I remark that the time of readIntroDone 08:37 am with the log time 23:56-7 = 06:56 am. Also, could the fact that I first logged on from another phone (and then logged out) cause problems?
429,1569999408.84,2019-10-01T23:56:48.840000-07:00,"js : in readIntroDone, read_val = ""2019-10-02T08:37:08+02:00"""
430,1569999408.841,2019-10-01T23:56:48.841000-07:00,"js : currChannel == null, skipping deploy init"
431,1569999408.843,2019-10-01T23:56:48.843000-07:00,"js : in isIntroDone, returning true"
432,1569999408.8439999,2019-10-01T23:56:48.844000-07:00,js : deploy init complete
433,1569999408.863,2019-10-01T23:56:48.863000-07:00,"js : connectionConfigString = {""connectUrl"":""https://fabmob.grfmap.com:8080"",""android"":{""auth"":{""method"":""openid-authutil"",""discoveryURI"":""https://id.indie.host/auth/realms/fabmob-io/"",""clientID"":""traceur"",""scope"":""openid email profile offline_access""}},""ios"":{""auth"":{""method"":""openid-authutil"",""discoveryURI"":""https://id.indie.host/auth/realms/fabmob-io/"",""clientID"":""traceur""}}}"
434,1569999408.8660002,2019-10-01T23:56:48.866000-07:00,BuiltinUserCache : Added value for key connection_settings at time 1.569999408866E9
435,1569999408.877,2019-10-01T23:56:48.877000-07:00,"js : required consent version = {""protocol_id"":""2014-04-6267"",""approval_date"":""2016-07-14""}"
I attach the whole log in case it helps.
What can we do to log something on the server side? I suppose we could add logging instructions in openid_auth.py ? and then test again with her?
What can we do to log something on the server side? I suppose we could add logging instructions in openid_auth.py ? and then test again with her?
If you were getting calls to the server that were failing authentication, that would make sense. But in this case, it looks like the calls didn't even make it to the server?
Her actions are not logged by the webserver ...
In that case, logging the auth won't help
Checking the logs, the authentication is working fine
99,1569998046.37,2019-10-01T23:34:06.370000-07:00,OpenIDAuth : id token retrieved: ...
164,1569998114.516,2019-10-01T23:35:14.516000-07:00,JWTAuthPlugin : requestCode = 100 resultCode = -1
165,1569998114.518,2019-10-01T23:35:14.518000-07:00,OpenIDAuth : onActivityResult get called
At least in these logs, the user did not create a profile, so this is after installation was complete.
$ grep "profile.create" loggerDB_tel03.withdate.log | wc -l
0
Looking at the logs further, I see this error show up repeatedly
325,1569998310.283,2019-10-01T23:38:30.283000-07:00,ServerSyncAdapter : IO Error java.net.SocketTimeoutException: SSL handshake timed out while posting converted trips to JSON
355,1569998370.456,2019-10-01T23:39:30.456000-07:00,ServerSyncAdapter : IO Error java.net.SocketTimeoutException: SSL handshake timed out while posting converted trips to JSON
There is apparently some network error that is causing her connection to time out? Can she access the web page (https://fabmob.grfmap.com:8080) from her browser (e.g. Chrome)? Does it load properly?
There is a later error where it looks like the SSL handshake started, but was then aborted.
399,1569998490.976,2019-10-01T23:41:30.976000-07:00,"ServerSyncAdapter : IO Error javax.net.ssl.SSLHandshakeException: SSL handshake aborted: ssl=0x79240b6d88: I/O error during system call, Connection reset by peer while posting converted trips to JSON"
What device is the user testing with? There is some evidence that older android devices may have SSL connectivity errors, although I have tested on android 4.4 (and I think you have as well) and it worked. https://github.com/mapbox/mapbox-gl-native/issues/9640
Can the user retry when they are connected to WiFi? Some people are reporting such failures when connected to 4G.
Hello, i have the problem but only when i’m on 4G. on wifi i can access any https:// website, on 4g it’s random, sometimes yes, mor often i can’t. do you ever had this problematic?
Read more at: https://www.thesslstore.com/blog/fix-ssl-connection-errors-android-phones/
Hi @shankari thanks for your help
Can she access the web page (https://fabmob.grfmap.com:8080) from her browser (e.g. Chrome)? Does it load properly?
I will ask her
What device is the user testing with? There is some evidence that older android devices may have SSL connectivity errors
I will ask her too, but the test phone are certainly new phones.
Can the user retry when they are connected to WiFi?
I will ask her this too because when I called her last time she was at her office and there is no wifi so she was using 4G. Moreover I will suggest her to test the advice given in the sslstore blogpost, because at least for the first one asking to check if the phone clock is on time, I have a suspicion : as I told you above in the logs, the time of readIntroDone 08:37 am is not consistent with with the log time 23:56-7 = 06:56 of the readintrodone action. But that would weird if this timing problem occurred on all her test phones! I will tell you in the next episode of the story...
I have a first answer : she cannot access the web page (https://fabmob.grfmap.com:8080) from her browser)? So this a confirm an SSL issue. She will test the countermeasures from the SSL Store blogpost, this evening, hopefully one will work.
One intriguing point is that the the SSL issue seems to be an Android problem, and here we have also problem with Iphones here.
Besides, she only in possession of 5 from the 9 phones, so she will her colleagues to check this out, starting with log out / and log in again. I hope we will have solved this by next monday!
@PatGendre we should check the ios logs to see if there is a similar error. You can do it, or if you want me to take a look, send them to me and I will post the results
@shankari yes I asked to test on Iphone too, she should do it this week-end, I will tell you if I have the logs.
@PatGendre any updates? One other test I thought about was for her to try to access other Let's Encrypt sites. For example, my personal server https://damajash.org uses Let's Encrypt but with an apache frontend redirecting to the subdomains. If she is able to successfully access it using her browser, we have a potential idea of how to fix this (i.e. use an apache-based reverse proxy).
@shankari sorry, no, for some reason the user could no test last week, in principle she will this week, i will keep informed of course. Thanks for the possible solution!
Note that I am not (yet?!) running an instance of e-mission on damajash. So she would have to test it in a browser only 😃
@PatGendre any updates on this? I also started an instance of e-mission on damajash. So the tester can try accessing https://tripaware.damajash.org, or use the full app from https://e-mission.eecs.berkeley.edu/#/client_setup?new_client=urap2017emotion&clear_usercache=true&clear_local_storage=true
This version does run the most recent version of the pipeline.
@shankari no, sorry, no news, I hope this week...
Hi @shankari : good news, she tested today and the tripaware app seems to work Ok on the 4 phones where she installed it :-) Thank you !! I suppose you see the 4 users working all right on the server, and understand this confirms your diagnostic, that the issue was due to an SSL problem and that it can be solved by setting up a reverse proxy ?
If yes, I will see with Yann how he can set up this RP. It could be with Apache, or Nginx, do you have any recommendations? Also, I don't understand why the issue happened only with these particular phones? What could be the cause, a different phone operator? (because there are not exotic models, and the phone parameter config is standard, too, I guess). Moreover, there are 2 android and 2 Iphones, so this would not be an android issue??
Yup! I see users Cerema06, Cerema07, Cerema08 and Cerema04 on the tripaware server. I use an apache RP, similar to what the lab at Berkeley uses to redirect to the systems that I have set up there so far.
The lab at Berkeley doesn't use letsencrypt, of course :) I use the default letsencrypt plugin for an Apache server outside the e-mission container and then use the virtual host option to redirect traffic from the proxy to the e-mission container.
I'm happy to post more detailed instructions if you need them
Also, I don't understand why the issue happened only with these particular phones? What could be the cause, a different phone operator? (because there are not exotic models, and the phone parameter config is standard, too, I guess).
Maybe. Note that when others reported the issue, they encountered it only on 4G and not on WiFi. (https://github.com/e-mission/e-mission-docs/issues/459#issuecomment-537591068) You could ask her to also test with your server when connected to WiFi to debug further.
Thanks @shankari
I'm happy to post more detailed instructions if you need them
Please if you can give us more details (such as config files copy and paste lines...), thanks in advance, because Yann has only a basic knowledge of apache (which is a quite big software).
You could ask her to also test with your server when connected to WiFi to debug further.
We did, I'll keep you informed. And also asked her to ask her IT team which exact usage limitation come with the Pro mobile plan that they have in their organisation (Cerema), this may helps us understand better what is blocking currently. Most of the other testers have tested the app with their own phone, so with a consumer plan, and did not encounter this issue.
@shankari FYI the app works when connected wia Wifi , which is good news because it makes it a minimum usable :-) (The users have been created in our mongodb, at last :-) And we still have to solve the HTTPS access issue, either by configuration a RP or by changing the phone access limitations that seem to come with their cellular network Pro plan.
@shankari FYI Yann has set up an Apache https reverse proxy yesterday (it was not so difficult, after a few trials and errors :-). We have asked the user to test the access to https://fabmob.grfmap.com:8080/#/home via the cellular network. But still, it seems to work not so well on wifi, one of the 3 phones she installed has no corresponding user in the e-mission db so no connection with the server... I hope the further tests give more info...
Again, can you explain why the e-mission could not be accessed with certain phones, and why the reverse proxy could solve this? It is really unclear for Yann and I. Our understanding that it could be due to the fact that the phone with the issues run with certain cellular operators who have some restrictions on the https protocol, and that switching from bottle to apache httpd would raise those restrictions... is it correct? If yes, it means that all phone with the same operator should not work, either? Than we would have to ask the current users their operators to check that they are with another operator. In France we have 4 cellular operators, the one for the test phones having this issue has ca. 10% market share.
@shankari it seems the RP solved the issue! I will confirm this in a few days when all 9 test phones are running :-) And still we have this question of understanding the issue was raised only with these particular phones and if this could be caused by this particular cellular operator?
@PatGendre I think it would be useful to fill out a table with the success and failure conditions
on 3G with Pro operator | on 3G with other operator | on WiFi | |
---|---|---|---|
with RP | |||
without RP |
Right now, it looks like:
I would also like to understand what the failure case is. But regardless, it seems like we should have a document outlining that using an RP is a best practice because apparently sometimes, the SSL connection gets dropped otherwise.
I was going to try to write such a document outlining the steps for configuring the apache RP but since I am a bit overwhelmed now, maybe Yann can write that document and submit it.
If indeed this is an issue with 3G only, there is a similar report https://android.stackexchange.com/questions/51852/connection-to-home-server-via-https-works-with-wifi-but-not-with-3g but no solution.
You could also get a SIM card for that plan with a small amount of money loaded and test more rigorously without having to bother the user.
For example, SIM card on that network + your phone - does it still work?
RP-conf.sample.zip @shankari thanks for having found the RP solution!
I would modify slightly the table. According to the tester, there is no difference between the pro and general public plan for this mobile operator (Bouygtel) which has e-mission connection problem. So I suppose other users with the operator would have the same issue. I will ask the French users whether some of them have the same other issues or can to try access the web site without RP (which is https://fabmob.grfmap.com:8081/).
on 3G/4G with some operators | on 3G/4G with most operators | on WiFi | |
---|---|---|---|
with RP | ok | ok | ok |
without RP | no TLS access | ok | ok |
I attach a sample apache conf file for a RP. In our case the original port for the bottle.py e-mission web server was 8080. We added a reverse proxy on apache2 httpd listening on 8080, and passing the requests to the e-mission server now on port 8081. Besides adding a virtual host in the httpd.conf and changing the port number in the e-mission conf/net/api/webserver.conf file to 8081, it is necessary to install the mod_proxy and mod_ssl extension and configure SSL (with letsencryt, e.g.).
Links to sites with explanation are for example:
https://www.digitalocean.com/community/tutorials/how-to-use-apache-as-a-reverse-proxy-with-mod_proxy-on-ubuntu-16-04
https://admin-ahead.com/forum/general-linux/configure-to-listen-on-multiple-ports-apache/ https://httpd.apache.org/docs/2.4/en/vhosts/examples.html
https://www.vincentliefooghe.net/content/apache-reverse-proxy-https (in French)
Is it clear enough? Tell me if I should add this text somewhere in the doc.
Tell me if I should add this text somewhere in the doc.
Yes, I was thinking that you could add this somewhere to the doc and submit a PR. That doc should also have the information in https://github.com/e-mission/e-mission-docs/pull/47
Ok, I will do that tomorrow
This is a minor issue on the Android device. This could be because of device date and time is not correct, the network connection is not secure, antivirus is conflicting with the browser, etc. And, this error can be fixed easily. Here is a guide that can provide potential solutions: Fix SSL Connection Error On Android
@shankari we have a user doing a benchmark of 5 mobility tracking apps : itinerum, travelvu, motiontag, woorti, and e-mission/fabmob. She has 9 phones (2 Iphone, 7 android).
She called me this week because the app was very slow and in practice so slow it was not usable. After her call, I realised that she started the tests but in early september, but I did notice it because no users were added on the server side in mongo (and there is no sign of activity in the server logs for these users). On several if not all of her phones, the app is launched since 2 weeks and has accumulated data locally but is actually not connected to the server.
What seem to be the explanation is that she actually created the users in our openid keycloak server in July, and installed the apps on the phones also in July, just a few days before our mongo db was dropped by an indonesian attacker. Loic restored previous data but we lost 15 days, including the 9 test users.
As I said, in the webserver logs we don't see the test users, and we don't find it either in the uuid collection. In the app logs we have this error:
ServerSyncAdapter : IO Error java.net.SocketTimeoutException: SSL handshake timed out while posting converted trips to JSON
I cannot find this particular error in the server sync plugin code, what does it mean?The problem may be in the openid auth code, either on the server or on the app side ? In our case here : when the user opens the app, accepts the terms of use and logs in, the openid will validate the credentials, but then there is no user with this e-mail in mongo. So the correct behavior should then be to create a new e-mission user, but apparently this is not what happens. No user is added, but the user is allowed to use the app although he/she is not connected.
Thanks in advance when you have time to investigate this.