e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
15 stars 34 forks source link

A problem with openid connection #479

Open PatGendre opened 4 years ago

PatGendre commented 4 years ago

Hi @shankari we have 2 new users who installed our fabmob app but they cannot access to the app. They created a fabmob account via our openid keycloak server, and this worked. But when they try to log in, they get rejected with "an error occurred" from the log in screen, and get forwarded back to the consent screen so they are unable to use the app (and to send the app logs/usercache). However they get from time to time a e-mission trip-end notification so may be the tracking works? They tried to connect from their app with the credentials of the new user, but they failed, too. They can access to the e-mission homepage from a browser. They are on Android, resp. with a OnePlus 5000 (android 9) and a Motorola moto G play (android 7.1.1), so it is not likely that this due do the device. In the apache2 reverse proxy logs, I see a 500 when they try to go the profile screen: grfmapfabmob:443 IPaddress - - [12/Dec/2019:18:51:18 +0100] "POST /profile/update HTTP/1.1" 500 5071 "-" "Traceur de mobilit\xe9 FabMob"

As a test Yann and I created 2 new users and it worked. Other (real) users have created recently an account and installed the app and it worked. Currently we don't have access to the keycloak logs (we leaved a few messages for the keycloak admin but he has not responded yet).

We don't find the cause so I create this issue... Thanks in advance if you have any clue!

shankari commented 4 years ago

However they get from time to time a e-mission trip-end notification so may be the tracking works?

Tracking is not related to login. Once the user has consented, tracking will start. The login only affects the ability for the data to be pushed to the server.

In the apache2 reverse proxy logs, I see a 500 when they try to go the profile screen:

This is almost certainly the error, after we log in, we call profile/update to set the push token https://github.com/e-mission/e-mission-phone/blob/11734545c7b9283beeb72c63899763091363ae98/www/js/splash/pushnotify.js#L71

Can you look at the e-mission console logs to see what the 500 error is?

PatGendre commented 4 years ago

@shankari thanks:-) I am not sure this is exactly the same 500 error , but in the webserver console log I see this error repeated today a few times, so this shall be it: START 2019-12-12 08:44:08.412316 POST /profile/update END 2019-12-12 08:44:08.413486 POST /profile/update 0.0010035037994384766 Traceback (most recent call last): File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 997, in _handle out = route.call(**args) File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 2007, i n wrapper rv = callback(*a, **ka) File "emission/net/api/cfc_webapp.py", line 290, in updateUserProfile user_uuid = getUUID(request) File "emission/net/api/cfc_webapp.py", line 464, in getUUID retUUID = enaa.getUUID(request, auth_method, inHeader) File "/var/emission/e-mission-server/emission/net/auth/auth.py", line 62, in g etUUID userToken = __getToken__(request, inHeader) File "/var/emission/e-mission-server/emission/net/auth/auth.py", line 56, in _ _getToken__ userToken = request.json['user'] KeyError: 'user'

PatGendre commented 4 years ago

There is another error here seeming to point to a token expiry date but may be it's unrelated (usercache/get not profile/update): START 2019-12-12 11:49:45.159726 POST /usercache/get END 2019-12-12 11:49:45.364805 POST /usercache/get 0.2049095630645752 Traceback (most recent call last): File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 997, in _handle out = route.call(**args) File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 2007, i n wrapper rv = callback(*a, **ka) File "emission/net/api/cfc_webapp.py", line 236, in getFromCache user_uuid=getUUID(request) File "emission/net/api/cfc_webapp.py", line 464, in getUUID retUUID = enaa.getUUID(request, auth_method, inHeader) File "/var/emission/e-mission-server/emission/net/auth/auth.py", line 63, in g etUUID retUUID = getUUIDFromToken(authMethod, userToken) File "/var/emission/e-mission-server/emission/net/auth/auth.py", line 37, in g etUUIDFromToken userEmail = AuthMethodFactory.getAuthMethod(authMethod).verifyUserToken(toke n) File "/var/emission/e-mission-server/emission/net/auth/openid_auth.py", line 5 5, in verifyUserToken decoded_token = jwt.JWT(key=self.jwk_keyset, jwt=token) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/jwcrypto/jwt.p y", line 204, in __init__ self.deserialize(jwt, key) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/jwcrypto/jwt.p y", line 483, in deserialize self._check_provided_claims() File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/jwcrypto/jwt.p y", line 360, in _check_provided_claims self._check_default_claims(claims) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/jwcrypto/jwt.p y", line 341, in _check_default_claims self._check_exp(claims['exp'], time.time(), self._leeway) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/jwcrypto/jwt.p y", line 323, in _check_exp claim, limit, leeway)) jwcrypto.jwt.JWTExpired: Expired at 1576136156, time: 1576147785(leeway: 60) START 2019-12-12 11:49:50.656765 POST /usercache/put END 2019-12-12 11:49:56.460207 POST /usercache/put 5.79919171333313 Traceback (most recent call last): File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 997, in _handle out = route.call(**args) File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 2007, i n wrapper rv = callback(*a, **ka) File "emission/net/api/cfc_webapp.py", line 244, in putIntoCache user_uuid=getUUID(request) File "emission/net/api/cfc_webapp.py", line 464, in getUUID retUUID = enaa.getUUID(request, auth_method, inHeader) File "/var/emission/e-mission-server/emission/net/auth/auth.py", line 63, in g etUUID retUUID = getUUIDFromToken(authMethod, userToken) File "/var/emission/e-mission-server/emission/net/auth/auth.py", line 37, in g etUUIDFromToken userEmail = AuthMethodFactory.getAuthMethod(authMethod).verifyUserToken(toke n) File "/var/emission/e-mission-server/emission/net/auth/openid_auth.py", line 5 5, in verifyUserToken decoded_token = jwt.JWT(key=self.jwk_keyset, jwt=token) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/jwcrypto/jwt.p y", line 204, in __init__ self.deserialize(jwt, key) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/jwcrypto/jwt.p y", line 483, in deserialize self._check_provided_claims() File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/jwcrypto/jwt.p y", line 360, in _check_provided_claims self._check_default_claims(claims) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/jwcrypto/jwt.p y", line 341, in _check_default_claims self._check_exp(claims['exp'], time.time(), self._leeway) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/jwcrypto/jwt.p y", line 323, in _check_exp claim, limit, leeway)) jwcrypto.jwt.JWTExpired: Expired at 1576136156, time: 1576147796(leeway: 60)

shankari commented 4 years ago

I am not sure this is exactly the same 500 error , but in the webserver console log I see this error repeated today a few times, so this shall be it:

This is almost certainly it. The error is

File "/var/emission/e-mission-server/emission/net/auth/auth.py", line 56, in _ _getToken__ userToken = request.json['user'] KeyError: 'user'

For some reason, the user token is not filled out. This could either be because the token is not generated properly OR because the message is not formatted properly.

We saw the message formatting error in the UTF-8 issue and added some additional logging to debug it (https://github.com/e-mission/e-mission-docs/issues/333#issuecomment-476826259). Can you re-enable those logs and see the raw message so we can rule out formatting issues?

PatGendre commented 4 years ago

Hi @shankari Good guess, this seems to be the problem! I've found this error in the augmented logs: 2019-12-13 09:53:25,397:DEBUG:140196952897280:This should go to the log file 2019-12-13 09:53:25,397:DEBUG:140196952897280:trying to log unicode character é 2019-12-13 09:53:44,259:DEBUG:140205144966912:This should go to the log file 2019-12-13 09:53:44,260:DEBUG:140205144966912:trying to log unicode character é 2019-12-13 09:55:22,990:DEBUG:140204485850880:START POST /usercache/get 2019-12-13 09:55:22,991:DEBUG:140204485850880:Called userCache.get 2019-12-13 09:55:22,991:DEBUG:140204485850880:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ JSON 2019-12-13 09:55:22,991:DEBUG:140204485850880:ctype = application/json 2019-12-13 09:55:22,992:DEBUG:140204485850880:body = b'{}'

PatGendre commented 4 years ago

I suppose the "é" found in the logs comes from "décembre" (december), not from the user name or login because they use a login without any accent... But why only for these 2 users, they have the wrong client version I guess?

shankari commented 4 years ago

But why only for these 2 users, they have the wrong client version I guess?

I am not convinced that the problem is in fact with UTF-8. It is just that we added similar logging in the UTF-8 case.

shankari commented 4 years ago

In the UTF-8 case, the error was while parsing the body into JSON, which is why we were able to see 5 f\xe9vr. 2019 on the server and diagnose the error.

In this case, the body is blank body = b'{}' . We don't see any strange characters in the body - it is just blank.

This is consistent with an error in composing a message on the phone. Do you see a similar error for /profile/create or does it happen only for /profile/update?

If it happens for both, the error may be in retrieving the user token (so actually with openid). If it happens only for update, the error is likely to be in the push token retrieval since that is what we update the profile with.

PatGendre commented 4 years ago

Hi @shankari thanks for your help; taking more time to look in the 2/3 more recent log files, I have only this blank body once, with a usercache request, and it does not correlate with the so I am not sure this is the problem.

I have this error for a few /profile/update (and nothing with profile/create, which is much rarer), where there is no UUID (but not "é" error or blank body) : 2019-12-13 10:36:49,481:DEBUG:140204477458176:START POST /profile/update 2019-12-13 10:36:49,481:DEBUG:140204477458176:Called updateUserProfile 2019-12-13 10:36:49,482:DEBUG:140204477458176:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ JSON 2019-12-13 10:36:49,482:DEBUG:140204477458176:ctype = application/json 2019-12-13 10:36:49,482:DEBUG:140204477458176:body = b'{"update_doc":{"device_token":"dNhO8oqHRPU:APA91bHfrzNMCCYR_-VcwY56YfdqMONhKjss5M_YEItz89jhoZ1nbEYJF_GAfrPd3pUUAl1hPrVLB2aqTXfW-vpEXCXaHW0RpDDSIWqyM-ozSC9L_KGwEQBjqq3mySKv2Kz3MG_9740K","curr_platform":"android","curr_sync_interval":3600}}' 2019-12-13 10:36:49,482:DEBUG:140204477458176:END POST /profile/update 0.0010352134704589844

The 'trying to log unicode character é' log is not an error actually, it happens only when starting the server (it is located in the main part of cfc_webapp.py file and I don't see any reason why we added this logging.debud instruction!)

Here are a few more examples of the 500 errors I have with the user are for profile/updates : 2019-12-12 18:40:38,570:DEBUG:140646279014144:START POST /profile/update 2019-12-12 18:40:38,570:DEBUG:140646279014144:Called updateUserProfile 2019-12-12 18:40:38,571:DEBUG:140646279014144:END POST /profile/update 0.0010116100311279297 correlates with the apache2 500 error: grfmapfabmob:443 IPADRESS - - [12/Dec/2019:18:40:38 +0100] "POST /profile/upd ate HTTP/1.1" 500 5071 "-" "Traceur de mobilit\xe9 FabMob" And today with augmented logs I have this grfmapfabmob:443 31.39.141.6 - - [13/Dec/2019:10:36:49 +0100] "POST /profile/update HTTP/1.1" 500 5071 "-" "Traceur de mobilit\xe9 FabMob" and at that same time: 2019-12-13 10:36:49,481:DEBUG:140204477458176:START POST /profile/update 2019-12-13 10:36:49,481:DEBUG:140204477458176:Called updateUserProfile 2019-12-13 10:36:49,482:DEBUG:140204477458176:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ JSON 2019-12-13 10:36:49,482:DEBUG:140204477458176:ctype = application/json 2019-12-13 10:36:49,482:DEBUG:140204477458176:body = b'{"update_doc":{"device_token":"dNhO8oqHRPU:APA91bHfrzNMCCYR_-VcwY56YfdqMONhKjss5M_YEItz89jhoZ1nbEYJF_GAfrPd3pUUAl1hPrVLB2aqTXfW-vpEXCXaHW0RpDDSIWqyM-ozSC9L_KGwEQBjqq3mySKv2Kz3MG_9740K","curr_platform":"android","curr_sync_interval":3600}}' 2019-12-13 10:36:49,482:DEBUG:140204477458176:END POST /profile/update 0.0010352134704589844

Note that I suggested the user to install the embase app with from your damajash site, but the server seems to be down, so I suggested her to test the e-mission app from the Play Store. I will tell you if it works!

shankari commented 4 years ago

@PatGendre The update body seems to be valid here - as expected, the body contains the push notification token, the current platform and the sync interval and I don't see any unusual characters in it. What it doesn't have is the user field. This does appear to be an issue with open id, but I can't debug further unless I get the client logs.

I understand that they can't get the client logs if the onboarding does not complete. Since they are on android, can you give them a custom apk that has the update call commented out? You won't be able to demo the push notifications, but they can move beyond the onboarding and email me the logs for debugging.

Note that I suggested the user to install the embase app with from your damajash site, but the server seems to be down, so I suggested her to test the e-mission app from the Play Store. I will tell you if it works!

The damajash site was for a live demo, so only the polar bear version was updated. You have to refresh the screen until you get to the polar bear to get a working version. If you use the other screens, they will try to connect to a non-existent server.

PatGendre commented 4 years ago

@shankari thanks for the diagnostic, yes we really need a proper access to the keycloak logs to diagnose more easily what happened... I will see with Yann next week but he's not available on Monday, and I am not on Tuesday...

You have to refresh the screen until you get to the polar bear to get a working version.

Ok thanks

shankari commented 4 years ago

thanks for the diagnostic, yes we really need a proper access to the keycloak logs to diagnose more easily what happened...

Actually, you don't even need access to the keycloak logs - if you generate an apk without the update step as I suggested above, they will be able to get past the onboarding and I can look at the error from the app side.

PatGendre commented 4 years ago

Hi @shankari , we could generate an apk which bypasses the login so our user could try use the application. I am not sure what we can find, there are some 401 and 500 errors but no auth error because the user does not log in any moren with this apk.
For example:
CommunicationHelper : Got response org.apache.http.message.BasicHttpResponse@ff523f1 with status HTTP/1.1 401 Unauthorized
js : while reading data from server for 1577660400000 error = "While pushing/getting from server HTTP/1.1 500 Internal Server Error"
CommunicationHelper : Got response org.apache.http.message.BasicHttpResponse@58d9e51 with status HTTP/1.1 500 Internal Server Error

May I e-mail you the usercache logs sqlite DB so that you can have a look?

shankari commented 4 years ago

@PatGendre yes please. Let me see what is going on.

shankari commented 4 years ago

@PatGendre does the apk bypass the login or only the /profile/update call? If it bypasses the login entirely, then data cannot be sent/retrieved from the server, which might explain why you are getting the 500 errors.

You want to keep the login in place (since /profile/create did not appear to have any errors) and only remove the /profile/update call here https://github.com/e-mission/e-mission-phone/blob/11734545c7b9283beeb72c63899763091363ae98/www/js/splash/pushnotify.js#L71

PatGendre commented 4 years ago

@shankari yes I think the patch bypasses the login altogether and I will ask to modify the patch.

PatGendre commented 4 years ago

@shankari Hi, Tristan updated the patch (commenting out line 71); I reinstalled the patched app and it seemed to work for me but this was because my credentials were already stored in the cache, as Noemie installed the app she found that the "connect or create account" button was ineffective; and I had the same behaviour on my app when I cleared the app data. Sorry, we don't know out to solve this.

In case it might be useful I e-mailed to myself logs and usercache of the patched app, which I've installed for the week-end. I see 3 kinds of error: CommunicationHelper : Got response org.apache.http.message.BasicHttpResponse@82494a7 with status HTTP/1.1 500 Internal Server Error
ServerSyncAdapter : IO Error java.io.IOException while posting converted trips to JSON
BuiltinUserCache : Added value for key stats/client_error at time 1.578800730845E9

PatGendre commented 4 years ago

Hi @shankari we have tested again several times with our 2 users who cannot connect to the app, with the corrected patch (bypassing the profile/update call) and it still doesn't work, we really don't understand what happens, it may be related with openid (keycloak server) but it is not clear to us.

1) for one user (Florian), when he connects from the app via the openid login prompt, he gets an error (see screenshot), although he can connect from a browser to the openid keycloak server.

I created a test user, and when I connect to the app with the test user on my phone, it works fine. When Florian connects with the test user credentials, he gets the same error.

keycloakopenid-login-error

2) for the user (Noemie), a similar behavior : the connection to the openid keycloak server works but when she connects from the app, it doesn't. She comes back to the login prompt and cannot enter the app (but she doesn't get the error like Florian).

Any idea for this behaviour?

shankari commented 4 years ago

@PatGendre can we focus on the most obvious error:

I created a test user, and when I connect to the app with the test user on my phone, it works fine. When Florian connects with the test user credentials, he gets the same error.

Let me see if I understand this correctly.

Can you generate me e-mission phone logs from both you and Florian, for this scenario, starting from a freshly installed app? We can then look at the logs side by side and see the differences.

Because this is super weird. If the login works from your phone and not from Florian's phone for the same user then there must be something wrong/challenging about the configuration on Florian's phone, similar to the original CEREMA testing.

If you send me the test user credentials and the link to the FabMob app, I can also try to install and login from my phone and see if I can reproduce.

PatGendre commented 4 years ago

Thanks, i sent you the credentials. Florian cannot send logs because he can't connect. I can send you logs , I will (but it may take time, i am travelling)

shankari commented 4 years ago

Searching for this error, I found a similar issues reported by an automated testing framework.

https://github.com/cypress-io/cypress/issues/3119 https://stackoverflow.com/questions/41579517/keycloak-is-it-ready-for-production https://issues.jboss.org/browse/KEYCLOAK-4016

Consensus is that:

Is there a difference in the browser that you and Florian are using to finish the authentication (e.g. chrome v/s firefox)? I will try this out, but we may have to look at the keycloak logs at the end.

shankari commented 4 years ago

Looking at the keycloak code (yay open source!), it looks like this error is related to an invalid code https://github.com/keycloak/keycloak/blob/master/themes/src/main/resources/theme/base/login/messages/messages_en.properties#L257

invalidCodeMessage=An error occurred, please login again through your application.

And poking around for other instances of INVALID_CODE, it looks like it is generated if there is an invalid client session - e.g. there was already an authenticated session stored in the cookies but it doesn't match the new one.

In this location, in addition to generating the error, there is a challenge

https://github.com/keycloak/keycloak/blob/b478472b3578b8980d7b5f1642e91e75d1e78d16/services/src/main/java/org/keycloak/authentication/requiredactions/ConsoleVerifyEmail.java#L90

which I think means that the user will be asked to verify the email.

But in this case, initialVerifyAuthSession just generates the error.

https://github.com/keycloak/keycloak/blob/b478472b3578b8980d7b5f1642e91e75d1e78d16/services/src/main/java/org/keycloak/services/resources/SessionCodeChecks.java#L163

And it is because there is a mismatched cookie.

event.detail(Details.REASON, "cookie does not match auth_session query parameter");

There are a couple of other error locations, but they focus on handling an error generated from somewhere else, and I don't want to go down the rabbit hole yet.

Given this initial search, I wonder if there are some issues with cookies on Florian's phone. Does he have cookies turned off in the browser? Is he using a different browser?

PatGendre commented 4 years ago

Thanks for your inquiry ! I will forward your questions to Florian and keep you informed.

PatGendre commented 4 years ago

@shankari for Noémie at least, it appears that chrome was deactivated, and that this caused the problem. With the chrome app installed and activated on her phone, she could connect with her credentials :-)) thanks a lot for having identified the browser as the root cause !

I'll tell you when I know for Florian....

overflorian commented 4 years ago

Hi Shankari, thank you very much for your support. You are right: the webpage is opened within Chrome (version 79.0.3945.116, with no add-on and no ad-blocker) and that's the reason why it blocks me. Indeed, while trying in Firefox, I don't face such an error message.

Knowing that doesn't allow me to be able to use the app as the embedded webpage is only opened with Chrome in the app. Moreover, I must not be the only one to face this issue. Would you have any idea to face this issue? Thanks

shankari commented 4 years ago

@overflorian

shankari commented 4 years ago

From the AppAuth docs, it is possible to select the browser used for authentication, including the version. If we can figure out a reliable browser configuration, we can specify it as part of the auth config. https://github.com/openid/AppAuth-Android#controlling-which-browser-is-used-for-authorization

nlehuby commented 4 years ago

Hi, I'm Noémie :)

I've tested on two devices, one with Firefox (68.4.1) as default browser, and the second one with Firefox Klar as default browser. But I cannot reproduce Florian's bug, in both cases and with Chrome, I can login in the browser.

shankari commented 4 years ago

@overflorian so it looks like there is something specific to your Chrome since the same browser seems to work for both @PatGendre and @nlehuby.

shankari commented 4 years ago

if you plan to build your own app, and are worried about this affecting your end users, you could use one of the other authentication mechanisms, or maybe implement and contribute an integration with Auth0 which claims to have "solved auth" :)

nlehuby commented 4 years ago

Chrome 57.0.2987.132 on one device, and Chrome 79.0.3945.116 on the other.

shankari commented 4 years ago

@overflorian, I also wonder if the error this time is due to cached cookies from your previous login. Can you try clearing your cookies and see what the behavior is?

Also, if you do want to continue using OpenID Connect, you could add a message to the login screen about trying different browsers until one works. Not sure if that is optimal from a UX design perspective, but this also seems to be a corner case wrt auth since chrome works for everybody other than @overflorian

PatGendre commented 4 years ago

@overflorian so the problem is almost solved now, does it work you now, or not yet? maybe after clearing the cooking?

@shankari thanks again, shall I update the doc for this issues somewhere e.g. in "troubleshooting the app"? It is not clear to me if this likely to occur again as it concerns only the openid auth config.

shankari commented 4 years ago

@PatGendre I think it would still be useful to figure out why it is not working for @overflorian in his version of chrome. Once we know that, I think it would be useful to have some details in the openid auth section of the documentation. As I said earlier, it would probably be useful for deployers who use openid connect to put in some basic troubleshooting information into the sign in screen.