fabric8-services / fabric8-wit

wit stands for Work Item Tracker
http://devdoc.almighty.io/
Apache License 2.0
45 stars 86 forks source link

login sometimes fails when running on minishift/minikube #1526

Open jstrachan opened 7 years ago

jstrachan commented 7 years ago

if KeyCloak restarts (though reusing the same PV) and the user logs in again we sometimes get this error in WIT:

{
errors: [
{
code: "internal",
detail: "failed to create user/identity pq: duplicate key value violates unique constraint "uix_users_email"",
id: "x5m2FuRx",
status: "500",
title: "Internal Server Error"
}
]
}

So far the only workaround seems to be scrapping the WIT PVC and re-installing WIT again. I wonder if we can figure out a workaround?

jstrachan commented 7 years ago

BTW to reproduce I've been the only user of WIT on minishift/minikube and using the same github user each time

hectorj2f commented 7 years ago

@jstrachan Did you Postgresql KC database restart it too ? If so, can it be possible the database was recreated from scratch for KC ? In that case, if you try to register the user again, it'll exist in the wit db but not in the kc db. That is why you get this problem. However if the KC database remains the same, it can be a particular issue that only happens in this setup (f8-platform).

jstrachan commented 7 years ago

BTW have just tried a reinstall on minishift after deleting all the PVCs and apps and I get this:

{
errors: [
{
code: "internal",
detail: "Unable to create oauth state reference pq: relation "oauth_state_references" does not exist",
id: "hOxQ7u38",
status: "500",
title: "Internal Server Error"
}
]
}

have tried trashing the PVCs for keycloak / wit / init tenant and restarting and no luck figuring out how to login again other than trashing the whole VM

jstrachan commented 7 years ago

@hectorj2f yeah am wondering if the KC restart causes the DB to be zapped - despite the PVC - not 100% sure TBH

jstrachan commented 7 years ago

aha - to remove the Unable to create oauth state reference pq: relation "oauth_state_references" does not exist error I had to zap all the PVCs and the PVs on minishift before re-installing

alexeykazakov commented 7 years ago

@jstrachan, yes, @hectorj2f is correct. Such errors occur when Keycloak DB and WIT DB are not in sync. When you login in KC the very first time we create an identity in the WIT DB with the same ID as in Keycloak. If you then nuke the Keycloak DB and start a new one from scratch then when you login again using the same github user Keycloak will create a new user in its DB with a new ID! So the existing user/identity in the WIT DB has now ID which doesn't match the ID of the Keycloak user. So WIT can't find the existing user and tries to create a new user in the WIT DB with the same email. Emails are supposed to be unique and it fails.

We relay on user/identity ID because this is what we have in the access tokens to identify the user.

jstrachan commented 7 years ago

I think our issue with upstream fabric8 is that the KC pod can restart; which causes the DB to get recreated, causing the 2 DBs to get out of sync which then means you can not login. You then have to delete both DBs and start again ;) https://github.com/fabric8io/fabric8-platform/tree/master/apps/keycloak/src/main/fabric8

I guess our KC configuration to import the configuration on startup https://github.com/fabric8io/fabric8-platform/blob/master/apps/keycloak/src/main/fabric8/deployment.yml#L172 from the ConfigMap causes it to trash the logged in users too unfortunately?

I think this is the configuration that does an import & override: https://github.com/fabric8io/fabric8-platform/blob/master/apps/keycloak/src/main/fabric8/deployment.yml#L114-L117

alexeykazakov commented 7 years ago

Yeah.. Use IGNORE_EXISTING instead of OVERWRITE_EXISTING in https://github.com/fabric8io/fabric8-platform/blob/master/apps/keycloak/src/main/fabric8/deployment.yml#L117 so a new pod won't kill the existing configuration & users.

hectorj2f commented 7 years ago

@alexeykazakov @jstrachan yes and no, it is a bit tricky. You need OVERWRITE_EXISTING for the initial installation, afterwards you could switch to IGNORE_EXISTING.

rawlingsj commented 7 years ago

IGNORE_EXISTING would mean we'll need to provide upgrade scripts of some sort when we make a change to the default KC configmap and users upgrade to a newer version. It feels like there's a missing MERGE strategy or something.

jstrachan commented 7 years ago

its tricky - we have to use IGNORE_EXISTING right now as without it, whenever the KC pod stops the entire install is completely unusable.

The tricky bit is how we are meant to do upgrades to the fabric8 realm configuration. Also we need to be careful; when folks wanna re-install from scratch, you have to make sure you blow away all your PVs (which doesn't happen by default on a mac with minikube and VirtualBox for example - have found you have to minishift ssh and rm -rf /tmp/hostpath-*/* for that.

We're gonna have to figure out some way to upgrade the fabric8 realm without trashing all the KC + WIT DBs really for fabric8.

hectorj2f commented 7 years ago

Perhaps, we could add some additional logic to the keycloak image to check if the configuration file has changed since the last time. We need to store the configuration file in a volume to check it with every restart of kc pod. If it didn't change since the last time, then it should use IGNORE_EXISTING as strategy, otherwise OVERWRITE_EXISTING. Additionally to that, the user could add an ENV variable to enforce the strategy to use regardless of this logic.

jstrachan commented 7 years ago

@hectorj2f great idea! The init container we have could maybe try check if the config has really changed and if so swizzle to OVERWRITE_EXISTING on the first new release.

The only issue is gonna be that if we do need to switch from IGNORE_EXISTING -> OVERWRITE_EXISTING that seems to trash the KC database which then trashes all the users which then means we need to trash the WIT database (or at least all the users) so that folks can login.

So we may need a more sophisticated 'update realm' logic that just updates the tenant without replacing it - to preserve all the users inside?

hectorj2f commented 7 years ago

Good time to implement an operator or controller that checks if the kc db is wiped it out to trigger a new redeployment-vol cleanup of the wit db.

jstrachan commented 7 years ago

Agreed. I was hoping we could try avoid wiping out any databases; and just have a more clever way to upgrade the fabric8 realm, while leaving all users as they are really.

Its not gonna be a great UX if we keep having to delete all the spaces and issues in WIT just because we had to tinker slightly with some KC config for the fabric8 realm :)

jstrachan commented 7 years ago

I guess worst case just removing users from the WIT database; would that let us preserve spaces/issues? Things might go wacky though if we have to keep removing users periodically for each KC upgrade?

Maybe we just need a way to re-associate users with their new KC ID after the KC upgrade to avoid wiping the WIT DB?

alexeykazakov commented 7 years ago

KC identity/user IDs are used a lot in the WIT DB. Space owners, commit creators, etc. So, changing the ID of the existing users can be tricky.

jstrachan commented 7 years ago

thanks - I figured they may be - so we definitely want a clever migration that just modifies the fabric8 realm config without trashing users

alexeykazakov commented 7 years ago

And yes, using OVERWRITE_EXISTING will trash the entire DB with all users etc. I'm not aware of any way how we could update the existing realm via passing a JSON file without trashing the users. There is a CLI tool in Keycloak which can be used for updating the realm config though. Using some cli script which will adjust the realm seems to be the only way to update the existing realm and keep the users.

jstrachan commented 7 years ago

yeah - I think we're gonna need to add some kind of initialisation where if the JSON changes, we run some CLI tool to modify the realm - and never use OVERWRITE_EXISTING ever again :)

alexeykazakov commented 7 years ago

Here is the docs for the CLI tool I'm referring to - https://keycloak.gitbooks.io/documentation/server_admin/topics/admin-cli.html