FlowFuse / flowfuse

Build bespoke, flexible, and resilient manufacturing low-code applications with FlowFuse and Node-RED
https://flowfuse.com
Other
269 stars 63 forks source link

FlowFuse: admin access broken after after enabling and disabling EntraID assertions. #4084

Closed SynoUser-NL closed 1 month ago

SynoUser-NL commented 3 months ago

Current Behavior

After configuring FlowFuse SAML access by Entra ID assertions (as per https://flowfuse.com/docs/admin/sso/saml/#microsoft-entra), creating and deleting a test team, and disabling the Group Assertions, my FlowFuse Admin-enabled account is not working as it should.

Observations:

We've had this happen on 2 occasions, with multiple FF admin-enabled accounts. Reverting to the former situation was only possible after deleting the user completely, and following the onboarding steps, assigning all teams to the new user.

An upgrade to FF 2.5 from 2.4 did not improve the situation! As a FF admin, I am now hindered by not having full functionality. Since we do not want to create an unstable situation we are hesitant to enable the Group Assertion option again, even after upgrading. We also didn't dare use the All Teams option in Group Assertions, as when this goes south I'll be spending the next 2 days correcting the issue.

I've created a local-only ff admin account to be able to do the stuff I can not do with my regular admin account, but so far I have not been able to restore full access to my regular admin account using the local account.

This issue may or may not be related to https://github.com/FlowFuse/flowfuse/issues/4026

PS: as a request, please also create the possibility to define FlowFuse Admins by AD group. https://github.com/FlowFuse/flowfuse/issues/4085 PPS: as a request, a FF admin should automatically have access to all teams, not only if that admin is a member of the team. Or at least be able to make that account a member of any team, no questions\acceptance asked. https://github.com/FlowFuse/flowfuse/issues/4086

Expected Behavior

Disabling group assertions after testing with groups should return access to the previous configuration Existing access for configured accounts should not be altered, especially for admin enabled accounts!

Steps To Reproduce

0) Create a local FF user with admin rights to be sure you can access FF after that. 1) Create multiple FF teams with multiple SSO-enabled admin users, and configured SSO connection, group assertions disabled. 2) Create a new Team with the necessary AD groups with correct team slug. 3) Enable Group Assertions for that team only! And test access by switching roles by AD group membership. In itself this works perfectly. 4) Delete the Team and log off and on again. With logon you may now get a 404, while the User dropdown works as it should. 5) The Team dropdown now displays only 1 Team and the Create Team option, even when clicking the Team dropdown. 6) Going to {User Dropdown} \ Admin Settings \ Teams will display all teams. Clicking on the team will send you to the Team page where you have full access. The Team dropdown will display the current team & create new team option only (even when you drop down). 7) Various admin options will give a 404 error, including Admin Settings \ Settings \ SSO \ {SAML Config}, leaving you unable to alter the SSO configuration.

The above done by heart to the best of my memory. Happy to send screenshots privately or have meeting to show.

Environment

Have you provided an initial effort estimate for this issue?

I am not a FlowFuse team member

knolleary commented 3 months ago

Thanks for reporting @SynoUser-NL

I'll investigate today and get back to you.

knolleary commented 2 months ago

Hi @SynoUser-NL

I've spent some time today trying to recreate the issues you have seen - without any luck. I've tried various combinations of adding/removing teams, modifying groups members, deleting team - but everything works as expected.

There may be some subtle aspect to this I'm overlooking, but having tried lots of combinations, I'm not sure what that could be.

Can I take you up on the offer of a call to step through it on your side? You're welcome to grab some time next week via https://calendly.com/knolleary/30min

SynoUser-NL commented 2 months ago

Hi @knolleary ,

Thanks for your time investigating this. Hmm.. my luck again.. Then it probably means we're doing something wrong. I will send you screenshots (by email) so you can see some of the things I'm running into. And I'll schedule a meeting. Should the screenshots be enough, feel free to cancel the meeting.

knolleary commented 2 months ago

Adding some additional information provided by the OP via email and my analysis of it.

Looking the /user/teams endpoint, it calls:

The only place hashid is referenced is in the latter call - where it accesses the Team.hashid property of the TeamMembership object. There for, Team is null at this point.

This means there is a TeamMember object in the database for this user where the Team property has been nulled.

The question is how has the TeamMember object got a null Team. The cascade delete on the relationship between Team/TeamMembership means the Membership will be deleted when the Team is deleted - rather than set null. And the TeamMembership.TeamId col in the database is explicitly set to not null - so it shouldn't be possible to get in this state.

Will look more at the SSO Group/Team management code - which creates TeamMember objects - to see if there's a scenario where a null or invalid TeamId can be inserted (although how that gets past the db constraints, I don't know)

knolleary commented 2 months ago

Further - OP reports that a non-SSO managed admin user has been similarly affected - getting the null/hashid error. This points the finger back at the team delete step.

SynoUser-NL commented 2 months ago

@knolleary Know you are very busy, so scan I politely inquire if there is any progress on discovering a way to resolve this issue?

Thank you!

knolleary commented 2 months ago

Hi @SynoUser-NL - sorry for the delay. By the end of Friday I'd exhausted my theories for the root cause. The next step is going to be to setup a call where were able to have a peek into your database to understand what state it is in. Can I ask you to setup a call using the same link as before at a time we'll be able to direct you how to login to the database and run some queries?

SynoUser-NL commented 2 months ago

No problem at all .. :) I'll check with my server admin for an appropriate time and make an appointment.

Call planned. Thank you!

knolleary commented 2 months ago

We have identified the root cause here being a missing database constraint on the TeamMembers table; meaning when the team was deleted, it did not delete any of the TeamMember entries.

I have not yet identified how the constraint has come to be missing. I know the instance in question was installed with quite an early version of FF and has been progressively upgraded.

To fix the issue, first we need to remove the 'bad' entries from the TeamMembers table.

# Check for any remaining bad entries:
select * from "TeamMembers" where not exists (select from "Teams" where "TeamMembers"."TeamId" = "Teams".id);

# If there are any, delete them:
delete from "TeamMembers" where not exists (select from "Teams" where "TeamMembers"."TeamId" = "Teams".id);

Then, we can apply the missing db constraints to the TeamMembers table:

alter table "TeamMembers" add constraint "TeamMembers_TeamId_fkey" FOREIGN KEY ("TeamId") REFERENCES "Teams"(id) ON UPDATE CASCADE ON DELETE CASCADE;
alter table "TeamMembers" add constraint "TeamMembers_UserId_fkey" FOREIGN KEY ("UserId") REFERENCES "Users"(id) ON UPDATE CASCADE ON DELETE CASCADE;

@hardillb can you provide the command for backing up the DB prior to running any of these?

hardillb commented 2 months ago

The command to backup the database is going to be

docker exec -it flowfuse-postgres-1 pg_dump flowforge > backup.sql

This assumes that the postgres container is called flowfuse-postgres-1, change if needed. It will write a backup.sql in the current working directory.

knolleary commented 2 months ago

@SynoUser-NL can I check you to had seen our comments above and been able to action them?

SynoUser-NL commented 2 months ago

Sorry, I've been away from work for a couple of days.. I'll get right on it and keep you informed.

SynoUser-NL commented 2 months ago

image

Result of the queries..

SynoUser-NL commented 1 month ago

@knolleary @hardillb : Please note the error in the screenshot above. Not sure how serious this is..

hardillb commented 1 month ago

@SynoUser-NL This implies that there are other constraints possibly missing from the database.

Could you gzip the backup.sql file created before you ran the changes and email it to me please. This file should have the current full DB schema at the top of the backup so we can check the other constraints and to check what other data may need cleaning up first.

Thanks.

SynoUser-NL commented 1 month ago

Sure! File sent.

Thanks!

hardillb commented 1 month ago

Thanks, that was very helpful. It looks like there was one extra bit of clean up to do before the last alter table command.

Could you please run the following:

# check for TeamMember entries for users that no longer exist
select * from "TeamMembers" where "UserId" not in (select "id" from "Users");

# remove these entries
delete from "TeamMembers" where "UserId" not in (select "id" from "Users");

When I run this on your imported DB I see (8) entries for UserId 3, 7 & 9 which are missing from the Users table (deleted users)

You should then be able to run the final alter table command cleanly

alter table "TeamMembers" add constraint "TeamMembers_UserId_fkey" FOREIGN KEY ("UserId") REFERENCES "Users"(id) ON UPDATE CASCADE ON DELETE CASCADE;

This command is adding in the cascade trigger that would have removed these old team members.

SynoUser-NL commented 1 month ago

@hardillb image That appears to have worked fine! 👍

SynoUser-NL commented 1 month ago

Hi @hardillb and @knolleary,

Was this the last step and can we now consider the issue fixed (at least on our end)? Or is there anything else that needs to be done? I've held off on upgrading to the newer Flowfuse version while this correction was in progress, but we would like to update to 2.7 asap and try enabling SAML Group Assertions again.

Thanks! Den

hardillb commented 1 month ago

@SynoUser-NL yes, we believe things should be all correct in the database now. Please give 2.7.0 a try.

SynoUser-NL commented 1 month ago

Excellent! Thank for all your help!

SynoUser-NL commented 1 month ago

I've enabled SSO Group Assertions in V2.7.1. Will need to test over weekend and on monday if all works well. Will report back here.