Please support mixing with non-LDAP groups and users #1

Open Natureshadow opened 5 years ago

On EduGit.org, users and groups can come from different sources. One is our LDAP directory; others are omniauth identities and manually created namespaces (groups). It seems that this tool does not expect any users or groups to exist that do not come from the LDAP directory. I do not know whether it just complains loudly but would otherwise work and simply ignore the other users and groups, or whether it would cause other errors if I ran it without the dry run option.

It would be good if this mode of operation could be supported correctly. It works out in general, as GitLab handles duplicate usernames from different sources quite nicely (it adds a number to the namespace if a newly authenticating user would cause the creation of an already existing namespace). It might help, to start with, if this tool had an option to keep all the synced LDAP groups as subgroups of a defined parent group.

Hi @Natureshadow,

Thank you for your interest in this tool.

As you've noticed already this tool is largely designed around LDAP being the primary source of users and groups. I built it this way as it's a fair expectation that if Gitlab is authenticating against LDAP at all it's going to be for reasons along these lines. (It would seem pretty rare to have LDAP authentication used at all in an application broader than for internal use.)

Some fairly large modifications would need to be made to have this tool work in a secondary manner as per your requirements. If you tried to use this tool in the way you like you'd encounter at least the following unexpected destructive behaviour:

Users not in LDAP would become blocked (but not deleted, this tool never deletes users)
Groups containing no users, subgroups, or projects would get deleted (unless you used the option to preserve empty groups)
Group members who are not both an LDAP user and a member of the matching LDAP group would all lose their membership of those groups

Therefore definitely do not use this tool on your Gitlab instances. You can see exactly what the tool would do outside of dry run mode by combining dry run mode (-d) with very verbose mode (-vv).

Getting this tool to work in the less destructive way you suggested would pose some challenges.

The main one is that this tool matches users solely by the LDAP user object's "uid" attribute (customisable) against a Gitlab user's username, not the Gitlab user's namespace. This is because there is not enough information available for this tool to know that this would be a mistake. LDAP user objects don't typically have a namespace in the "slug" style Gitlab has, they typically work with the UID attribute, CN attribute, or full object DN. (In your case you could easily find that existing users get linked to different LDAP users unintentionally.)
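To illustrate the matching problem described above, here is a minimal Python sketch (the tool itself is PHP, and this is not its code). The function name `match_by_username` and the sample data are hypothetical. The only thing it assumes is what the paragraph states: the LDAP "uid" attribute is compared against the Gitlab username and nothing else.

```python
def match_by_username(ldap_users, gitlab_users, uid_attribute="uid"):
    """Pair each LDAP entry with the Gitlab user whose username equals its uid.

    Hypothetical helper: nothing here checks where the Gitlab account came from.
    """
    by_username = {user["username"]: user for user in gitlab_users}
    matches = {}
    for entry in ldap_users:
        uid = entry[uid_attribute]
        if uid in by_username:
            matches[uid] = by_username[uid]
    return matches


ldap_users = [{"uid": "jdoe", "cn": "Jane Doe"}]
gitlab_users = [
    # Hypothetical account created through an omniauth provider, not LDAP.
    {"id": 7, "username": "jdoe",
     "identities": [{"provider": "github", "extern_uid": "1234"}]},
]

matches = match_by_username(ldap_users, gitlab_users)
print(matches["jdoe"]["id"])  # 7: the omniauth account is treated as the LDAP user
```

The unrelated account is linked purely because the names collide, which is the unintended linking mentioned above.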

This is the same reason that if an LDAP user's "uid" is changed a new user would be made on Gitlab rather than the intended existing user renamed accordingly. This couldn't be worked around unless there was an attribute added to the LDAP schema to keep track of the Gitlab user object ID so the tool would have something permanent to match with. -- Obviously when you rename the LDAP user their UID and DN both change, so the linked external ID on the Gitlab user becomes useless. Namespaces can also be changed via Gitlab admin, making them unsuitable to match with as well.

I appreciate your use case, but honestly it's quite beyond the scope I have enough free time for. However, if someone else wanted to fork the project to implement it I'd happily accept a well built merge, provided the original purpose and mode of operation was preserved at the same (or better) quality it is now. (Secondary mode would have to be opt-in rather than the default behaviour.)

Hi,

I'm taking a stab at this for my company's GitLab server, although I can't promise it'll solve all of @Natureshadow's use case. So far I'm aiming at the following approaches:

For user accounts, filter out and ignore GitLab users that don't have an identity with a provider that matches the ldapServerName. This should correctly manage creation and blocking of GitLab users that are linked to the LDAP server, but safely ignore without changes any unrelated GitLab users.
Synchronize LDAP groups as subgroups of a root group, rather than as top-level groups. Then we can use group-to-group membership to delegate permissions to whatever manually managed GitLab group hierarchy our users deem convenient to organize their projects. Group synchronization is still strict, i.e. non-LDAP users will be excluded from the LDAP group set.
Add some filtering and name mapping logic to work around some idiosyncrasies of my particular company's LDAP solution. (In particular, we have a lot of LDAP "users" that aren't people, so we'd like to synchronize only users that are members of a particular group -- however our LDAP server doesn't support the memberOf overlay, so I'm stuck filtering the user list after retrieval.)
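The first approach above could look roughly like the following Python sketch (the tool is PHP; this only illustrates the filtering idea and is not code from the branch). It assumes each GitLab user record carries an `identities` list whose entries have a `provider` field, as the GitLab users API returns to administrators. The function name `split_by_provider` and the provider value `ldapmain` are hypothetical.

```python
def split_by_provider(gitlab_users, ldap_server_name):
    """Separate users linked to the given LDAP provider from all other users."""
    managed, ignored = [], []
    for user in gitlab_users:
        providers = {identity["provider"] for identity in user.get("identities", [])}
        if ldap_server_name in providers:
            managed.append(user)   # candidates for create/block decisions
        else:
            ignored.append(user)   # left untouched by the sync
    return managed, ignored


gitlab_users = [
    {"username": "alice", "identities": [{"provider": "ldapmain", "extern_uid": "uid=alice,dc=example"}]},
    {"username": "bob", "identities": [{"provider": "github", "extern_uid": "42"}]},
    {"username": "carol", "identities": []},  # manually created account
]

managed, ignored = split_by_provider(gitlab_users, "ldapmain")
print([u["username"] for u in managed])  # ['alice']
print([u["username"] for u in ignored])  # ['bob', 'carol']
```

Only the `managed` list would then be compared against the LDAP directory, so accounts from other sources are never blocked.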

WIP is here: https://github.com/willmmiles/gitlab-ce-ldap-sync/tree/partial-sync

Any suggestions or feedback would be greatly appreciated - if you think any of this would be of general utility, I'm happy to submit a pull request.

Suggestion: Expanding the tool to make userNamesToIgnore and groupNamesToIgnore accept regular expressions could probably be done fairly easily. If your non-LDAP and your LDAP users and groups can each be uniquely separated using a regex, the tool might work in such side-by-side configurations... @Adambean, what do you think?

Regex shouldn't be hard to implement. I would have suggested that any strings in the array wrapped with / characters could be checked with preg_match() instead of a simple comparison, but / is a valid character for DNs and UIDs. (/ specifically doesn't even require escaping by \.)

I'd therefore suggest that if we're to implement Regex ignores they should be spun off to userNamesToIgnoreMatching and groupNamesToIgnoreMatching to contain an array of Regex pattern strings. -- These strings must be 100% compatible with PHP's preg_match() string $pattern parameter, so I'd expect the tool user to include / either side with the optional case-insensitive indicator "i" at the end.
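As a rough illustration of how the two kinds of ignore list could sit side by side, here is a Python sketch (the real implementation would be PHP using preg_match(); this is not it). The helper names are hypothetical, only the "i" modifier is handled, and PHP and Python regex syntax differ in places, so treat the patterns as examples only.

```python
import re


def compile_php_style(pattern):
    """Turn a '/body/flags' pattern string into a compiled Python regex.

    Hypothetical helper: supports only '/' delimiters and the 'i' flag.
    """
    end = pattern.rfind("/")
    if not pattern.startswith("/") or end == 0:
        raise ValueError(f"pattern must be wrapped in '/': {pattern!r}")
    body, flags = pattern[1:end], pattern[end + 1:]
    if flags not in ("", "i"):
        raise ValueError(f"unsupported flags: {flags!r}")
    return re.compile(body, re.IGNORECASE if flags == "i" else 0)


def is_ignored(name, exact_names, patterns):
    """True if name is listed verbatim or matches any of the patterns."""
    if name in exact_names:
        return True
    # search() mirrors preg_match(), which is not anchored unless the pattern says so.
    return any(compile_php_style(p).search(name) for p in patterns)


user_names_to_ignore = ["root"]                          # existing plain list
user_names_to_ignore_matching = ["/^svc-/i", "/-bot$/"]  # proposed regex list

for name in ["root", "SVC-backup", "deploy-bot", "jdoe"]:
    print(name, is_ignored(name, user_names_to_ignore, user_names_to_ignore_matching))
```

Keeping the regex entries in their own list means a plain name containing "/" is never mistaken for a pattern.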

I would also like that feature to be implemented. Just as there is a "groupNamesToIgnore" parameter, there should be an option like "syncOnlyTheseGroupNames". With this option the tool would start from the named groups inside GitLab and only look up their members in the LDAP server. No other group in GitLab would be checked.
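A minimal Python sketch of what I mean (the tool is PHP, so this is only to show the logic). "syncOnlyTheseGroupNames" is the option name proposed above and does not exist yet; the function name and sample data are hypothetical.

```python
def plan_group_sync(gitlab_group_names, sync_only_these, ldap_members_by_group):
    """Build a membership plan for allow-listed GitLab groups only."""
    plan = {}
    for name in gitlab_group_names:
        if name not in sync_only_these:
            continue  # any group not on the allow-list is never touched
        # Start from the GitLab group, then look up its members in LDAP.
        plan[name] = sorted(ldap_members_by_group.get(name, []))
    return plan


gitlab_group_names = ["developers", "manual-team", "ops"]
sync_only_these = ["developers", "ops"]
ldap_members_by_group = {"developers": ["bob", "alice"], "hr": ["carol"]}

print(plan_group_sync(gitlab_group_names, sync_only_these, ldap_members_by_group))
# {'developers': ['alice', 'bob'], 'ops': []}
```

Here "manual-team" is skipped entirely, and the LDAP-only group "hr" is never created in GitLab.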

Adambean / gitlab-ce-ldap-sync
