Flagsmith / flagsmith-charts

Helm Charts for Flagsmith. Feature flagging and remote config service. Host yourself or use our hosted version.
https://www.flagsmith.com/

Add periodic LDAP group / users sync #119

Open matthewelwell opened 1 year ago

matthewelwell commented 1 year ago

In the Flagsmith application we have written a Django management command that synchronises LDAP groups / users with Flagsmith groups / users. It can be run via `python manage.py sync_ldap_users_and_groups`. We'd like a process for enterprise users to run this on a specific schedule.

My immediate suggestion is to use k8s cron jobs, something like the following.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: ldap-sync
spec:
  schedule: "0 * * * *"  # hourly, on the hour
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: ldap-sync
            image: flagsmith:${{ api.version }}  # image tag templated from the chart values
            imagePullPolicy: IfNotPresent
            command:
            - python
            - manage.py
            - sync_ldap_users_and_groups
          restartPolicy: OnFailure

However, there are perhaps some additional complications here (I think we might need to add it to the run-docker.sh bash script, for example). There might also be a simpler / better way to achieve this.

gz#262

matthewelwell commented 1 year ago

Relevant PRs:

https://github.com/Flagsmith/flagsmith-docs/pull/276
https://github.com/Flagsmith/flagsmith-ee/pull/81

plumdog commented 1 year ago

@matthewelwell Yes, I think k8s cronjobs are probably the right way. But here are some thoughts on the tradeoffs:

matthewelwell commented 1 year ago

> Once the schedule is set in the cronjob (by passing something to the chart values), a user within the web application can't then adjust it. A way around this would be for the database to store the schedule, which a user can adjust, and then have a background worker look at the schedule and run the import when needed. But this means writing cron yourself, which sucks and you should probably avoid if you can, which I guess you can.

I don't think we'd need to adjust it really, and I guess, if needed, it can be adjusted by updating and redeploying the chart?
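For illustration, a minimal sketch of how the chart could expose this, assuming a hypothetical ldapSync values block (none of these value names exist in the chart yet):

# values.yaml (hypothetical)
ldapSync:
  enabled: true
  schedule: "0 * * * *"

# templates/ldap-sync-cronjob.yaml (excerpt)
{{- if .Values.ldapSync.enabled }}
apiVersion: batch/v1
kind: CronJob
metadata:
  name: {{ .Release.Name }}-ldap-sync
spec:
  schedule: {{ .Values.ldapSync.schedule | quote }}
{{- end }}

Changing the schedule would then just be a helm upgrade with a new value.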

> How bad would it be if two instances of this command were running at the same time (e.g. if I set the schedule as * (i.e. every minute) and each run takes more than 1 minute, then the 12:01 run will start before the 12:00 run has finished)? CronJobs can accommodate this (there is a concurrencyPolicy that can be set to Forbid).

This shouldn't be an issue.
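Even so, guarding against overlap is a one-line addition to the CronJob spec sketched above (concurrencyPolicy is a standard batch/v1 field):

spec:
  schedule: "0 * * * *"
  concurrencyPolicy: Forbid  # skip a new run while a previous one is still running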

Yes, that's a good point about the error handling / success reporting, etc. At the moment, any failure (network, configuration, etc.) results in a non-zero exit code and no status is reported. @gagantrivedi I wonder, therefore, if we should add a field to the organisation (or a separate OrganisationStatisticsReporting model?) that records the datetime of the last successful run of this job. Perhaps we could include a previous_error field too?
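A rough sketch of what that model could look like, purely to make the idea concrete (the model and field names here are hypothetical, not existing Flagsmith code):

from django.db import models

class OrganisationStatisticsReporting(models.Model):
    # Hypothetical model: one row per organisation, updated by the LDAP sync job.
    organisation = models.OneToOneField(
        "organisations.Organisation",  # assumes the existing Organisation model
        on_delete=models.CASCADE,
        related_name="ldap_sync_status",
    )
    last_successful_sync = models.DateTimeField(null=True)
    previous_error = models.TextField(blank=True, default="")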

I don't know that we'd necessarily be able to differentiate between network blips and credential failures, however. The code looks something like the following, and it seems we just get None if the connection fails, with no information about the reason.

# Excerpt from the management command (imports added here for context;
# ldap.connection yields None if the bind fails, whatever the reason).
from django.conf import settings
from django.core.management.base import CommandError

auth_kwargs = {
    "username": settings.LDAP_SYNC_USER_USERNAME,
    "password": settings.LDAP_SYNC_USER_PASSWORD,
}
with ldap.connection(**auth_kwargs) as connection:
    if connection is None:
        raise CommandError("Could not connect to LDAP server")
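For completeness, a hypothetical sketch of how a wrapper could populate the reporting model sketched above, recording both success and the (coarse) error message:

from django.core.management import call_command
from django.utils import timezone

def run_ldap_sync_and_record(organisation):
    # Hypothetical glue code tying the command to the
    # OrganisationStatisticsReporting sketch above.
    report, _ = OrganisationStatisticsReporting.objects.get_or_create(
        organisation=organisation
    )
    try:
        call_command("sync_ldap_users_and_groups")
    except Exception as exc:
        # Only a coarse error is available here (e.g. the CommandError above),
        # so network blips and credential failures look the same.
        report.previous_error = str(exc)
    else:
        report.last_successful_sync = timezone.now()
        report.previous_error = ""
    report.save()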