databrickslabs / databricks-sync

An experimental tool to synchronize source Databricks deployment with a target Databricks deployment.

Sync Import failing on User-Scim #110

Open tomwoollacott-8451 opened 2 years ago

tomwoollacott-8451 commented 2 years ago

We are running databricks-sync import against workspace export files generated by dbx-sync. Regardless of whether IDENTITY has been successfully imported or not (and regardless of whether identity is imported together with pools/policies or separately), importing either INSTANCE_POOL or CLUSTER_POLICY fails with an error about a reference to the undeclared 'databricks_user.databricks_scim_users', stemming from the permissions.tf.json file of each imported pool (line 6) / policy (line 7). Message: "A managed resource "databricks_user" "databricks_scim_users" has not been declared in the root module."
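For reference, this is the generic Terraform error raised when a file references a resource that is never declared anywhere in the same root module. As a rough illustration (hypothetical names and values, not the tool's exact output), a permissions.tf.json of roughly this shape would fail to plan unless a databricks_user resource named databricks_scim_users is declared alongside it:

    {
      "resource": {
        "databricks_permissions": {
          "example_pool_permissions": {
            "instance_pool_id": "${databricks_instance_pool.example_pool.id}",
            "access_control": [
              {
                "user_name": "${databricks_user.databricks_scim_users.user_name}",
                "permission_level": "CAN_MANAGE"
              }
            ]
          }
        }
      }
    }

So the pool/policy permissions files appear to expect the identity export, which is what would declare databricks_user.databricks_scim_users, to be part of the same root module.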

stikkireddy commented 2 years ago

Hey @twoollacott8451, can you provide a bit more detail:

  1. Are there users in the workspace other than service principals?
  2. Are there groups in the workspace other than the users and admins groups?
  3. Can you share your YAML file?
tomwoollacott-8451 commented 2 years ago
  1. Yes.
  2. There were a few existing dummy test groups included in the workspace.
  3. My YAML file:
    # Name the configuration set; this can be used to track multiple configuration runs and changes
    name: gbx_snapshots
    # Add this value if you want all the groups, users and service principals to be parameterized so you can map
    # them to another value using tf_vars
    parameterize_permissions: true
    objects:
      notebook:
        # Notebook path can be a string, a list or a YAML items collection (multiple subgroups starting with - )
        notebook_path: "/Users"
        # In workspaces you may have deleted users who leave behind a trail of created notebooks. Setting this to true
        # prevents their notebooks from being exported. This is optional and defaults to false. Set it to true if you
        # want the sync tool to skip them.
        exclude_deleted_users: true
        # Use custom_map_vars to set up a new location
        #    custom_map_vars:
        #      path: "/Users/%{DATA:variable}/%{GREEDYDATA}"
        # Certain patterns can be excluded from being exported via the exclude_path field. Make sure to use
        # glob syntax to specify all paths.
        #    exclude_path:
        #      - "/Users/**" # Ignore all paths within the users folder
        #      - "/tmp/**" # Ignore all files in the tmp directory
      global_init_script:
        # pattern will be implemented in the future - make sure you have "*" in here
        patterns:
          - "*"
      cluster_policy:
        # pattern will be implemented in the future - make sure you have "*" in here
        patterns:
          - "*"
      # dbfs_file:
      #   # DBFS path can be a string or a set of YAML items (multiple subgroups starting with - )
      #   dbfs_path:
      #     - "dbfs:/tests"
      #     - "dbfs:/databricks/init_scripts"
      #   # Certain patterns can be excluded from being exported via the exclude_path field. Make sure to use
      #   # glob syntax to specify all paths. Make sure all paths start with / and not dbfs:/.
      #   exclude_path:
      #     - "**.whl" # Ignore all wheel files
      #     - "**.jar" # Ignore all jar files
      #     - "/tmp/**" # Ignore all files in the tmp directory
      instance_pool:
        # pattern will be implemented in the future - make sure you have "*" in here
        patterns:
          - "*"
      secret:
        # pattern will be implemented in the future - make sure you have "*" in here
        patterns:
          - "*"
      cluster:
        # pattern will be implemented in the future - make sure you have "*" in here
        patterns:
          - "*"
        # Use this to pin the first twenty clusters. (This is a limit set by the Databricks platform.)
        # It can help prevent clusters from disappearing after 30 days in a terminated state.
        #    pin_first_20: false
        # Filter on cluster_spec fields using regular expressions to select a set of clusters
        #    by:
        #      cluster_name:
        #        - ".*fun.*"
      job:
        # pattern will be implemented in the future - make sure you have "*" in here
        patterns:
          - "*"
        ## The following options allow you to set static variables which need to be provided at runtime for
        ## clusters, instance pools and policies
        #convert_existing_cluster_to_var: true
        #convert_new_cluster_instance_pool_to_var: true
        #convert_new_cluster_cluster_policy_to_var: true
        # Filter on job settings fields using regular expressions to select a set of jobs
        #    by:
        #      settings.existing_cluster_id:
        #        - ".*fun.*"
      identity:
        # pattern will be implemented in the future - make sure you have "*" in here
        patterns:
          - "*"
        # Set this to true or false to set a default for the users' active field. Omitting this will use the source value
        # set_all_users_active: false
stikkireddy commented 2 years ago

@twoollacott8451 I am struggling to reproduce the issue on my side. I tried several different workspaces, including one with a single user and one additional group, and I am not able to reproduce it. Do you get an identity folder when you run an export?

exports/identity/databricks_scim_users.tf.json

Can you confirm if that file is there? Can you also run databricks-sync --version and provide your version number?
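For example, from the directory where you ran the export:

    ls exports/identity/databricks_scim_users.tf.json
    databricks-sync --version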