data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
https://data-dot-all.github.io/dataall/
Apache License 2.0

Migrating data.all assets between Teams - is there a documented process / guide? #1496

Open gavclark opened 3 months ago

gavclark commented 3 months ago

I'd like to create a new team in data.all and move a selection of assets from an existing team to the new team.

For example, we have a team called "Sales" and I want to create a new team called "Sales Retail", as it is more reflective of the business organisation. I then want to move some assets from the "Sales" team to the "Sales Retail" team via a simple process / method.

I am not sure if there is a documented process to do this or a tool we could use. My thoughts are that we could update the metadata database, but there'd still be a lot of remediation needed for other AWS services.

For example, what do we need to do with:

- Glue jobs / existing CDK pipelines and the IAM role assigned to them
- Lake Formation admins / access
- Athena workgroups
- S3 buckets / datasets
- Glue databases / tables

I realise there may be more to consider, but I'd like to have a robust process for this, as we'd like to reorganise our existing teams.

dlpzx commented 2 months ago

Hi @gavclark, thanks for opening an issue! If I understood correctly, this is a usability issue: you started using data.all with a team that has a generic name, and as your usage grows you want to use more representative team names. As you anticipated, it has a couple of complications, because the team name defines the team IAM role and Athena workgroup. As soon as the environment CloudFormation stack is updated (which happens daily), the previous IAM role and workgroup will be deleted and new ones will be created.

Initial complexity assessment

For the team IAM role:

As for the Athena workgroup:

Conclusion: it is a delicate operation with possible unseen issues. We strongly recommend some POC testing in a dev/test environment before carrying this out in prod.

Other options

Automated script using SDK/CLI

If I recognize your user correctly, I believe your company is using an older version of data.all, which means this solution might not be available to you. Nevertheless, it is worth mentioning.

We have implemented an SDK for data.all that allows users to call data.all APIs programmatically using Python. You could use the SDK to automate some of the operations needed: for all shares, revokeShare and deleteShare; then updateEnvironment and updateDataset; then, for all the old shares, createShareRequest, addItems, submitShare...
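To make the ordering of those operations concrete, here is a minimal dry-run sketch. It does not call the data.all SDK at all: it only builds the ordered list of steps so the sequence can be reviewed before anything is executed. The operation names are the ones mentioned above; the `Share` class, the `plan_team_migration` function, and every argument name are hypothetical and would need to be mapped onto whatever the SDK version in use actually exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Share:
    """Hypothetical, simplified view of an existing share."""
    share_id: str
    dataset_id: str
    requester_team: str
    items: tuple[str, ...]


def plan_team_migration(environment_id, dataset_ids, shares, new_team):
    """Return ordered (operation, arguments) steps. Nothing is executed.

    All argument keys are illustrative assumptions, not the SDK's
    real parameters.
    """
    moving = set(dataset_ids)
    affected = [s for s in shares if s.dataset_id in moving]
    steps = []

    # 1. Revoke and delete every share on a dataset that is being moved.
    for s in affected:
        steps.append(("revokeShare", {"share": s.share_id}))
        steps.append(("deleteShare", {"share": s.share_id}))

    # 2. Update the environment, then hand each dataset to the new team.
    steps.append(("updateEnvironment", {"environment": environment_id}))
    for dataset_id in dataset_ids:
        steps.append(("updateDataset",
                      {"dataset": dataset_id, "owner_team": new_team}))

    # 3. Recreate each old share against the re-owned dataset.
    for s in affected:
        target = {"dataset": s.dataset_id, "requester_team": s.requester_team}
        steps.append(("createShareRequest", dict(target)))
        steps.append(("addItems", {**target, "items": list(s.items)}))
        steps.append(("submitShare", dict(target)))

    return steps
```

Printing the returned list in a dev/test environment first gives a reviewable checklist; only once the order looks right would each step be wired to a real SDK call.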

Enhancement to data.all code

Instead of replacing the IAM role and the workgroup, we could use a team label to carry the Sales Retail name. The IAM role would still be 'arn:...."/role:sales', but in data.all the assets would appear as owned by the Sales Retail group. That might be easier than actually migrating the AWS resources.
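As a rough illustration of the label idea, the sketch below separates the name that AWS resource names derive from and a display-only label. This is not data.all code: the `Team` class, its fields, and the naming rule in `resource_suffix` are all hypothetical, chosen only to show that changing the label leaves the resource name untouched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Team:
    # Fixed at creation; in this sketch, resource names derive from it.
    name: str
    # Hypothetical display-only label; changing it touches no AWS resource.
    label: Optional[str] = None

    @property
    def display_name(self) -> str:
        """Name shown in the UI: the label if set, else the original name."""
        return self.label or self.name

    @property
    def resource_suffix(self) -> str:
        """Illustrative naming rule only, not data.all's real scheme."""
        return self.name.lower().replace(" ", "-")


team = Team(name="Sales")
team.label = "Sales Retail"
print(team.display_name)     # Sales Retail
print(team.resource_suffix)  # sales
```

The design choice is that nothing keyed on `name` (IAM role, workgroup) needs recreating, so the daily stack update would have nothing to replace.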

gavclark commented 2 months ago

Hi @dlpzx, thanks for the reply!

I may have forgotten to mention: we've only recently upgraded to 1.6.2, so we're very far behind current releases :(

To clarify, we'd keep the "Sales" team; we wouldn't be deleting it or the IAM role associated with it. The idea is that we'd create a new team, "Sales Retail", and migrate some of the assets from "Sales" to "Sales Retail".

I think that if we're not deleting the original team, that may simplify things? Thanks a lot for your thoughts on the possible complexities, they will help a lot.

Agreed, we definitely need to do a lot of testing in a non-production environment first - I just wanted to check I hadn't missed anything obvious!