data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
https://data-dot-all.github.io/dataall/
Apache License 2.0

Failed to import a bucket with a long name #1710

Open zsaltys opened 1 week ago

zsaltys commented 1 week ago

Describe the bug

My bucket name is 50 characters long. When this bucket is imported into data.all it tries to create a dataset admin role with format: dataall-TRUNCATED_BUCKET_NAME-30ydkjf9

The problem is that the truncation does not remove enough characters. Data.all truncated my bucket name to 49 characters. Adding the parts together:

dataall-(8)TRUNCATED_BUCKET_NAME(49)-30ydkjf9(9) = 8 + 49 + 9 = 66

However, AWS IAM allows role names of at most 64 characters. Therefore I received an error:

[Screenshot: 2024-11-21 at 11:56:18]
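The arithmetic above can be checked with a short sketch. The 49-character string is only a placeholder for the truncated bucket name, which the report does not give; the prefix and the trailing identifier are taken from the role name format quoted above.

```python
# Placeholder values mirroring the report; the real bucket name is not shown.
IAM_ROLE_NAME_MAX = 64  # AWS IAM limit on role name length

prefix = "dataall"
truncated_bucket_name = "b" * 49  # stand-in for the 49-character truncated name
dataset_uri = "30ydkjf9"

role_name = f"{prefix}-{truncated_bucket_name}-{dataset_uri}"

print(len(role_name))                      # 66
print(len(role_name) > IAM_ROLE_NAME_MAX)  # True
```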

How to Reproduce

Import a dataset with a bucket name that is 50 characters or longer.

Expected behavior

No response

Your project

No response

Screenshots

No response

OS

N/A

Python version

N/A

AWS data.all version

2.6

Additional context

No response

TejasRGitHub commented 6 days ago

Able to reproduce the same issue and found the bug.

While calculating the name, the following is used: f"{slugify(self.resource_prefix + '-' + self.target_label[:(max_length - len(self.resource_prefix + self.target_uri))] + suffix, regex_pattern=fr'{regex}', separator=separator, lowercase=True)}"

The slice bound, max_length - len(self.resource_prefix + self.target_uri), determines how many characters of the bucket name are kept so that the suffix (built from the targetUri, which is the datasetUri in this case) still fits. The calculation misses 2 characters: the two '-' separators used in the name.

To correct this, the following should be used: f"{slugify(self.resource_prefix + '-' + self.target_label[:(max_length - len(self.resource_prefix + self.suffix) - 1)] + suffix, regex_pattern=fr'{regex}', separator=separator, lowercase=True)}"

One '-' is used in between self.resource_prefix + '-' + self.target_label... and another '-' is used in the suffix which is formed of -{targetUri}.
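A minimal sketch of the two slice bounds side by side may make the off-by-two easier to see. The function names and the sample label are hypothetical, and the slugify step is left out on the assumption that the label already contains only allowed characters (slugify could otherwise change the length); this is not the actual NamingConventionService code.

```python
MAX_LENGTH = 64  # IAM role name limit


def build_name_current(prefix: str, label: str, target_uri: str,
                       max_length: int = MAX_LENGTH) -> str:
    """Mirror the slice bound quoted in the comment (slugify omitted)."""
    suffix = f"-{target_uri}"
    # The bound ignores both '-' separators, so the result can be 2 too long.
    keep = max_length - len(prefix + target_uri)
    return prefix + "-" + label[:keep] + suffix


def build_name_proposed(prefix: str, label: str, target_uri: str,
                        max_length: int = MAX_LENGTH) -> str:
    """Mirror the proposed bound: subtract the suffix (with its '-') and 1 more."""
    suffix = f"-{target_uri}"
    keep = max_length - len(prefix + suffix) - 1
    return prefix + "-" + label[:keep] + suffix


label = "b" * 50  # stand-in for a 50-character bucket name

print(len(build_name_current("dataall", label, "30ydkjf9")))   # 66
print(len(build_name_proposed("dataall", label, "30ydkjf9")))  # 64
```

With these inputs the current bound keeps 49 characters of the label and the proposed bound keeps 47, which brings the full name to exactly 64 characters.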

Things to consider before making this change

Changing this logic will likely affect all dataset IAM role names, and not only those: it affects every place where NamingConventionService is used, e.g. Environments and other places where Stacks are used. The same logic is also used for generating policies.