aws-samples / aws-data-mesh-utils

Apache License 2.0

Unable to create Data Product #2

Closed rijuseth1312 closed 2 years ago

rijuseth1312 commented 2 years ago

I have a database and a table in the producer account. I am trying to run the following script, as described in your blog:

    import logging
    from data_mesh_util import DataMeshProducer as dmp

    data_mesh_account = '~~~~'
    aws_region = 'us-east-1'
    producer_credentials = {
        "AccountId": "#############",
        "AccessKeyId": "###################",
        "SecretAccessKey": "#########################"
    }
    data_mesh_producer = dmp.DataMeshProducer(
        data_mesh_account_id=data_mesh_account,
        log_level=logging.DEBUG,
        region_name=aws_region,
        use_credentials=producer_credentials
    )

    database_name = 'redshift'
    table_name = 'cars'
    domain = None
    data_product_name = None
    cron_expr = None
    crawler_role = None
    create_public_metadata = True

    data_mesh_producer.create_data_products(
        source_database_name=database_name,
        table_name_regex=table_name,
        domain=domain,
        data_product_name=data_product_name,
        create_public_metadata=True,
        sync_mesh_catalog_schedule=cron_expr,
        sync_mesh_crawler_role_arn=crawler_role,
        expose_data_mesh_db_name=None,
        expose_table_references_with_suffix=None
    )

After hitting the error, I tried running it in both my producer account and the data mesh admin account, but neither works. It throws the following error:

    Loaded 3 tables matching None from Glue
    Verified Database redshift-175908995626
    Validated Data Mesh Database redshift-175908995626
    175908995626 Database redshift-175908995626 Permissions:['CREATE_TABLE', 'DESCRIBE']
    Granted access on Database redshift-175908995626 to Producer
    Verified Database redshift-175908995626
    Validated Producer Account Database redshift-175908995626
    Existing Table Definition
    {'Name': 'cars', 'Owner': '175908995626', 'LastAccessTime': datetime.datetime(2022, 1, 7, 17, 43, 9, tzinfo=tzlocal()), 'Retention': 0, 'StorageDescriptor': {'Columns': [{'Name': 'id', 'Type': 'bigint'}, {'Name': 'car', 'Type': 'string'}], 'Location': 's3://aws-analytics-course/redshift/data/csv/cars/', 'InputFormat': 'org.apache.hadoop.mapred.TextInputFormat', 'OutputFormat': 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat', 'Compressed': False, 'NumberOfBuckets': -1, 'SerdeInfo': {'SerializationLibrary': 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe', 'Parameters': {'field.delim': ','}}, 'BucketColumns': [], 'SortColumns': [], 'Parameters': {'CrawlerSchemaDeserializerVersion': '1.0', 'CrawlerSchemaSerializerVersion': '1.0', 'UPDATED_BY_CRAWLER': 'redshift', 'areColumnsQuoted': 'false', 'averageRecordSize': '7', 'classification': 'csv', 'columnsOrdered': 'true', 'compressionType': 'none', 'delimiter': ',', 'objectCount': '1', 'recordCount': '16', 'sizeKey': '112', 'skip.header.line.count': '1', 'typeOfData': 'file'}, 'StoredAsSubDirectories': False}, 'PartitionKeys': [], 'TableType': 'EXTERNAL_TABLE', 'Parameters': {'CrawlerSchemaDeserializerVersion': '1.0', 'CrawlerSchemaSerializerVersion': '1.0', 'UPDATED_BY_CRAWLER': 'redshift', 'areColumnsQuoted': 'false', 'averageRecordSize': '7', 'classification': 'csv', 'columnsOrdered': 'true', 'compressionType': 'none', 'delimiter': ',', 'objectCount': '1', 'recordCount': '16', 'sizeKey': '112', 'skip.header.line.count': '1', 'typeOfData': 'file'}}
    Created new Glue Table cars
    175908995626 Table cars Column Permissions:['INSERT', 'SELECT', 'ALTER', 'DELETE', 'DESCRIBE'], ['INSERT', 'SELECT', 'ALTER', 'DELETE', 'DESCRIBE'] WITH GRANT OPTION
    175908995626 Table cars Permissions:['ALTER', 'DESCRIBE', 'INSERT', 'DELETE'], ['ALTER', 'DESCRIBE', 'INSERT', 'DELETE'] WITH GRANT OPTION
    Traceback (most recent call last):
      File "create-data-product", line 35, in <module>
        expose_table_references_with_suffix=None
      File "/home/cloudshell-user/.local/lib/python3.7/site-packages/data_mesh_util/DataMeshProducer.py", line 314, in create_data_products
        use_original_table_name=use_original_table_name
      File "/home/cloudshell-user/.local/lib/python3.7/site-packages/data_mesh_util/DataMeshProducer.py", line 145, in _create_mesh_table
        grantable_permissions=perms
      File "/home/cloudshell-user/.local/lib/python3.7/site-packages/data_mesh_util/lib/ApiAutomator.py", line 892, in lf_grant_permissions
        grantable_permissions=grantable_permissions
      File "/home/cloudshell-user/.local/lib/python3.7/site-packages/data_mesh_util/lib/ApiAutomator.py", line 872, in lf_batch_grant_permissions
        Entries=entries
      File "/home/cloudshell-user/.local/lib/python3.7/site-packages/botocore/client.py", line 386, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/home/cloudshell-user/.local/lib/python3.7/site-packages/botocore/client.py", line 678, in _make_api_call
        api_params, operation_model, context=request_context)
      File "/home/cloudshell-user/.local/lib/python3.7/site-packages/botocore/client.py", line 726, in _convert_to_request_dict
        api_params, operation_model)
      File "/home/cloudshell-user/.local/lib/python3.7/site-packages/botocore/validate.py", line 319, in serialize_to_request
        raise ParamValidationError(report=report.generate_report())
    botocore.exceptions.ParamValidationError: Parameter validation failed:
    Missing required parameter in Entries[0]: "Id"
    Missing required parameter in Entries[1]: "Id"

I do not understand what this error means. I have tried everything I can think of, but nothing works. Could you please let me know what the error means, or whether I am doing something wrong?

My contact details: Email: rijulseth1312@gmail.com, Mobile: +91-9650819894, Name: Rijul Seth

IanMeyers commented 2 years ago

Looking into this for you Rijul.

IanMeyers commented 2 years ago

Apologies - there was a bug in the code path that batch-creates permissions. If you update to version 1.0.1 from PyPI, the issue should be resolved.
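For anyone else reading the traceback above: the `ParamValidationError` is raised client-side by botocore before any request is sent, because every entry passed to Lake Formation's `BatchGrantPermissions` must carry an `Id` string. Below is a minimal sketch of building well-formed entries. The helper name `build_grant_entries`, the placeholder account IDs, and the use of `uuid4` for the identifiers are assumptions for illustration, not the library's actual fix.

```python
import uuid


def build_grant_entries(principals, resource, permissions, grantable_permissions):
    """Build one batch-grant entry per principal, each with a unique "Id".

    Hypothetical helper: it only assembles the request payload and makes no AWS call.
    """
    entries = []
    for principal in principals:
        entries.append({
            # "Id" is the field the validation error reported as missing
            "Id": str(uuid.uuid4()),
            "Principal": {"DataLakePrincipalIdentifier": principal},
            "Resource": resource,
            "Permissions": list(permissions),
            "PermissionsWithGrantOption": list(grantable_permissions),
        })
    return entries


entries = build_grant_entries(
    principals=["111111111111", "222222222222"],  # placeholder account IDs
    resource={"Table": {"DatabaseName": "redshift", "Name": "cars"}},
    permissions=["SELECT", "DESCRIBE"],
    grantable_permissions=["SELECT", "DESCRIBE"],
)
print([sorted(entry.keys()) for entry in entries])
```

A list built this way could then be passed as `Entries=entries` to a boto3 Lake Formation client's `batch_grant_permissions` call.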

rijuseth1312 commented 2 years ago

Hi Ian

Thank you so much for your help. I will update from PyPI, which should solve it.

Thank you again,
Rijul Seth


rijuseth1312 commented 2 years ago

Hi Ian

I think there is one more bug, which appears when we accept a request from the producer side. Once we create the data product, it shows up in the Lake Formation databases. From the consumer side I then created an access request and received a subscription ID. The code is as follows:

    import logging
    from data_mesh_util import DataMeshConsumer as dmc

    data_mesh_account = 'data-mesh-admin account id'
    aws_region = 'us-east-1'
    consumer_credentials = {
        "AccountId": "~~~",
        "AccessKeyId": "#######################",
        "SecretAccessKey": "##############################"
    }
    data_mesh_consumer = dmc.DataMeshConsumer(
        data_mesh_account_id=data_mesh_account,
        log_level=logging.DEBUG,
        region_name=aws_region,
        use_credentials=consumer_credentials
    )

    owner_account_id = 'producer-account-id'
    database_name = 'redshift'
    tables = 'cars'
    request_permissions = ['SELECT', 'DESCRIBE']

    subscription = data_mesh_consumer.request_access_to_product(
        owner_account_id=owner_account_id,
        database_name=database_name,
        tables=tables,
        request_permissions=request_permissions
    )
    print(subscription.get('SubscriptionId'))

This returned a subscription ID.

On the producer side, however, approving the request throws an error every time. The code is below:

    import logging
    from data_mesh_util import DataMeshProducer as dmp

    data_mesh_account = '#########'
    aws_region = 'us-east-1'
    producer_credentials = {
        "AccountId": "#############",
        "AccessKeyId": "####################",
        "SecretAccessKey": "#########################3"
    }
    data_mesh_producer = dmp.DataMeshProducer(
        data_mesh_account_id=data_mesh_account,
        log_level=logging.DEBUG,
        region_name=aws_region,
        use_credentials=producer_credentials
    )

    # get the pending access requests
    pending_requests = data_mesh_producer.list_pending_access_requests()
    print(pending_requests)

    # pick one to approve
    choose_subscription = pending_requests.get('Subscriptions')[0]

    # The subscription ID that the Consumer created and returned from list_pending_access_requests()
    subscription_id = choose_subscription.get('SubscriptionId')
    print("subscription id is :-", subscription_id)

    # Set the permissions to grant to the Consumer - in this case whatever they asked for
    grant_permissions = choose_subscription.get('RequestedGrants')

    # List of permissions the consumer can pass on. Usually only DESCRIBE or SELECT
    grantable_permissions = ['DESCRIBE', 'SELECT']

    # String value to associate with the approval
    approval_notes = 'The request has been accepted!'

    # approve access requested
    approval = data_mesh_producer.approve_access_request(
        request_id=subscription_id,
        grant_permissions=grant_permissions,
        grantable_permissions=grantable_permissions,
        decision_notes=approval_notes
    )

When I run this code, I get the following logs and error:

    [cloudshell-user@ip-10-1-82-5 ~]$ python3 grant-access
    Created new STS Session for Data Mesh Admin Producer
    {'AccessKeyId': 'ASIA3A5O2TMCIFJQUY6F', 'SecretAccessKey': 'LICSaS8N6ghDbIWBff5UEC2EtwrbBa7OQc3DWeBD', 'SessionToken': 'FwoGZXIvYXdzEJ3//////////wEaDN3Zb5LWX2LvFHIPsyLkAZY2+sWu7lhla6E9OtHqXQj06F0T4rZIA/85gqyqD9mf0GgU9/LNGK9KSvgCN+RcEluKXCv8Gf8/GTnFHelMn9ZJZ3Z+BXof7BDMoEJKZLsoc14GeRHZ0UBmsGNSVEOVqzo0w8DqnTxE80vQG3siAV9DH0cMTwy7EhCG2WCY52mHNLXzd8SB8KnNe9z/iq51bzHv+9uPUh5FR9gAnmQ3DS0Y66dZI4S5Bb9bdaOkVp6a6OkfgU4IC7oD4cZQFyxSQybDX4OEC4UaaGMum/DT9fpRL3rcQ97D75OX4na0yyLqiAsDbii7g5CPBjItHbDrRgHT00vAjeis2olu68dgEA46x2glR98iZqbYQM3jiqw0YxHKHMfCPR5u', 'Expiration': datetime.datetime(2022, 1, 16, 12, 30, 3, tzinfo=tzlocal())}
    {'Subscriptions': [
    {'SubscriberPrincipal': '408703654260', 'CreationDate': '2022-01-16 10:47:55', 'DatabaseName': 'redshift', 'RequestedGrants': ['SELECT', 'DESCRIBE'], 'TableName': 'cars', 'CreatedBy': 'arn:aws:sts::757891504900:assumed-role/DataMeshAdminConsumer-408703654260/AROAV6KFC3F2BGPRZWJJ7AIDAV6KFC3F2PCYROSY56-408703654260-2022-01-', 'SubscriptionId': 'zyZpJVhCiWAEjZ3nedB29D'},
    {'SubscriberPrincipal': '408703654260', 'CreationDate': '2022-01-16 08:30:52', 'DatabaseName': 'redshift-175908995626', 'RequestedGrants': ['SELECT', 'DESCRIBE'], 'TableName': '%c', 'CreatedBy': 'arn:aws:sts::757891504900:assumed-role/DataMeshAdminConsumer-408703654260/AROAV6KFC3F2BGPRZWJJ7AIDAV6KFC3F2PCYROSY56-408703654260-2022-01-', 'SubscriptionId': 'aikZH86JBqCQEsVJ6wLGHh'},
    {'SubscriberPrincipal': '408703654260', 'CreationDate': '2022-01-12 15:57:18', 'DatabaseName': 'redshift', 'RequestedGrants': ['SELECT', 'DESCRIBE'], 'TableName': 'country', 'CreatedBy': 'arn:aws:sts::757891504900:assumed-role/DataMeshAdminConsumer-408703654260/AROAV6KFC3F2BGPRZWJJ7AIDAV6KFC3F2PCYROSY56-408703654260-2022-01-', 'SubscriptionId': 'FRNeoDYVe8r3viJ2PFb7vc'},
    {'SubscriberPrincipal': '408703654260', 'CreationDate': '2022-01-16 10:31:32', 'DatabaseName': 'redshift-175908995626', 'RequestedGrants': ['SELECT', 'DESCRIBE'], 'TableName': 'cars_link', 'CreatedBy': 'arn:aws:sts::757891504900:assumed-role/DataMeshAdminConsumer-408703654260/AROAV6KFC3F2BGPRZWJJ7AIDAV6KFC3F2PCYROSY56-408703654260-2022-01-', 'SubscriptionId': 'Zu4qQxs7XGSa3iipoTbdF4'},
    {'SubscriberPrincipal': '408703654260', 'CreationDate': '2022-01-16 07:03:26', 'DatabaseName': 'redshift-175908995626', 'RequestedGrants': ['SELECT', 'DESCRIBE'], 'TableName': 'cars', 'CreatedBy': 'arn:aws:sts::757891504900:assumed-role/DataMeshAdminConsumer-408703654260/AROAV6KFC3F2BGPRZWJJ7AIDAV6KFC3F2PCYROSY56-408703654260-2022-01-', 'SubscriptionId': '6m7zRd9nnezYBXabbySLne'}
    ]}
    subscription id is :- zyZpJVhCiWAEjZ3nedB29D
    Traceback (most recent call last):
      File "grant-access", line 42, in <module>
        decision_notes=approval_notes
      File "/home/cloudshell-user/.local/lib/python3.7/site-packages/data_mesh_util/DataMeshProducer.py", line 426, in approve_access_request
        load_lf_tags=False
      File "/home/cloudshell-user/.local/lib/python3.7/site-packages/data_mesh_util/lib/ApiAutomator.py", line 477, in load_glue_tables
        _no_data()
      File "/home/cloudshell-user/.local/lib/python3.7/site-packages/data_mesh_util/lib/ApiAutomator.py", line 457, in _no_data
        source_db_name))
    Exception: Unable to find any Tables matching c in Database redshift

I think approving the access request is what triggers the problem in the library. Please let me know if I am doing anything wrong; I have been trying to resolve this, but the request is not being granted.

IanMeyers commented 2 years ago

Yep - the issue is that the signature for the Consumer to request access takes a list of tables. However, you are passing a string, and this is unchecked in the library. Could you please install version 1.0.2 from TestPyPI and confirm that it works when you continue to pass a string argument?
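The `c` in the error message fits this explanation: iterating over the string `'cars'` yields single characters, so the first "table" looked up would be `c`. Below is a minimal sketch of the kind of normalisation that makes both call styles behave the same. `normalize_tables` is a hypothetical helper for illustration, not part of `data_mesh_util`, and it is not necessarily how version 1.0.2 handles the argument.

```python
def normalize_tables(tables):
    """Accept a single table name or an iterable of names; always return a list.

    Hypothetical helper, shown only to illustrate the string-vs-list pitfall.
    """
    if isinstance(tables, str):
        # A bare string is treated as one table name, not a sequence of characters
        return [tables]
    return list(tables)


# Iterating a string directly walks its characters, which is the pitfall here
print(list("cars"))

# After normalisation, both call styles give a list of whole table names
print(normalize_tables("cars"))
print(normalize_tables(["cars", "country"]))
```

Independently of the library version, passing `tables=['cars']` to `request_access_to_product` should match the documented signature.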