aws-samples / amazon-sagemaker-genai-datamesh


Feedback #1

Open akozd opened 1 year ago

akozd commented 1 year ago
  1. "RedShift Serverless - to connect to datastore tickit to retieve information regarding ticket sales." <- fix spelling error
  2. "This notebook demonstrates how Large Language Models such as Anthropic integrates with AWS databases, AWS data stores, 3rd party Data Warehousing solutions such as offered by Snowflake and APIs." <- Last part of this sentence does not make sense
  3. Inconsistent capitalization throughout the intro
  4. Clear outputs of notebook cells, or provide a clean walkthrough (currently you see that cell 1 has had 46 executions)
  5. Why are the pip installs commented out? Why are you installing each library individually? You should pip install the libraries all at once (see the install sketch after this list).
  6. Get rid of your actual AWS account numbers in the notebook. Use placeholders like 123456789012.
  7. Explicitly tell customers which values they must update, and make clear that the solution won't run if those values aren't updated.
  8. engine_snowflake=create_engine('snowflake://'+sf_username+':'+sf_password+'@'+sf_account_id+'/'+db+'/'+schema+'?warehouse='+dwh) <-- use an f-string instead (see the f-string sketch after this list)
  9. Why is the rest of the connector code commented out? Explicitly call out the reason, and provide instructions on when to uncomment the code.
  10. Consider breaking parse_catalog() into multiple helper functions, one for each type of metadata source (see the helper-function sketch after this list).
  11. parse_catalog() also doesn't really work for customer use cases. A bunch of things are hardcoded, such as stockmarket_schema. Really you want this solution to be generic enough that the customer updates their creds in the first section, and then parse_catalog() automatically constructs the catalog from the specified sources.
  12. I think there should be steps that show how to deploy the SageMaker endpoints. It will make the notebook easier to use.
  13. else: print("error") followed by print("Step complete. Channel is: ", channel) -> You are just printing an error statement, and the subsequent line prints as if nothing happened. Consider raising an actual Exception (see the exception sketch after this list).
  14. Same callout for notebook step 4 regarding printing "error".
  15. Consider refactoring so that the logic in steps 3 and 4 is written as functions. Then, in order to perform a new query, the user just has to run one cell. An example experience would be something like print(resolve_query(query="Which stock performed the best and the worst in May of 2013?")) (see the resolve_query sketch after this list). In general, the code should be as easy to use and as generic as possible for the customer.
  16. Delete the blank cells at the bottom of the notebook
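
For item 5, a minimal sketch of a single consolidated install cell. The exact package list is an assumption based on the connectors the notebook discusses (Snowflake, Redshift, LangChain):

# One install cell instead of one per library; adjust the list to match
# the notebook's actual imports.
%pip install --quiet boto3 sqlalchemy snowflake-sqlalchemy redshift-connector langchain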
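
For item 8, the same connection string as an f-string (variable names copied from the original line; assumes create_engine is imported from sqlalchemy as in the notebook):

engine_snowflake = create_engine(
    f'snowflake://{sf_username}:{sf_password}@{sf_account_id}/{db}/{schema}?warehouse={dwh}'
)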
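
For items 10 and 11, a sketch of the shape the refactor could take. Every helper name is hypothetical; each one would read its connection details from the config section instead of hardcoded values:

def parse_catalog():
    # Build the catalog from each configured metadata source; each helper
    # below is hypothetical and encapsulates one source's logic.
    entries = []
    entries.extend(parse_glue_sources())       # Glue / Redshift Serverless metadata
    entries.extend(parse_snowflake_sources())  # Snowflake metadata
    entries.extend(parse_api_sources())        # API-backed sources
    return '\n'.join(entries)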
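
For items 13 and 14, a sketch of failing loudly instead of printing "error". The surrounding condition is an assumption; the point is that raising stops the notebook at the failing cell:

if channel:
    print("Step complete. Channel is:", channel)
else:
    # Raising halts execution here instead of letting the next cell run
    # as if nothing went wrong.
    raise RuntimeError("Failed to determine the channel; check the previous step.")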
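
For item 15, a sketch of the one-cell experience, assuming the step 3 and step 4 logic is wrapped in the hypothetical helpers identify_channel() and run_query():

def resolve_query(query: str) -> str:
    """Route the question to the right data source and return the answer."""
    channel = identify_channel(query)  # hypothetical wrapper around step 3
    return run_query(query, channel)   # hypothetical wrapper around step 4

print(resolve_query(query="Which stock performed the best and the worst in May of 2013?"))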
karimkhanc commented 10 months ago

@akozd did you notice that the table creation steps are not available in the notebook - https://github.com/aws-samples/amazon-sagemaker-genai-datamesh/blob/main/blogs/Simple-text-to-sql/mda_with_llm_langchain_byo_model_without_cloudformation.ipynb ?

The database creation step is there, but when I run the code, the tables are not generated. As a result, when we run:

import boto3

# gdc is the list of Glue database names defined earlier in the notebook
def parse_catalog():
    # Connect to the Glue catalog and get metadata of the Redshift Serverless tables
    columns_str = ''

    # Define the Glue client
    glue_client = boto3.client('glue')

    for db in gdc:
        response = glue_client.get_tables(DatabaseName=db)
        print(response)
        for tables in response['TableList']:
            print(tables)

The table list comes back empty: 'TableList': []

I am following the same code and steps as in the notebook.
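
For what it's worth, here is a quick way to see what the Glue catalog actually contains, to confirm whether the database was created without any tables (a minimal sketch; pagination is ignored):

import boto3

glue_client = boto3.client('glue')

# List every Glue database and the tables registered in it; an empty list
# means the creation step never registered the tables.
for db in glue_client.get_databases()['DatabaseList']:
    table_names = [t['Name'] for t in glue_client.get_tables(DatabaseName=db['Name'])['TableList']]
    print(db['Name'], '->', table_names or 'no tables registered')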