aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0

[Content Improvement] Datasets used in sagemaker_featurestore_fraud_detection_python_sdk.ipynb #4284

Open kalyanr-agi opened 1 year ago

kalyanr-agi commented 1 year ago

Link to the notebook sagemaker-featurestore/sagemaker_featurestore_fraud_detection_python_sdk.ipynb

What aspects of the notebook can be improved? Links to the dataset being used.

What are your suggestions? I can't find the dataset being used in the example. A link to it would be great.

netsatsawat commented 1 year ago

Hi @kalyanr-agi ,

Thank you for your question. Are you referring to this cell in particular?

import boto3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import io

# `region` is set in an earlier cell of the notebook
s3_client = boto3.client("s3", region_name=region)

fraud_detection_bucket_name = f"sagemaker-example-files-prod-{region}"
identity_file_key = (
    "datasets/tabular/fraud_detection/synthethic_fraud_detection_SA/sampled_identity.csv"
)
transaction_file_key = (
    "datasets/tabular/fraud_detection/synthethic_fraud_detection_SA/sampled_transactions.csv"
)

identity_data_object = s3_client.get_object(
    Bucket=fraud_detection_bucket_name, Key=identity_file_key
)
transaction_data_object = s3_client.get_object(
    Bucket=fraud_detection_bucket_name, Key=transaction_file_key
)

identity_data = pd.read_csv(io.BytesIO(identity_data_object["Body"].read()))
transaction_data = pd.read_csv(io.BytesIO(transaction_data_object["Body"].read()))

If yes, the code is reading the data from a public S3 bucket. Can you try upgrading boto3 and sagemaker to the latest versions? Also, are you running this locally or using a SageMaker notebook within the AWS environment?
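
As a quick sanity check, here is a minimal sketch (not part of the notebook) that lists the dataset objects under that prefix. It assumes the example-files bucket allows anonymous reads, so it uses an unsigned botocore config; the region value is a placeholder you would replace with the one your notebook uses.

import boto3
from botocore import UNSIGNED
from botocore.config import Config

region = "us-east-1"  # placeholder; use the region from your notebook session
bucket = f"sagemaker-example-files-prod-{region}"
prefix = "datasets/tabular/fraud_detection/synthethic_fraud_detection_SA/"

# Anonymous S3 client, assuming the bucket permits public reads
s3 = boto3.client("s3", region_name=region, config=Config(signature_version=UNSIGNED))

response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])

If the two sampled CSV files show up in the listing, the cell above should be able to download them; if they don't, the problem is more likely credentials or region configuration than the dataset location.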