aws / sagemaker-feature-store-spark

x-account feature ingestion does not work #22

Open dgu1-godaddy opened 4 months ago

dgu1-godaddy commented 4 months ago

How to repro:

ACCOUNT A:
Feature Group ARN: arn:aws:sagemaker:us-west-2:<ACCOUNT_ID_A>:feature-group/example-feature-group
RAM share: shared with ACCOUNT B, with the arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerFeatureGroupReadWrite and arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerCatalogResourceSearch permissions

ACCOUNT B:
EMR release label: 6.15.0 (IAM role has SageMaker full access and SageMaker Feature Store full access)
sagemaker-feature-store-pyspark version: 1.1.2

CODE:

...
from feature_store_pyspark.FeatureStoreManager import FeatureStoreManager
...
feature_group_arn = "arn:aws:sagemaker:us-west-2:<ACCOUNT_ID_A>:feature-group/example-feature-group"
feature_store_manager = FeatureStoreManager()
feature_store_manager.ingest_data(input_data_frame=df, feature_group_arn=feature_group_arn)

ERROR message:

ERROR:__main__:Failed to ingest data: An error occurred while calling o101.ingestDataInJava.
: smfs.shaded.software.amazon.awssdk.services.sagemaker.model.ResourceNotFoundException: Resource Not Found: Amazon SageMaker can't find a FeatureGroup with name example-feature-group (Service: SageMaker, Status Code: 400, Request ID:)
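Note that the error refers to the feature group by its short name (example-feature-group), not by the full ARN that was passed in. One possible reading, which is an assumption and not confirmed against the library source, is that the ARN gets reduced to a name before the lookup. A name-only lookup made from ACCOUNT B would not find a group owned by ACCOUNT A. The sketch below only illustrates which parts of the ARN such a reduction would drop. `parse_feature_group_arn` and `FeatureGroupArn` are hypothetical helpers, not part of any AWS library, and the account ID in the usage lines is a placeholder.

```python
from typing import NamedTuple


class FeatureGroupArn(NamedTuple):
    """Hypothetical container for the parts of a feature group ARN."""
    region: str
    account_id: str
    name: str


def parse_feature_group_arn(arn: str) -> FeatureGroupArn:
    """Split a feature group ARN into region, owner account ID, and name.

    Expected shape: arn:<partition>:sagemaker:<region>:<account>:feature-group/<name>
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"not a SageMaker ARN: {arn!r}")
    resource_type, _, name = parts[5].partition("/")
    if resource_type != "feature-group" or not name:
        raise ValueError(f"not a feature group ARN: {arn!r}")
    return FeatureGroupArn(region=parts[3], account_id=parts[4], name=name)


# Placeholder account ID standing in for <ACCOUNT_ID_A>.
parsed = parse_feature_group_arn(
    "arn:aws:sagemaker:us-west-2:111111111111:feature-group/example-feature-group"
)
# Keeping only parsed.name discards the owner account, which is the
# information a cross-account lookup would need.
print(parsed.name, parsed.account_id)
```

If the library does keep only the name, the owner account is lost at that point, which would match the ResourceNotFoundException above.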

However, the same ingestion works when I use boto3 and the sagemaker SDK:

...
import boto3
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup
from sagemaker.session import Session
...

boto_session = boto3.Session(region_name="us-west-2")
sagemaker_client = boto_session.client(service_name="sagemaker")
featurestore_runtime = boto_session.client(service_name="sagemaker-featurestore-runtime")
feature_store_session = Session(
    boto_session=boto_session,
    sagemaker_client=sagemaker_client,
    sagemaker_featurestore_runtime_client=featurestore_runtime,
)

feature_group_arn = "arn:aws:sagemaker:us-west-2:<ACCOUNT_ID_A>:feature-group/example-feature-group"
example_feature_group = FeatureGroup(name=feature_group_arn, sagemaker_session=feature_store_session)

example_feature_group.ingest(data_frame=df.toPandas(), max_workers=4, wait=True)

It seems that the sagemaker-feature-store-pyspark library does not work in a cross-account scenario. We would like to request this feature.

Thank you